# BI278 Lab#8 notes

First, I copied the dataset from the course directory by using `cp /courses/bi278/Course_Materials/lab_08/multivariate_data.txt ~`

## exercise 1.1

I first ran `x<-c("dendextend", "ggplot2", "vegan", "reshape2", "ggrepel")` and `lapply(x, require, character.only = TRUE)` to load these libraries. Next, I read the data I copied earlier by running `cdata <- read.table("multivariate_data.txt", h=T)`, then used `head(cdata)` to check the data. It printed the first rows of the table. Then, I changed the row names by `rownames(cdata) <- cdata$sample`.

Next, I plotted a scatterplot of two of the genes by running `ggplot(cdata, aes(x=gene1, y=gene2)) + geom_point() + geom_text(aes(label=sample), hjust=-.3, vjust=0)`.

I ran a PCA on the gene columns (columns 3 to 10) by `cdata.pca <- prcomp(cdata[,c(3:10)], scale=T)` and saved the sample scores with `cdata.pca.res <- as.data.frame(cdata.pca$x)`. Then, I plotted the first two principal components by running `ggplot(cdata.pca.res, aes(x=PC1, y=PC2)) + geom_point() + geom_text(aes(label=rownames(cdata.pca.res)), hjust=1,vjust=-.5)`

Then, I brought over the taxon labels by running `cdata.pca.res$taxonomy <- cdata$taxonomy` and used `ggplot(cdata.pca.res, aes(x=PC1, y=PC2, color=taxonomy)) + geom_point() + geom_text(aes(label=rownames(cdata.pca.res)), hjust=1,vjust=-.5)` to create a new scatterplot colored by taxonomy.

Next, I added "path" as a variable by `path = c(rep("green",5), rep("blue",4))` and `cdata.pca.res$path <- path`, then ran `ggplot(cdata.pca.res, aes(x=PC1, y=PC2, color=path)) + geom_point() + geom_text(aes(label=rownames(cdata.pca.res)), hjust=1,vjust=-.5)` This plotted a new scatterplot colored by path.

Then, I tried ggrepel with `ggplot(cdata.pca.res, aes(x=PC1, y=PC2, color=path)) + geom_point() + geom_label_repel(aes(label=rownames(cdata.pca.res)))` which gave me another plot, this time with labels that do not overlap the points.

## exercise 1.2

First, I ran an NMDS ordination on the gene columns by running `cdata.nms <- metaMDS(cdata[,c(3:10)], distance="bray", k=3)` and printed the result with `cdata.nms`. Then, I drew a stress plot with `stressplot(cdata.nms)`.
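To remind myself what `distance="bray"` means, here is a minimal base R sketch of the Bray-Curtis dissimilarity between two samples: the sum of absolute differences in counts divided by the sum of all counts. The counts and the helper name `bray_curtis` are made up for illustration and are not from `multivariate_data.txt`. Also, `metaMDS` may transform the data before computing distances, so its internal values need not match a direct calculation like this one.

```r
# Hypothetical gene counts for two samples (made-up values)
a <- c(gene1 = 10, gene2 = 0, gene3 = 4)
b <- c(gene1 = 6,  gene2 = 2, gene3 = 4)

# Hypothetical helper: Bray-Curtis dissimilarity between two count vectors
# (sum of absolute differences divided by the sum of all counts)
bray_curtis <- function(x, y) {
  sum(abs(x - y)) / sum(x + y)
}

bc <- bray_curtis(a, b)   # (4 + 2 + 0) / 26
print(bc)
```

A value of 0 means the two samples have identical counts, and a value of 1 means they share no genes at all.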
Next, I ran a series of commands to create the plot: `path = c(rep("green",5), rep("blue",4))`, `plot(cdata.nms, type="n")`, `ordihull(cdata.nms, groups=path, draw="polygon", col="grey90", label=F)`, `orditorp(cdata.nms, display="species", col="red")` and `orditorp(cdata.nms, display="sites", col=c(rep("green",5),rep("blue",4)), cex=1.25)`, which gave us this:

![](https://i.imgur.com/JFGev65.png)

## exercise 1.3

First, I made a (Bray-Curtis) distance matrix by `d <- vegdist(cdata[,3:10], method="bray")` (this is `vegdist` from vegan, because base R `dist` has no "bray" method), then I ran `cdata.clust <- hclust(d, method="ward.D2")` for a hierarchical clustering analysis. Next, I plotted a dendrogram with `plot(cdata.clust)`. Then I ran `cutree(cdata.clust, k=2)` to see the classification results based on the k I chose.

I converted the clustering results into a dendrogram with `dendro <- as.dendrogram(cdata.clust)` and plotted it with `plot(dendro, type = "rectangle", ylab = "Height")`. Finally, I manipulated the dendrogram further, coloring the leaves by cluster, by using `dendro %>% set("leaves_pch", c(19)) %>% set("leaves_cex", c(2)) %>% set("leaves_col", as.numeric(cutree(cdata.clust, k=2)[cdata.clust$order])+1) %>% plot(type = "rectangle")`

## exercise 2

First, I converted the data from wide to long and took a look at how it had been converted so that I know the column names:
`cdata <- read.table("multivariate_data.txt", h=T)`, `path <- c(rep("green",5), rep("blue",4))`, `cdata$path <- path`, `cdata.long <- melt(cdata[,1:11])` and `head(cdata.long)`

Next, I plotted these data as a series of box plots that show me the difference in each gene count by taxonomy or "path": `ggplot(cdata.long, aes(x=taxonomy, y=value, fill=taxonomy)) + geom_boxplot() + facet_wrap(.~variable, scales="free")` and `ggplot(cdata.long, aes(x=path, y=value, fill=path)) + geom_boxplot() + facet_wrap(.~variable, scales="free")`

Then, I subset the data to a single gene by `gene2.long <- subset(cdata.long, variable=="gene2")` and plotted it with `ggplot(gene2.long, aes(x=taxonomy, y=value, fill=taxonomy)) + geom_boxplot()`

Finally, I made a dot plot by running `ggplot(gene2.long, aes(x=path, y=value, fill=path)) + geom_dotplot(binaxis='y', stackdir='center')`
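For my own reference, the wide-to-long conversion in exercise 2 can be sketched in base R on a tiny table. This is only an approximation of what `melt` produces, not how reshape2 does it internally. The table `wide` and its values are made up, and I am assuming the real data has the same layout (label columns followed by one column per gene).

```r
# Hypothetical toy table in the same wide layout as cdata (values are made up)
wide <- data.frame(
  sample   = c("s1", "s2", "s3"),
  taxonomy = c("A", "A", "B"),
  gene1    = c(10, 12, 3),
  gene2    = c(0, 5, 8),
  stringsAsFactors = FALSE
)

gene_cols <- c("gene1", "gene2")

# Stack the gene columns so there is one row per sample-gene pair;
# the label columns are repeated once for each gene
long <- data.frame(
  sample   = rep(wide$sample, times = length(gene_cols)),
  taxonomy = rep(wide$taxonomy, times = length(gene_cols)),
  variable = factor(rep(gene_cols, each = nrow(wide)), levels = gene_cols),
  value    = unlist(wide[gene_cols], use.names = FALSE)
)

print(long)
```

The `variable` and `value` columns are the ones that `facet_wrap(.~variable)` and `y=value` refer to in the ggplot calls, which is why checking the column names after the conversion matters.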