# BI278 Lab#8 notes
First, I copied the dataset from the course directory to my home directory with `cp /courses/bi278/Course_Materials/lab_08/multivariate_data.txt ~`.
## exercise 1.1
I first ran `x <- c("dendextend", "ggplot2", "vegan", "reshape2", "ggrepel")` and
`lapply(x, require, character.only = TRUE)` to load these libraries.
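Note to self: `require` only returns `FALSE` when a package name is misspelled or not installed, so a typo is easy to miss. A minimal sketch of checking the list first, assuming a made-up package vector (`notARealPackage123` is a hypothetical name used only to trigger the failure case):

```r
# Hypothetical package list; "stats" and "utils" ship with R,
# "notARealPackage123" is a made-up name that should not exist.
pkgs <- c("stats", "utils", "notARealPackage123")

# requireNamespace() reports TRUE/FALSE without attaching the package.
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of anything that could not be found.
missing_pkgs <- names(available)[!available]
if (length(missing_pkgs) > 0) {
  message("Not installed or misspelled: ", paste(missing_pkgs, collapse = ", "))
}
```

Anything listed in `missing_pkgs` needs to be installed or have its spelling fixed before the `lapply(x, require, ...)` step.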
Next, I read the data I copied earlier by running `cdata <- read.table("multivariate_data.txt", header=TRUE)`, then used `head(cdata)` to check it. This printed the first six rows of the table.
Then, I changed the row names to the sample names with `rownames(cdata) <- cdata$sample`.
Next, I plotted a scatterplot by running `ggplot(cdata, aes(x=gene1, y=gene2)) + geom_point() + geom_text(aes(label=sample), hjust=-.3, vjust=0)`.
I ran a PCA on the gene columns (columns 3 to 10) with `cdata.pca <- prcomp(cdata[,c(3:10)], scale.=TRUE)` and saved the sample scores with `cdata.pca.res <- as.data.frame(cdata.pca$x)`. Then, I plotted PC1 against PC2 by running `ggplot(cdata.pca.res, aes(x=PC1, y=PC2)) + geom_point() + geom_text(aes(label=rownames(cdata.pca.res)), hjust=1, vjust=-.5)`.
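To judge how much of the data a PC1 vs PC2 plot actually shows, it helps to look at the variance explained by each component. A minimal sketch on made-up counts (the `toy` matrix is an assumption, not the lab data):

```r
set.seed(1)
# Made-up stand-in for the gene columns: 9 samples x 4 genes.
toy <- matrix(rpois(36, lambda = 20), nrow = 9,
              dimnames = list(paste0("s", 1:9), paste0("gene", 1:4)))

# Same call pattern as in the notes: center and scale each gene.
toy.pca <- prcomp(toy, scale. = TRUE)

# Proportion of total variance carried by each principal component.
var.explained <- toy.pca$sdev^2 / sum(toy.pca$sdev^2)
round(var.explained, 3)

# Loadings: how strongly each gene contributes to each PC.
toy.pca$rotation
```

The same two lines applied to `cdata.pca` would give the percentages to put on the axis labels.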
Then, I brought over the taxon labels by running `cdata.pca.res$taxonomy <- cdata$taxonomy` and used `ggplot(cdata.pca.res, aes(x=PC1, y=PC2, color=taxonomy)) + geom_point() + geom_text(aes(label=rownames(cdata.pca.res)), hjust=1, vjust=-.5)` to create a new scatterplot colored by taxonomy.
Next, I added "path" as a grouping variable with `path = c(rep("green",5), rep("blue",4))` and
`cdata.pca.res$path <- path`, then ran
`ggplot(cdata.pca.res, aes(x=PC1, y=PC2, color=path)) + geom_point() + geom_text(aes(label=rownames(cdata.pca.res)), hjust=1, vjust=-.5)`.
This plotted a new scatterplot colored by path.
Then, I tried ggrepel to keep the labels from overlapping: `ggplot(cdata.pca.res, aes(x=PC1, y=PC2, color=path)) + geom_point() + geom_label_repel(aes(label=rownames(cdata.pca.res)))`, which gave me another plot.
## exercise 1.2
First, I ran an NMDS ordination on the gene columns with `cdata.nms <- metaMDS(cdata[,c(3:10)], distance="bray", k=3)` and printed the result with `cdata.nms`.
Then, I plotted a stressplot with `stressplot(cdata.nms)`.
Next, I ran a series of commands to build the ordination plot. I set the groups with `path = c(rep("green",5), rep("blue",4))`,
started an empty plot with `plot(cdata.nms, type="n")`,
drew a hull around each group with `ordihull(cdata.nms, groups=path, draw="polygon", col="grey90", label=F)`,
added the genes with `orditorp(cdata.nms, display="species", col="red")`,
and added the samples with `orditorp(cdata.nms, display="sites", col=c(rep("green",5),rep("blue",4)), cex=1.25)`, which gave the final NMDS plot.

## exercise 1.3
First, I made a (Bray-Curtis) distance matrix with `d <- vegdist(cdata[,3:10], method="bray")` (vegan's `vegdist`, since base `dist` has no Bray-Curtis method).
Then I ran `cdata.clust <- hclust(d, method="ward.D2")` for a hierarchical clustering analysis.
Next, I plotted a dendrogram with `plot(cdata.clust)`.
Then I ran `cutree(cdata.clust, k=2)` to see the classification results based on the k I chose.
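To remind myself what the distance and clustering steps are doing, here is a minimal sketch on made-up counts. The `counts` matrix and the `bray_curtis` helper are my own assumptions, with Bray-Curtis written out by hand so the sketch runs without extra packages:

```r
# Made-up count table: 6 samples x 3 genes (filled column by column).
counts <- matrix(c(10, 12, 11,  1,  2,  1,
                    0,  1,  0, 20, 18, 22,
                    5,  6,  5,  6,  5,  7),
                 nrow = 6,
                 dimnames = list(paste0("s", 1:6), paste0("gene", 1:3)))

# Hypothetical helper: Bray-Curtis dissimilarity between every pair of rows,
# sum(|x - y|) / sum(x + y), returned as a "dist" object.
bray_curtis <- function(m) {
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) for (j in seq_len(n)) {
    d[i, j] <- sum(abs(m[i, ] - m[j, ])) / sum(m[i, ] + m[j, ])
  }
  as.dist(d)
}

d <- bray_curtis(counts)
hc <- hclust(d, method = "ward.D2")  # same linkage as in the notes
groups <- cutree(hc, k = 2)          # cluster number for each sample
groups
```

Samples s1 to s3 (high gene1) and s4 to s6 (high gene2) should land in different clusters.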
I converted the clustering results into a dendrogram with
`dendro <- as.dendrogram(cdata.clust)` and plotted it with
`plot(dendro, type = "rectangle", ylab = "Height")`.
Finally, I manipulated the dendrogram further with dendextend, coloring each leaf by its cluster: `dendro %>% set("leaves_pch", c(19)) %>% set("leaves_cex", c(2)) %>% set("leaves_col", as.numeric(cutree(cdata.clust, k=2)[cdata.clust$order])+1) %>% plot(type = "rectangle")`.
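The `[cdata.clust$order]` part is there because `cutree` returns clusters in the original sample order, while leaf colors have to be given in left-to-right leaf order. A minimal sketch with made-up one-dimensional values (the `toy` vector is an assumption):

```r
# Made-up values: a and c are close together, b and d are close together.
toy <- c(a = 1, b = 10, c = 2, d = 11)
hc <- hclust(dist(toy), method = "average")

groups <- cutree(hc, k = 2)           # input order: a, b, c, d
groups.by.leaf <- groups[hc$order]    # reordered to match the leaves

groups
groups.by.leaf
```

After reordering, members of the same cluster sit next to each other, which is what the leaf colors need.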
## exercise 2
First, I converted the data from wide to long and took a look at how it was converted so
that I know the column names.
`cdata <- read.table("multivariate_data.txt", h=T)`
`path <- c(rep("green",5), rep("blue",4))`
`cdata$path <- path`
`cdata.long <- melt(cdata[,1:11])`
`head(cdata.long)`
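To see what wide-to-long means, here is a minimal base-R sketch on a made-up table. It uses `stack()` rather than `melt()` so it runs without reshape2, and the `wide` data frame is an assumption; the output columns are named `variable` and `value` to match what the later plots use:

```r
# Made-up wide table: one row per sample, one column per gene.
wide <- data.frame(sample = c("s1", "s2", "s3"),
                   path   = c("green", "green", "blue"),
                   gene1  = c(5, 3, 8),
                   gene2  = c(0, 2, 1))
genes <- c("gene1", "gene2")

# stack() puts all gene columns into one "values" column plus an "ind" label.
stacked <- stack(wide[genes])

# Repeat the id columns once per gene so each row is one sample-gene pair.
long <- data.frame(sample   = rep(wide$sample, times = length(genes)),
                   path     = rep(wide$path,   times = length(genes)),
                   variable = stacked$ind,
                   value    = stacked$values)
long
```

Three samples and two genes give six rows, one per sample-gene combination.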
Next, I plotted these data as a series of box plots, one panel per gene, that show the difference in each gene count by taxonomy or “path”:
`ggplot(cdata.long, aes(x=taxonomy, y=value, fill=taxonomy)) +
geom_boxplot() + facet_wrap(.~variable, scales="free")`
`ggplot(cdata.long, aes(x=path, y=value, fill=path)) +
geom_boxplot() + facet_wrap(.~variable, scales="free")`
Then, I subset the data to just gene2 with `gene2.long <- subset(cdata.long, variable=="gene2")` and made a box plot of it by taxonomy:
`ggplot(gene2.long, aes(x=taxonomy, y=value, fill=taxonomy)) +
geom_boxplot()`
Finally, I made a dot plot by running `ggplot(gene2.long, aes(x=path, y=value, fill=path)) +
geom_dotplot(binaxis='y', stackdir='center')`