# Exercise 5 - Lexi Hammer

### Problem 1
Use the code from last week (October 18) to repeat the data curation and GPA on the `bi377.demo.borealis.v.fervidus` forewing and hindwing datasets. If you already have the GPA objects for both datasets, then great! Show the names of data provenance entries on your forewing and hindwing data objects, after GPA.

![](https://i.imgur.com/wWJE0YT.png)
##### image of the GPA object for the forewings

![](https://i.imgur.com/szqLvLe.png)
##### image of the GPA object for the hindwings

```
# Assumed setup: these functions come from the geomorph and borealis packages
library(geomorph)
library(borealis)
xy.fw <- read.tps("bi377.demo.borealis.v.fervidus.forewings.class.tps", keep.original.ids = TRUE)
xy.hw <- read.tps("bi377.demo.borealis.v.fervidus.hindwings.class.tps", keep.original.ids = TRUE)
gpa.fw <- align.procrustes(xy.fw, outlier.analysis = TRUE)
gpa.hw <- align.procrustes(xy.hw, outlier.analysis = TRUE)
```

#### Challenge 1
Save these data objects and their link matrices together in one file in your working directory. Then load them back into R's environment and make it verbose. Show your code. (Hint: Google how to do this "in R".)

```
fw.links <- matrix(c(1,2, 1,5, 5,4, 4,3, 3,2, 5,6, 6,7, 7,8, 8,9, 9,4, 3,11, 11,12, 11,10, 9,10, 10,14, 14,15, 15,16, 16,18, 18,20, 16,17, 17,8, 12,13, 13,19, 14,13, 18,19, 2,12), ncol = 2, byrow = TRUE)
hw.links <- matrix(c(1:5,2:6), ncol = 2, byrow = FALSE)
rbind(fw.links, hw.links)
```

- reference link: https://www.geeksforgeeks.org/combining-matrices-in-r/
- I combined the 2 matrices using the `rbind` function in R and not the `cbind` function because both matrices have two columns but different numbers of rows (26 for `fw.links`, 5 for `hw.links`), so they can only be stacked by row. The `byrow` argument only controls how each matrix is filled; it does not affect how the matrices are combined.
- Note: `rbind` only combines the matrices in memory. Writing the objects to one file and reloading them verbosely would use base R's `save()` and `load(..., verbose = TRUE)`.

![](https://i.imgur.com/RyKOoFt.png)
##### images of the output for the rbind function for combining the forewing and hindwing matrices.

### Problem 2
Perform PCA on the two GPA-aligned wing datasets.
Examine the distribution of variance associated with each principal component axis using the functions `pcvar` and `scree.plot`, for each dataset.

![](https://i.imgur.com/ZZAomdX.png)
##### image of the principal component analysis for the hindwings

```
pca.hw <- gm.prcomp(gpa.hw$gdf$coords)
names(pca.hw)
# proportion of total variance on each PC axis, computed by hand
(pca.hw$sdev^2)/(sum(pca.hw$sdev^2))
#pcvar
pcvar(pca.hw)
#scree plot
barplot((pca.hw$sdev^2)/(sum(pca.hw$sdev^2)))
```

![](https://i.imgur.com/djXgOaP.png)
##### image of the principal component analysis for the forewings

```
pca.fw <- gm.prcomp(gpa.fw$gdf$coords)
names(pca.fw)
# proportion of total variance on each PC axis, computed by hand
(pca.fw$sdev^2)/(sum(pca.fw$sdev^2))
#pcvar
pcvar(pca.fw)
#scree plot
barplot((pca.fw$sdev^2)/(sum(pca.fw$sdev^2)))
```

#### Challenge 2
What does it mean if the first principal component axis has by far the most variance? What does it mean if variance is spread across multiple axes? How does this influence your interpretation of the forewing and hindwing data?

- If the first principal component axis has by far the most variance, most of the shape variation in the dataset lies along a single dominant direction. If the variance is spread across multiple axes, the shapes vary in several independent ways and no single axis summarizes the data well. This influences my interpretation of the forewing and hindwing data because the hindwing variance is spread across axes, while the first principal component for the forewings is very high, so forewing variation is concentrated in one main pattern of shape change.

### Problem 3
If you have not already done so, download the previously-blinded [metadata](https://github.com/aphanotus/openEd/blob/main/BI377.22F.morphometry/class6.Oct25/bi377.demo.borealis.v.fervidus.metadata.csv) for the shape datasets. Import the metadata into R. There's one snag. The table lists each specimen once, while all 8 of us digitized these specimens! So, you'll need to repeat e.g. the species information 8 times. Create a vector like this, `species <- rep(blinded.metadata$species,8)`
Next, create shape space plots for each wing dataset and highlight species using convex hulls.
Experiment with the variations of the plot we explored in class. For example, look into higher dimensions than only the first two. Do you think we are correct in assigning these specimens to two different species? Do the forewing and hindwing data concur?

![](https://i.imgur.com/un2YqU4.png)
##### image of the shape space plot for forewings using convex hulls

![](https://i.imgur.com/5InMKbN.png)
##### image of the shape space plot for the hindwings using convex hulls

```
file.choose()
blinded.metadata <- read.csv(file = 'bi377.demo.borealis.v.fervidus.metadata.csv')
head(blinded.metadata)
blinded.metadata
speciesrep <- rep(blinded.metadata$species,8)
# speciesrep is a standalone vector, not a column of blinded.metadata
shape.space(pca.fw, group = speciesrep, convex.hulls = TRUE, include.legend = TRUE, group.title = "species")
shape.space(pca.hw, group = speciesrep, convex.hulls = TRUE, include.legend = TRUE, group.title = "species")
```

- Based on the plots above, and on experimenting with different variations of these plots, I believe we are correct in assigning these specimens to 2 different species because of the clear separation between the two clusters of data. The forewing data clearly support this; the hindwing data, however, show a much higher level of spread.

#### Challenge 3
Examine whether digitizer (that is, the person who digitized each shape) distinguishes shapes in the dataset. Do this by creating another set of plots using convex hulls to highlight digitizer.

![](https://i.imgur.com/fnj8ZiT.png)
- forewing shape space plot with axes specifications and convex hulls

```
shape.space(pca.fw, group = speciesrep, convex.hulls = TRUE, include.legend = TRUE, group.title = "species", axis1 = 1, axis2 = 3, ref.shape = gpa.fw$consensus)
```

- I tried to distinguish the digitizers by color using the call below, but I could not get the plot to show which digitizer produced each point.
```
# group is still the species vector here, so colors map to species, not digitizer
shape.space(pca.fw, group = speciesrep, convex.hulls = TRUE,
            include.legend = TRUE, group.title = "species",
            ref.shape = gpa.fw$consensus, label.groups = TRUE,
            color = c("royalblue","red","blueviolet", "black", "brown","darkgreen", "darkgray", "deeppink", "orange"))
```
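
Highlighting digitizer requires a grouping vector that identifies who digitized each shape. The following is only a minimal base R sketch of how such a vector might be built. It assumes the shapes are stored as 8 consecutive blocks, one per digitizer, which is consistent with the `rep(blinded.metadata$species,8)` hint but has not been checked against the real files. `toy.metadata`, `n.digitizers`, `species.rep` and `digitizer` are hypothetical names, and the toy values are not the class data.

```r
# Toy stand-in for the class metadata (hypothetical values, not the real data)
toy.metadata <- data.frame(
  specimen = c("s1", "s2", "s3"),
  species  = c("borealis", "fervidus", "borealis")
)
n.digitizers <- 8

# Species labels repeat as a whole block, once per digitizer
species.rep <- rep(toy.metadata$species, n.digitizers)

# Assumption: shapes are ordered as consecutive blocks, one block per digitizer,
# so each digitizer ID is repeated once for every specimen
digitizer <- factor(rep(seq_len(n.digitizers), each = nrow(toy.metadata)))

# Both vectors should have one entry per digitized shape
length(digitizer) == length(species.rep)
table(digitizer)
```

If the ordering assumption holds for the real datasets, a vector built this way could be passed as `group` to `shape.space` in place of the species vector, with `group.title` changed to match.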