BI377 Exercise #2 Oliver Wen
===

## Table of Contents

[TOC]

---

## **Problem #1a**

Create histograms showing the distribution of flipper lengths for each species and sex in the palmerpenguins::penguins dataset. Include a few short sentences offering a biological interpretation of these data.

![](https://hackmd.io/_uploads/B1YaebtGi.png)

---

```
library(palmerpenguins)
library(dplyr)
library(ggplot2)

# Range of mean +/- 2 standard deviations (about 95% of values if the data are normal)
ci <- function(x) {
  x <- x[!is.na(x)]
  return(c(mean(x) + 2 * sd(x), mean(x) - 2 * sd(x)))
}

ci(penguins$flipper_length_mm)

# Quick look: histograms by sex (rows) and species (columns)
ggplot(penguins, aes(x = flipper_length_mm)) +
  theme_bw() +
  facet_grid(sex ~ species) +
  geom_histogram()

# Per-species mean, SD, and 2 * SD, used for the reference lines below
desc.stats <- penguins %>%
  filter(!is.na(flipper_length_mm)) %>%
  group_by(species) %>%
  summarise(
    mean_flipper_length_mm = mean(flipper_length_mm),
    sd_flipper_length_mm = sd(flipper_length_mm),
    ci_flipper_length_mm = 2 * sd(flipper_length_mm)
  )

# Final figure: drop penguins of unknown sex, mark the species mean (solid)
# and mean +/- 2 SD (dotted)
penguins %>%
  filter(!is.na(sex)) %>%
  ggplot(aes(x = flipper_length_mm)) +
  theme_bw() +
  facet_grid(sex ~ species) +
  geom_histogram() +
  geom_vline(data = desc.stats,
             aes(xintercept = mean_flipper_length_mm),
             color = "darkred") +
  geom_vline(data = desc.stats,
             aes(xintercept = mean_flipper_length_mm - ci_flipper_length_mm),
             color = "darkred", linetype = "dotted") +
  geom_vline(data = desc.stats,
             aes(xintercept = mean_flipper_length_mm + ci_flipper_length_mm),
             color = "darkred", linetype = "dotted")
```

From the graph above, the flipper lengths for each species are approximately normally distributed. There are large differences between species but smaller differences between male and female individuals within each species. The graph also shows that the Chinstrap species has the fewest data points.

---

## **Problem #1b**

Test for differences in the flipper length among these species. Briefly explain why you chose to use the test you did. Interpret the results of the test in biological terms and report them in your text as we described in class.

![](https://hackmd.io/_uploads/H1HG-btzs.png)

---

```
library(car)  # provides leveneTest()

# Flipper lengths per species, with missing values removed
adelie.flipper <- penguins$flipper_length_mm[which(penguins$species == "Adelie")]
adelie.flipper <- adelie.flipper[!is.na(adelie.flipper)]
chin.flipper <- penguins$flipper_length_mm[which(penguins$species == "Chinstrap")]
chin.flipper <- chin.flipper[!is.na(chin.flipper)]
gentoo.flipper <- penguins$flipper_length_mm[which(penguins$species == "Gentoo")]
gentoo.flipper <- gentoo.flipper[!is.na(gentoo.flipper)]

# Pairwise F tests for equal variances
var.test(chin.flipper, adelie.flipper)
var.test(gentoo.flipper, adelie.flipper)
var.test(gentoo.flipper, chin.flipper)

# Levene's test for equal variances across all three species
leveneTest(flipper_length_mm ~ species, data = penguins)

# Shapiro-Wilk tests for normality, on the raw and log-transformed data
shapiro.test(adelie.flipper)
shapiro.test(chin.flipper)
shapiro.test(gentoo.flipper)
shapiro.test(log(adelie.flipper))
shapiro.test(log(chin.flipper))
shapiro.test(log(gentoo.flipper))

# One-way ANOVA, followed by Tukey's HSD for pairwise comparisons
aov.species <- aov(flipper_length_mm ~ species, data = penguins)
aov.species
summary(aov.species)
TukeyHSD(aov.species)
```

---

The observations are independent, and the three species have relatively similar sample sizes. The variances are roughly equal, all three species appear to be approximately normally distributed, and most data points fall within two standard deviations of the mean (the dotted lines in the histograms). Therefore, a one-way analysis of variance (ANOVA) was performed. The result shows a significant difference in flipper length among the three species (p < 2e-16).

## **Challenge #1a**

Write a function to find the coefficient of variation for a numeric vector. (That is, the standard deviation divided by the mean.) Use your function to find the coefficient of variation for flipper length in each species. (You do not need to distinguish males and females.) This can be done using the “tidy” tools, like summarize(). Output a table of the coefficients for each trait and species combination.
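
For comparison, the tidy route that the prompt suggests could look roughly like the sketch below. It assumes the dplyr (version 1.0 or later, for `across()`) and palmerpenguins packages are installed. `cv_table` is a name chosen here for illustration, and `cv()` is redefined so that the block runs on its own.

```r
library(dplyr)
library(palmerpenguins)

# Coefficient of variation as a percentage, ignoring missing values
cv <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / mean(x) * 100
}

# One row per species, one CV column per measured trait
# (cv_table is an illustrative name, not part of the original answer)
cv_table <- penguins %>%
  group_by(species) %>%
  summarise(across(c(flipper_length_mm, bill_length_mm,
                     bill_depth_mm, body_mass_g), cv))

cv_table
```

Grouping once and applying `cv()` across the four trait columns avoids building a separate vector for every species and trait, and the table is computed directly from the data rather than typed in by hand.
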

![](https://hackmd.io/_uploads/HydSWZYMo.png)

```
# Coefficient of variation (%) = sd / mean * 100, ignoring missing values
cv <- function(x) {
  x <- x[!is.na(x)]
  return(sd(x) / mean(x) * 100)
}

# Coefficient of variation for flipper length
# (uses the per-species vectors built for Problem #1b)
cvfa <- cv(adelie.flipper)
cvfc <- cv(chin.flipper)
cvfg <- cv(gentoo.flipper)

# bill_length_mm
adelie.billl <- penguins$bill_length_mm[which(penguins$species == "Adelie")]
adelie.billl <- adelie.billl[!is.na(adelie.billl)]
chin.billl <- penguins$bill_length_mm[which(penguins$species == "Chinstrap")]
chin.billl <- chin.billl[!is.na(chin.billl)]
gentoo.billl <- penguins$bill_length_mm[which(penguins$species == "Gentoo")]
gentoo.billl <- gentoo.billl[!is.na(gentoo.billl)]
cvbla <- cv(adelie.billl)
cvblc <- cv(chin.billl)
cvblg <- cv(gentoo.billl)

# bill_depth_mm
adelie.billd <- penguins$bill_depth_mm[which(penguins$species == "Adelie")]
adelie.billd <- adelie.billd[!is.na(adelie.billd)]
chin.billd <- penguins$bill_depth_mm[which(penguins$species == "Chinstrap")]
chin.billd <- chin.billd[!is.na(chin.billd)]
gentoo.billd <- penguins$bill_depth_mm[which(penguins$species == "Gentoo")]
gentoo.billd <- gentoo.billd[!is.na(gentoo.billd)]
cvbda <- cv(adelie.billd)
cvbdc <- cv(chin.billd)
cvbdg <- cv(gentoo.billd)

# body_mass_g
adelie.mass <- penguins$body_mass_g[which(penguins$species == "Adelie")]
adelie.mass <- adelie.mass[!is.na(adelie.mass)]
chin.mass <- penguins$body_mass_g[which(penguins$species == "Chinstrap")]
chin.mass <- chin.mass[!is.na(chin.mass)]
gentoo.mass <- penguins$body_mass_g[which(penguins$species == "Gentoo")]
gentoo.mass <- gentoo.mass[!is.na(gentoo.mass)]
cvma <- cv(adelie.mass)
cvmc <- cv(chin.mass)
cvmg <- cv(gentoo.mass)

# Table of coefficients: one row per species, one column per trait.
# Built from the computed values instead of retyping them by hand.
tab <- matrix(c(cvfa, cvbla, cvbda, cvma,
                cvfc, cvblc, cvbdc, cvmc,
                cvfg, cvblg, cvbdg, cvmg),
              ncol = 4, byrow = TRUE)
colnames(tab) <- c('flipper_length_mm', 'bill_length_mm', 'bill_depth_mm', 'body_mass_g')
rownames(tab) <- c('Adelie', 'Chinstrap', 'Gentoo')
tab <- as.table(tab)
tab
```

---

## **Challenge #1b**

Apply the function to all the numerical traits in penguins. Represent the coefficients from the table produced in Challenge 1a or 1b as a barplot. Use ggplot. Tip: use the pivot_longer() function. Write a short biological interpretation of these results.

![](https://hackmd.io/_uploads/ByRqb-YMi.png)

```
library(ggplot2)
library(tidyr)
library(tidyverse)

# Wide table: one row per species, one column of CVs per trait
c1b <- data.frame(species = c('Adelie', 'Chinstrap', 'Gentoo'),
                  flipperlength = c(cvfa, cvfc, cvfg),
                  billlength = c(cvbla, cvblc, cvblg),
                  billdepth = c(cvbda, cvbdc, cvbdg),
                  bodymass = c(cvma, cvmc, cvmg))
c1b

# Reshape to long format (one row per species and trait), then draw grouped bars
c1b %>%
  pivot_longer(cols = c('flipperlength', 'billlength', 'billdepth', 'bodymass'),
               values_to = 'coefficent_values',
               names_to = 'numerical_traits') %>%
  ggplot(aes(x = species, y = coefficent_values, fill = numerical_traits)) +
  geom_bar(stat = 'identity', position = position_dodge())
```

From the graph above, it is clear that body mass has the largest coefficient of variation among all the numerical traits, and flipper length has the smallest.

---

###### tags: `Assignment` `BI377`