# Carrie Nguyen: Exercise 1

## Problem 1

Using the `ChickWeight` dataset, create boxplots showing log~10~ weights only on the first, tenth, and last days of the experiment.

Load the necessary library:

```
library(ggplot2)
```

First, we load the `ChickWeight` dataset in R and take a peek at what the data looks like.

```
data("ChickWeight")
head(ChickWeight)
```

![](https://i.imgur.com/ZmiR2Ak.png)

We then rename the columns of our dataset for usability.

```
colnames(ChickWeight) <- c('weight', 'days', 'indiv', 'diet')
head(ChickWeight)
```

![](https://i.imgur.com/V27qrVj.png)

Now, we subset our data to include only the first, tenth, and final days of the experiment (days 0, 10, and 21). We use the `which` function in R along with the Boolean *or* operator `|` to select the appropriate rows of data for our subset.

```
# Keep only the rows recorded on day 0, day 10, or day 21
chickSubset <- ChickWeight[which(ChickWeight$days == 0 |
                                 ChickWeight$days == 10 |
                                 ChickWeight$days == 21), ]
```

![](https://i.imgur.com/RsNV8KO.png)

We also need to define the **days** column of our dataset as a factor variable. This categorizes the data into "day groups" and allows us to represent each group in its own boxplot.

```
chickSubset$days <- as.factor(chickSubset$days)
```

Finally, plot the boxplots showing the log~10~ weights.

```
plot <- ggplot(chickSubset, aes(x = days, y = log10(weight))) +
  geom_boxplot()
plot
```

![](https://i.imgur.com/BWsPxV5.png)

**Challenge 1a** Have the figure display results for each diet separately.

To display the results for each of the diets (1-4), we use `facet_grid`. Each facet represents one of the diets and contains the results for each of the 3 days (0, 10, 21).

```
plot +
  facet_grid(cols = vars(diet)) +
  ggtitle('Chick weights based on different diets')
```

![](https://i.imgur.com/jko3p7E.png)

**Challenge 1b** With the figure features above, communicate both the individual values and the distribution of those values to a viewer.

We can display both the individual values and their distribution in each panel by combining a violin plot (`geom_violin`) with jittered points (`geom_jitter`) in ggplot.

```
vPlot <- ggplot(chickSubset, aes(x = days, y = log10(weight))) +
  geom_violin(draw_quantiles = 0.5, color = 'darkgray', alpha = 0.75) +  # line at the median
  geom_jitter(color = 'red', width = 0.3) +                              # one point per chick
  ggtitle('Chick Weights based on different diets') +
  xlab('day') +
  ylab('log10 chick weight(g)') +
  facet_grid(cols = vars(diet))
vPlot
```

![](https://i.imgur.com/FntVxhB.png)

## Problem 2

Randomly generate two vectors of 100 values each. Plot them against one another and draw a linear trend line through the group of points.

First, we generate the two random vectors (`a` and `b`) using the `runif` function, which generates random numbers from a uniform distribution. Then we place these vectors in a data frame to use for plotting later.

```
a <- runif(n = 100, min = 0, max = 500)
b <- runif(n = 100, min = 0, max = 500)
randData <- data.frame(a, b)
```

Next, we plot `a` and `b` against each other on a scatterplot. We draw the trend line of the dataset using the `stat_smooth()` function. We set `method = lm` so that the smoother fits a linear model, which gives a linear trend line.

```
ggplot(randData, aes(x = a, y = b)) +
  geom_point() +
  stat_smooth(method = lm, se = FALSE, color = 'red')
```

![](https://i.imgur.com/98KPz2W.png)

**Challenge 2a** Find the slope of the line you've drawn above.

To find the slope of our trend line, we have to find its coefficients. We fit the same linear model with the `lm()` function and pass the result to `coef()`, which returns all the coefficients of the model.

```
coef <- coef(lm(randData$b ~ randData$a))
coef
```

![](https://i.imgur.com/RP1XHPc.png)

The first coefficient is the intercept and the second is the slope of the trend line. Now that we have the coefficients for our equation, we can write it all out using `paste`.

```
# coef[2] is the slope and coef[1] is the intercept
slope <- paste('y = ', coef[2], "x + ", coef[1])
slope
```

![](https://i.imgur.com/fLdBivQ.png)

**Challenge 2b** Add text to the plot, showing the slope.

Now that we have our equation, we can add it to our plot as the title.

```
ggplot(randData, aes(x = a, y = b)) +
  geom_point() +
  stat_smooth(method = lm, se = FALSE, color = 'red') +
  ggtitle(slope)
```

![](https://i.imgur.com/XWoqTay.png)
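
Because `a` and `b` are drawn at random, the fitted coefficients change on every run. As a minimal sketch (not part of the original solution), the version below fixes the seed with `set.seed()`, rounds the coefficients, and places the equation inside the plot panel with `annotate()` rather than in the title. The seed value, the three-digit rounding, the label position, and the names `fit`, `slopeVal`, `intercept`, `eqLabel`, and `annotatedPlot` are all assumptions chosen for illustration.

```r
library(ggplot2)

set.seed(42)  # assumption: any fixed seed makes the random vectors reproducible
a <- runif(n = 100, min = 0, max = 500)
b <- runif(n = 100, min = 0, max = 500)
randData <- data.frame(a, b)

# Fit the linear model once and pull out the two coefficients
fit <- lm(b ~ a, data = randData)
intercept <- unname(coef(fit)[1])
slopeVal  <- unname(coef(fit)[2])

# Build a rounded label; the sign of the intercept is handled explicitly
eqLabel <- sprintf("y = %.3fx %s %.3f",
                   slopeVal,
                   ifelse(intercept < 0, "-", "+"),
                   abs(intercept))

annotatedPlot <- ggplot(randData, aes(x = a, y = b)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE, color = 'red') +
  # x = -Inf, y = Inf pins the text to the top-left corner of the panel
  annotate("text", x = -Inf, y = Inf, label = eqLabel,
           hjust = -0.1, vjust = 1.5)
annotatedPlot
```

Rounding keeps the label short, and anchoring at `-Inf`/`Inf` keeps the text in the corner regardless of the data range.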