# Carrie Nguyen: Exercise 1
## Problem 1
Using the `ChickWeight` dataset, create boxplots showing log~10~ weights only on the first, tenth, and last days of the experiment.
Load in necessary libraries:
```
library(ggplot2)
```
First, we load the `ChickWeight` dataset in R and take a peek at its first few rows.
```
data("ChickWeight")
head(ChickWeight)
```

We then rename the columns of our dataset (originally `weight`, `Time`, `Chick`, and `Diet`) for usability.
```
colnames(ChickWeight) <- c('weight', 'days', 'indiv', 'diet')
head(ChickWeight)
```

Now, we subset our data to include only the 1st, 10th, and final days of the experiment, recorded as days 0, 10, and 21. We use the `which` function in R along with the Boolean *or* operator `|` to select the appropriate rows of data for our subset.
```
chickSubset <- ChickWeight[which(ChickWeight$days == 0 | ChickWeight$days == 10 | ChickWeight$days == 21),]
```
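The same selection can be written more compactly with `%in%`, which tests each day for membership in a vector. This is only a sketch of an alternative, not a replacement for the approach above; the names `chicks`, `keepDays`, `subsetIn`, and `subsetOr` are introduced here for illustration.

```r
data("ChickWeight")
chicks <- ChickWeight
colnames(chicks) <- c('weight', 'days', 'indiv', 'diet')

# days to keep: first (0), tenth (10), and last (21)
keepDays <- c(0, 10, 21)

# %in% is TRUE for every row whose day appears in keepDays
subsetIn <- chicks[chicks$days %in% keepDays, ]

# the chained-| version, for comparison
subsetOr <- chicks[which(chicks$days == 0 | chicks$days == 10 | chicks$days == 21), ]

# both approaches should select the same rows
all(rownames(subsetIn) == rownames(subsetOr))
```

Adding or removing a day then only means editing `keepDays`, rather than adding another `|` clause.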

We also need to define the **days** column of our dataset as a factor variable. This lets us categorize the data into "day groups" and represent each group with its own boxplot.
```
chickSubset$days <- as.factor(chickSubset$days)
```
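To see what the conversion does, we can inspect the levels of a factor built from a few day values. This is a small sketch on a stand-in vector; `exampleDays` and `dayFactor` are hypothetical names, not part of the dataset.

```r
# numeric days, as they appear before conversion
exampleDays <- c(0, 10, 21, 0, 21)
is.numeric(exampleDays)

# after conversion, each distinct day becomes one level,
# and ggplot draws one boxplot per level
dayFactor <- as.factor(exampleDays)
levels(dayFactor)
nlevels(dayFactor)
```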
Finally, we plot the boxplots showing the log~10~ weights.
```
plot <- ggplot(chickSubset, aes(x = days, y = log10(weight))) + geom_boxplot()
plot
```

**Challenge 1a**
Have the figure display results for each diet separately.
To display the results for each of the diets (1-4), we use `facet_grid`. Each facet represents one of the diets and contains the results for each of the 3 days (0, 10, 21).
```
plot + facet_grid(cols = vars(diet)) + ggtitle('Chick weights based on different diets')
```
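By default the facet strips show only the diet number. One option, sketched here as an assumption about what a reader might want, is ggplot2's `label_both` labeller, which prints the variable name alongside its value. The block rebuilds the subset so that it runs on its own; `chicks` and `labelledPlot` are names introduced for illustration.

```r
library(ggplot2)

data("ChickWeight")
chicks <- ChickWeight
colnames(chicks) <- c('weight', 'days', 'indiv', 'diet')
chicks <- chicks[chicks$days %in% c(0, 10, 21), ]
chicks$days <- as.factor(chicks$days)

# label_both prints "diet: 1" instead of "1" in each facet strip
labelledPlot <- ggplot(chicks, aes(x = days, y = log10(weight))) +
  geom_boxplot() +
  facet_grid(cols = vars(diet), labeller = label_both) +
  ggtitle('Chick weights based on different diets')
labelledPlot
```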

**Challenge 1b**
With the figure features above, communicate both the individual values and the distribution of those values to a viewer.
We can display both by overlaying the individual values as jittered points (`geom_jitter`) on a violin plot (`geom_violin`), which shows the distribution of those values.
```
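vPlot <- ggplot(chickSubset, aes(x = days, y = log10(weight))) +
  # the violin outline shows the distribution; the line marks the median
  geom_violin(draw_quantiles = 0.5, color = 'darkgray', alpha = 0.75) +
  # jitter horizontally only (height = 0) so the plotted weights stay exact
  geom_jitter(color = 'red', width = 0.3, height = 0) +
  ggtitle('Chick weights based on different diets') +
  xlab('day') + ylab('log10 chick weight (g)') + facet_grid(cols = vars(diet))
vPlot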
```

## Problem 2
Randomly generate two vectors of 100 values each. Plot them against one another and draw a linear trend line through the group of points.
First, we generate the two random vectors (`a` and `b`) using the `runif` function, which draws random numbers from a uniform distribution. Then we place these vectors in a data frame to use for plotting later.
```
a <- runif(n = 100, min = 0, max = 500)
b <- runif(n = 100, min = 0, max = 500)
randData <- data.frame(a,b)
```
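Because the vectors are random, the numbers change on every run. If repeatable results are wanted, the seed can be fixed with `set.seed` before drawing. This is a minimal sketch; the seed value 42 is arbitrary and `aAgain` is a name introduced for illustration.

```r
# fixing the seed makes the "random" draws repeatable
set.seed(42)
a <- runif(n = 100, min = 0, max = 500)
b <- runif(n = 100, min = 0, max = 500)
randData <- data.frame(a, b)

# resetting the same seed reproduces the same first vector
set.seed(42)
aAgain <- runif(n = 100, min = 0, max = 500)
identical(a, aAgain)
```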
Next, we plot `a` and `b` against each other on a scatterplot. We draw the trend line using the `stat_smooth()` function, setting `method = lm` because we want a straight line fitted by linear regression.
```
ggplot(randData, aes(x=a, y = b)) + geom_point() +
stat_smooth(method=lm, se = FALSE, color = 'red')
```

**Challenge 2a**
Find the slope of the line you've drawn above.
To find the slope of our trend line, we need the coefficients of the fitted line. We fit a linear model with the `lm()` function and pass the result to `coef()`, which returns the model's coefficients: the intercept first, then the slope.
```
# fit b as a linear function of a, then extract intercept and slope
coef <- coef(lm(b ~ a, data = randData))
coef
```
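As a sanity check, the least-squares slope of a simple linear regression equals the covariance of the two variables divided by the variance of the predictor. The sketch below compares that closed-form value with the one `lm` reports; the seed and the names `fit`, `slopeValue`, and `slopeByHand` are assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable
a <- runif(n = 100, min = 0, max = 500)
b <- runif(n = 100, min = 0, max = 500)
randData <- data.frame(a, b)

fit <- lm(b ~ a, data = randData)

# the slope is the coefficient named after the predictor
slopeValue <- unname(coef(fit)["a"])

# closed-form least-squares slope: cov(a, b) / var(a)
slopeByHand <- cov(randData$a, randData$b) / var(randData$a)

all.equal(slopeValue, slopeByHand)
```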

Now that we have the coefficients, the slope is the second one (`coef[2]`) and the intercept is the first (`coef[1]`). We can write out the full equation of the line using `paste`.
```
slope <- paste('y = ', coef[2], "x + ", coef[1])
slope
```
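The pasted string carries every digit R prints and always joins the terms with a plus sign. A possible refinement, sketched here with made-up coefficients rather than values from the fit, uses `sprintf` to round the numbers and pick the sign of the intercept explicitly. `exampleCoef` and `equation` are hypothetical names.

```r
# hypothetical coefficients standing in for coef(lm(...)): intercept, then slope
exampleCoef <- c(251.3047, -0.0312)

# keep 3 significant digits and choose the sign of the intercept term explicitly
equation <- sprintf('y = %.3gx %s %.3g',
                    exampleCoef[2],
                    ifelse(exampleCoef[1] < 0, '-', '+'),
                    abs(exampleCoef[1]))
equation
```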

**Challenge 2b**
Add text to the plot, showing the slope.
Now that we have our equation, we can add it to our plot as the title using `ggtitle`.
```
ggplot(randData, aes(x=a, y = b)) + geom_point() + stat_smooth(method=lm, se = FALSE, color = 'red') +
ggtitle(slope)
```
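If the text should sit inside the plotting area rather than in the title, ggplot2's `annotate` can place a label at chosen coordinates. The sketch below pins the slope to the top-right corner; the seed, the label wording, and the names `fit`, `slopeLabel`, and `annotatedPlot` are assumptions made for illustration.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed, only so the sketch is repeatable
a <- runif(n = 100, min = 0, max = 500)
b <- runif(n = 100, min = 0, max = 500)
randData <- data.frame(a, b)

fit <- lm(b ~ a, data = randData)
# coef(fit)[2] is the slope; keep 3 significant digits for the label
slopeLabel <- paste('slope =', signif(coef(fit)[2], 3))

annotatedPlot <- ggplot(randData, aes(x = a, y = b)) + geom_point() +
  stat_smooth(method = lm, formula = y ~ x, se = FALSE, color = 'red') +
  # x = Inf and y = Inf pin the label to the top-right corner of the panel
  annotate('text', x = Inf, y = Inf, label = slopeLabel, hjust = 1.1, vjust = 1.5)
annotatedPlot
```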
