# ENV710 20221019
---
title: "Corn and the Great Depression"
output: html_notebook
author: Ina Liao, Chia Shen Tsai
---
For this example, you need to install the package "Sleuth3". It accompanies the Statistical Sleuth textbook (one of my favorites, though no one else seems to love it).
```{r}
library(Sleuth3)
library(ggplot2)
```
We will use the data from ex0915. Rainfall is annual rainfall in inches, Yield is average corn production in bushels per acre, and Year is the year. Corn production was measured in six Midwest states (Go Midwest!).
```{r}
str(ex0915)
```
We are interested in the relationship between rainfall and corn production. Let's make a scatterplot.
### Question 1: Discuss the relationship between rainfall and corn yield based on the scatterplot below.
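The scatterplot suggests a positive relationship between rainfall and corn yield. To express it as bushels per acre gained per inch of rain, Yield must be the response, so the model here regresses Yield on Rainfall; the slope on Rainfall is the estimated change in yield for each additional inch of rainfall. (With the variables reversed, `lm(Rainfall~Yield)`, the slope of about 0.2 is in inches of rainfall per bushel per acre, so it is not the effect of rainfall on yield.)
```{r}
model<-lm(Yield~Rainfall, data=ex0915)
summary(model)
```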
```{r}
ggplot(ex0915, aes(x=Rainfall, y=Yield))+
geom_point()+
labs(y="Corn Yield (bushels corn per acre)", x="Rainfall (Inches per year)", title="Corn Yield across Six Midwestern States in the Late 19th and Early 20th Centuries")
```
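As a rough aid for reading the scatterplot, the sketch below overlays a straight-line fit and a loess smoother and reports the correlation. The object names `p` and `rain.yield.cor` are ours, and the choice of a loess smoother is an assumption for illustration, not part of the original exercise.

```{r}
library(Sleuth3)
library(ggplot2)

# Scatterplot with a straight-line fit (blue) and a loess smoother (red).
# If the two curves diverge, a straight line may not describe the data well.
p <- ggplot(ex0915, aes(x = Rainfall, y = Yield)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "blue") +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, colour = "red") +
  labs(x = "Rainfall (inches per year)", y = "Corn yield (bushels per acre)")
p

# Direction and strength of the linear association
rain.yield.cor <- cor(ex0915$Rainfall, ex0915$Yield)
rain.yield.cor
```

A positive correlation agrees with the upward trend in the plot; the smoother helps show whether that trend flattens at high rainfall.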
## Model Corn.1
### Question 2. Discuss the residual vs fitted plot of model corn.1
```{r}
corn.1<-lm(Yield~Rainfall, data=ex0915)
summary(corn.1)
plot(corn.1,1)
```
In the residuals vs. fitted plot, most points scatter around the horizontal zero line. However, a few outliers at the left and right ends pull the smoothed line down, so it bends into a bell (inverted-U) shape instead of staying flat.
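To see which observations sit far from the line, one option is to list the standardized residuals. This is only a sketch: the data frame `diag.1` and the cutoff of 2 are our assumptions (a common rule of thumb, not a formal test).

```{r}
library(Sleuth3)

corn.1 <- lm(Yield ~ Rainfall, data = ex0915)

# Collect fitted values and standardized residuals next to the data
diag.1 <- data.frame(
  Year      = ex0915$Year,
  Rainfall  = ex0915$Rainfall,
  fitted    = fitted(corn.1),
  std.resid = rstandard(corn.1)
)

# Observations whose standardized residual exceeds 2 in absolute value,
# ordered by fitted value so both ends of the plot are easy to compare
flagged <- diag.1[abs(diag.1$std.resid) > 2, ]
flagged[order(flagged$fitted), ]
```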
## Model Corn.2: Let's add in Year to the model.
### Question 3: Why might year be important to the model (conceptually)? Interpret the slope coefficient on Year.
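Year can stand in for omitted factors that change over time, such as policy and technological change (for example, irrigation technology). The slope coefficient on Year is the estimated change in average yield (bushels per acre) from one year to the next, holding rainfall constant.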
```{r}
corn.2<-lm(Yield~Rainfall + Year, data=ex0915)
summary(corn.2)
```
In some years there might have been a severe drought that limited the water available for agriculture. On the other hand, if there were more hurricanes in a given year, more water might have been available for agriculture than in other years.
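To put a number on the Year effect, the sketch below pulls out the Year slope from corn.2 with a 95% confidence interval and scales it to a decade. The names `year.slope` and `year.ci` are ours; the decade scaling is just an illustrative convenience.

```{r}
library(Sleuth3)

corn.2 <- lm(Yield ~ Rainfall + Year, data = ex0915)

# Estimated change in yield per additional year, holding rainfall constant
year.slope <- unname(coef(corn.2)["Year"])
year.slope

# 95% confidence interval for that slope
year.ci <- confint(corn.2, "Year", level = 0.95)
year.ci

# The same trend expressed per decade
10 * year.slope
```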
Here is the Residual vs. Fitted Plot for Corn.2.
### Question 4: Discuss the residual vs fitted plot in terms of the linearity assumption.
```{r}
plot(corn.2,1)
```
The plot shows that the zero-mean-error assumption is not perfectly met in this model. The mean of the residuals is dragged down at both ends of the line, because there are no positive residuals at the extreme fitted values and there are a few obvious outliers.
## Let's try adding in a quadratic term for Rainfall. I'm adding the quadratic by adding I(Rainfall^2) to the equation.
### Question 5: What does the negative quadratic mean conceptually in this model?
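A negative coefficient on the quadratic term means the effect of rainfall on yield is not constant: yield rises with rainfall at a decreasing rate, reaches a peak, and then falls when rainfall is excessive. (Dependence of the rainfall effect on the year would be an interaction, not a quadratic.)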
```{r}
corn.3<-lm(Yield~Rainfall + I(Rainfall^2) + Year, data=ex0915)
summary(corn.3)
```
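With a quadratic term, the marginal effect of rainfall is b1 + 2*b2*Rainfall, and the fitted curve turns at Rainfall = -b1/(2*b2). The sketch below computes both from corn.3; the names `turning.point`, `rain.levels`, and `marginal` are ours, and evaluating at the 10th, 50th, and 90th percentiles of rainfall is an arbitrary choice for illustration.

```{r}
library(Sleuth3)

corn.3 <- lm(Yield ~ Rainfall + I(Rainfall^2) + Year, data = ex0915)
b  <- coef(corn.3)
b1 <- unname(b["Rainfall"])        # linear term
b2 <- unname(b["I(Rainfall^2)"])   # quadratic term

# Rainfall level where the fitted curve turns; it is a maximum only if b2 < 0
turning.point <- -b1 / (2 * b2)
turning.point
range(ex0915$Rainfall)  # check whether the turning point lies inside the data

# Marginal effect of one more inch of rain at low, typical, and high rainfall
rain.levels <- unname(quantile(ex0915$Rainfall, probs = c(0.1, 0.5, 0.9)))
marginal    <- b1 + 2 * b2 * rain.levels
data.frame(Rainfall = rain.levels, marginal.effect = marginal)
```

If the quadratic coefficient is negative, the marginal effect shrinks as rainfall increases and changes sign past the turning point.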
Let's make a residual vs. fitted plot.
### Question 6: Discuss the residual vs. fitted plot in terms of the zero conditional mean [E(u|x1, x2, xj)=0] assumption
```{r}
plot(corn.3,1)
```
Under the assumption E(u|x1, x2, xj)=0, we expect the red line in the residuals vs. fitted plot to be flat along zero, parallel to the x-axis.
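One informal numeric companion to the plot is to average the residuals within bins of the fitted values. This is a sketch under our own assumptions: four equal-count bins, and the names `bins` and `bin.means` are ours. Averages near zero in every bin are consistent with the zero conditional mean assumption; a systematic pattern across bins is not.

```{r}
library(Sleuth3)

corn.3 <- lm(Yield ~ Rainfall + I(Rainfall^2) + Year, data = ex0915)
res <- resid(corn.3)
fit <- fitted(corn.3)

# The overall mean of OLS residuals is zero by construction (with an intercept),
# so it says nothing about the assumption; the conditional means are what matter
mean(res)

# Split the fitted values into four equal-count bins and average residuals in each
bins <- cut(fit, breaks = quantile(fit, probs = seq(0, 1, 0.25)),
            include.lowest = TRUE)
bin.means <- tapply(res, bins, mean)
bin.means
```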
## What if we included an interaction of Rainfall and Year?
### Question 7: What might the interaction mean conceptually?
```{r}
corn.4<-lm(Yield~Rainfall + I(Rainfall*Year) + Year, data=ex0915)
summary(corn.4)
```
```{r}
plot(corn.4,1)
```
The interaction of rainfall and year is statistically significant (and even if it were not statistically significant, it could be worth keeping if it makes conceptual sense).
The interaction means the effect of rainfall on yield depends on the year: one more inch of rain changes yield by a different amount in earlier years than in later years.
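With an interaction, the slope of rainfall in a given year is b_Rainfall + b_interaction * Year. The sketch below evaluates it at the first, middle, and last year in the data. Note the assumptions: we write the interaction as `Rainfall:Year` (the same fit as `I(Rainfall*Year)`, just a different coefficient name), and `corn.4b`, `years`, and `slope.by.year` are our own names.

```{r}
library(Sleuth3)

# Same model as corn.4, with the interaction written as Rainfall:Year
corn.4b <- lm(Yield ~ Rainfall + Year + Rainfall:Year, data = ex0915)
b <- coef(corn.4b)

# Effect of one more inch of rain on yield in selected years
years <- c(min(ex0915$Year), median(ex0915$Year), max(ex0915$Year))
slope.by.year <- unname(b["Rainfall"] + b["Rainfall:Year"] * years)
data.frame(Year = years, rainfall.slope = slope.by.year)
```

If the slopes differ noticeably across the three years, that is the interaction at work.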
### Here is a final model with a quadratic and an interaction.
Quadratic term: Rainfall^2. Interaction term: Rainfall*Year.
```{r}
corn.5<-lm(Yield~Rainfall +I(Rainfall^2) + I(Rainfall*Year) + Year, data=ex0915)
summary(corn.5)
```
```{r}
plot(corn.5, 1)
```
## Question 8: Which model explains the most variation in corn yield? Which model meets the assumption of the zero conditional mean of error term most closely?
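1. Does the model make sense conceptually?
2. Check the residuals vs. fitted plot (is the red line flat around zero?).
3. Check homoskedasticity (constant variance of the errors).
4. Compare adjusted R^2 (select the model with the higher adjusted R^2).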
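The adjusted R^2 comparison in step 4 can be sketched by refitting the five models and tabulating their fit statistics. The list `models` and the data frame `comparison` are our own names; the formulas are copied from the chunks above.

```{r}
library(Sleuth3)

# Refit the five models from this notebook
models <- list(
  corn.1 = lm(Yield ~ Rainfall, data = ex0915),
  corn.2 = lm(Yield ~ Rainfall + Year, data = ex0915),
  corn.3 = lm(Yield ~ Rainfall + I(Rainfall^2) + Year, data = ex0915),
  corn.4 = lm(Yield ~ Rainfall + I(Rainfall*Year) + Year, data = ex0915),
  corn.5 = lm(Yield ~ Rainfall + I(Rainfall^2) + I(Rainfall*Year) + Year,
              data = ex0915)
)

# R^2 never decreases when terms are added; adjusted R^2 penalizes extra terms
comparison <- data.frame(
  model         = names(models),
  r.squared     = sapply(models, function(m) summary(m)$r.squared),
  adj.r.squared = sapply(models, function(m) summary(m)$adj.r.squared),
  row.names     = NULL
)

# Highest adjusted R^2 first
comparison[order(-comparison$adj.r.squared), ]
```

Adjusted R^2 only covers step 4; the residual plots and the conceptual check still need to be judged for each model.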
### Please knit and upload to Discussions.