# In-Class Assignment
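```{r libraries}
rm(list = ls())     # start from an empty workspace
library(ggplot2)    # plotting
library(dplyr)      # data manipulation and the %>% pipe
library(wbstats)    # World Bank API client
library(r2symbols)  # symbol() inserts special characters such as Greek letters
```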
1. Check out the list of World Bank indicators here: https://data.worldbank.org/indicator?tab=all. To see information about an indicator (including its ID), click the indicator name and, in the line graphic, click the Details button in the upper right corner. You should see information about the indicator and its ID code.
2. Select one variable of interest as your response variable and one variable as your explanatory variable. It is best if both of the variables are continuous.
   - Response variable: Electric power consumption (kWh per capita), indicator `EG.USE.ELEC.KH.PC`
   - Explanatory variable: GDP per capita (current US$ per capita), indicator `NY.GDP.PCAP.CD`
3. Hypothesize a relationship (conceptually/theoretically between the two variables). Explain/Discuss.
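GDP per capita is hypothesized to have a positive relationship with electric power consumption: as GDP per capita increases, electric power consumption per capita should increase as well, because richer economies tend to have more industrial and commercial activity, broader access to electricity, and more electrical appliances per household.
Null hypothesis: $H_0: \beta_1 = 0$, where $\beta_1$ is the slope coefficient on GDP per capita
Alternative hypothesis: $H_A: \beta_1 \neq 0$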
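These hypotheses are tested with the t test that `summary()` reports for the slope. A minimal sketch on simulated data (the names `x`, `y`, `fit` and the true slope of 0.5 are assumptions, not part of the assignment) shows that the reported t value is the estimate divided by its standard error, and that the p-value is two-sided, matching the "not equal" alternative:

```r
# Simulated data (an assumption, not the World Bank data)
set.seed(1)
x <- rnorm(100)
y <- 2 + 0.5 * x + rnorm(100)
fit <- lm(y ~ x)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit)$coefficients

# t statistic for H0: beta_1 = 0 is estimate / standard error
t_manual <- coefs["x", "Estimate"] / coefs["x", "Std. Error"]

# Residual degrees of freedom: n - 2 in a simple regression
df_resid <- fit$df.residual

# Two-sided p-value, matching H_A: beta_1 != 0
p_manual <- 2 * pt(abs(t_manual), df = df_resid, lower.tail = FALSE)

c(t_manual = t_manual, t_summary = coefs["x", "t value"],
  p_manual = p_manual, p_summary = coefs["x", "Pr(>|t|)"])
```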
4. Download the data either through wbstats package or through downloading the .csvs. You only need to download one year of data--ideally the same year for both variables.
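```{r data}
# Download 2014 values of both indicators for individual countries;
# country = "countries_only" excludes regional and income-group aggregates
wb_data <-
  wb_data(
    indicator = c("NY.GDP.PCAP.CD",      # GDP per capita (current US$)
                  "EG.USE.ELEC.KH.PC"),  # electric power consumption (kWh per capita)
    country = "countries_only",
    start_date = 2014,
    end_date = 2014
  ) %>%
  glimpse()
```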
## Rename
```{r rename}
# Replace the indicator IDs with short, readable column names
wb_data <- wb_data %>%
  rename(gdp_pc = NY.GDP.PCAP.CD,
         ec_pc = EG.USE.ELEC.KH.PC) %>%
  glimpse()
```
## Count missing observations
```{r missing}
# Number of countries with no 2014 value for each variable
sum(is.na(wb_data$ec_pc))
sum(is.na(wb_data$gdp_pc))
```
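The missing counts matter because `lm()` drops incomplete rows instead of failing. A sketch with a hypothetical six-row data frame (made-up values, not World Bank data) compares the number of complete rows with the number of observations the model actually uses:

```r
# Hypothetical toy data (made-up values) with one NA in each column
toy <- data.frame(
  gdp_pc = c(1000, 2500, NA, 40000, 8000, 15000),
  ec_pc  = c(150, 400, 900, NA, 2000, 3500)
)
colSums(is.na(toy))                     # missing values per column
n_complete <- sum(complete.cases(toy))  # rows with no missing values

# lm() uses the na.action option (na.omit by default), so incomplete rows are dropped silently
fit <- lm(log(ec_pc) ~ log(gdp_pc), data = toy)
c(complete_rows = n_complete, used_in_fit = nobs(fit))
```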
5. Develop a scatter plot of the data (original units) and discuss. Should either of the variables be transformed? Why or why not?
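Both variables should be log-transformed. Each is strongly right-skewed: most countries have values close to zero while a handful of rich countries have very large values, so the points pile up near the origin and the few extreme countries can dominate the fitted line. Taking logs spreads out the small values, compresses the large ones, and makes the slope interpretable as an elasticity (percent change in one variable per percent change in the other).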
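To check the skewness argument numerically rather than by eye, one option is to compare sample skewness before and after taking logs. The sketch below uses simulated log-normal values as a stand-in for GDP per capita (an assumption) and a hand-written `skewness()` helper (hypothetical, not from a package):

```r
# Hypothetical helper: moment-based sample skewness, ignoring non-finite values
skewness <- function(v) {
  v <- v[is.finite(v)]
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

# Simulated right-skewed data (assumed log-normal, not the World Bank data)
set.seed(42)
gdp_sim <- rlnorm(200, meanlog = 8.5, sdlog = 1.2)

# Large positive skewness on the raw scale, close to zero after logging
c(raw = skewness(gdp_sim), logged = skewness(log(gdp_sim)))
```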
## Scatterplot (original units)
```{r linear.scatterplot, message=FALSE, warning=FALSE}
# Raw-scale scatterplot with an OLS line; warnings about rows with missing values are suppressed
ggplot(data = wb_data, aes(x = gdp_pc, y = ec_pc)) +
  geom_point() +
  stat_smooth(method = "lm") +
  labs(x = "GDP per capita (current US$/person)",
       y = "Electric power consumption per capita (kWh/person)") +
  theme_classic()
```
6. Transform one or both of the variables if necessary. Develop a scatterplot of the transformed data.
```{r mutate.log.gdp.ec}
# Natural-log transform both variables; log() needs positive values, and NAs stay NA
wb_data <- wb_data %>%
  mutate(log.gdp_pc = log(gdp_pc),
         log.ec_pc = log(ec_pc)) %>%
  glimpse()
```
```{r log.log.scatterplot, message=FALSE, warning=FALSE}
# Scatterplot on the log-log scale with an OLS line
ggplot(data = wb_data, aes(x = log.gdp_pc, y = log.ec_pc)) +
  geom_point() +
  stat_smooth(method = "lm") +
  labs(x = "Log GDP per capita", y = "Log electric power consumption per capita") +
  theme_classic()
```
7. Develop a linear model to examine the relationship in #3.
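```{r log.log.regression.model}
# Simple linear regression of log electricity use on log GDP per capita
lm1 <- lm(log.ec_pc ~ log.gdp_pc, data = wb_data)
lm1           # coefficients only
summary(lm1)  # standard errors, t tests, p-values, and R-squared
```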
7a. Write an equation for the model.
$$\widehat{\log(\text{EC per capita})} = -0.934 + 0.9511 \times \log(\text{GDP per capita})$$
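Rather than copying the coefficients by hand, the equation can be assembled from `coef()`; for the assignment model that would be `coef(lm1)`. A sketch on simulated log-scale data (the names `log_x`, `log_y` and the true coefficients are assumptions):

```r
# Simulated log-scale data (assumed values, not the World Bank data)
set.seed(7)
log_x <- rnorm(150, mean = 8, sd = 1.5)
log_y <- -1 + 0.95 * log_x + rnorm(150, sd = 0.6)
fit <- lm(log_y ~ log_x)

# Named coefficient vector: "(Intercept)" and "log_x"
b <- coef(fit)

# Format the fitted equation from the estimates
eq <- sprintf("log_y = %.3f + %.4f * log_x", b[["(Intercept)"]], b[["log_x"]])
eq
```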
7b. Interpret the slope coefficient and its significance.
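In a log-log model the slope is an elasticity: a 1% increase in GDP per capita is associated with an approximately 0.95% increase in electric power consumption per capita. The p-value for the slope in `summary(lm1)` is `r signif(summary(lm1)$coefficients["log.gdp_pc", "Pr(>|t|)"], 3)`; a value below 0.05 rejects $H_0: \beta_1 = 0$ at the 5% significance level.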
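The percent reading is an approximation. In a log-log model, a 1% increase in $x$ multiplies $y$ by $1.01^{\beta_1}$, so the exact percent change can be compared with the elasticity reading using the slope reported in 7a:

```r
# Slope from 7a
b1 <- 0.9511

# Exact % change in y when x rises by 1%: y is multiplied by 1.01^b1
exact_pct <- (1.01^b1 - 1) * 100

# Elasticity approximation: b1 times the % change in x (here 1%)
approx_pct <- b1 * 1

round(c(exact = exact_pct, approx = approx_pct), 4)
```

For a 1% change the two agree to about two decimal places; the gap grows for larger percent changes.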
7c. Interpret the R2 value.
Log GDP per capita explains 78.19% of the cross-country variation in log electric power consumption per capita ($R^2 = 0.7819$).
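$R^2$ is one minus the ratio of the residual sum of squares to the total sum of squares, and in a simple regression it also equals the squared correlation between the two variables (here, the two logged variables). A sketch on simulated data (values are assumptions) checks both identities against `summary()$r.squared`:

```r
# Simulated data (an assumption, not the World Bank data)
set.seed(3)
x <- rnorm(120)
y <- 1 + 0.8 * x + rnorm(120, sd = 0.5)
fit <- lm(y ~ x)

ssr <- sum(residuals(fit)^2)   # residual (unexplained) sum of squares
sst <- sum((y - mean(y))^2)    # total sum of squares around the mean
r2_manual <- 1 - ssr / sst

c(manual = r2_manual, summary = summary(fit)$r.squared, cor_sq = cor(x, y)^2)
```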
7d. Develop the residual vs. fitted plot function: plot(model_name, 1) and discuss the assumption of E(u|x)=0 based on the residual vs fitted plot.
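If $E(u \mid x) = 0$ holds, the points in the residual vs. fitted plot should scatter evenly around the horizontal line at zero, with no systematic curve or trend in the smoothed line; for the assignment model the call is `plot(lm1, 1)`. The sketch below uses simulated data (an assumption) to contrast a correctly specified linear model with a straight line fitted to a curved relationship; the `curv()` helper is hypothetical:

```r
# Simulated data (an assumption, not the World Bank data)
set.seed(10)
x <- runif(200, 0, 10)
y_ok     <- 1 + 0.5 * x + rnorm(200)     # linear truth, so E(u|x) = 0 holds
y_curved <- 1 + 0.1 * x^2 + rnorm(200)   # curved truth, but a straight line is fitted
fit_ok     <- lm(y_ok ~ x)
fit_curved <- lm(y_curved ~ x)

op <- par(mfrow = c(1, 2))   # two panels side by side
plot(fit_ok, 1)              # smoother should stay roughly flat at 0
plot(fit_curved, 1)          # smoother bends into a U shape
par(op)                      # restore the previous layout

# OLS residuals always average zero when an intercept is included,
# so the pattern, not the mean, is what reveals E(u|x) != 0
mean(residuals(fit_curved))

# Hypothetical helper: quadratic coefficient from regressing residuals on fitted values
curv <- function(fit) {
  f <- fitted(fit)
  unname(coef(lm(residuals(fit) ~ f + I(f^2)))[3])
}
c(ok = curv(fit_ok), curved = curv(fit_curved))
```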
8. Knit to HTML and upload to Discussions. Everyone should upload their HTML. Please include your name and your partner's name in the YAML header. I hope you had fun playing with simple linear regression!
9. Come back to class at 1pm.