Steven Fontanella
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights New
    • Engagement control
    • Make a copy
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Note Insights Versions and GitHub Sync Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Make a copy Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       Owned this note    Owned this note      
    Published Linked with GitHub
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    --- title: "Sesame Street Case Study" author: 'Matthew Oberholzer, Alex Chen, Steven Fontanella' date: 'September 22, 2019' output: pdf_document: toc: yes toc_depth: '2' --- ```{r, include=FALSE} my_plot_hook <- function(x, options) paste("\n", knitr::hook_plot_tex(x, options), "\n") knitr::knit_hooks$set(plot = my_plot_hook) ``` ```{r Front Matter, include=FALSE} # clean up & set default chunk options rm(list = ls()) knitr::opts_chunk$set(echo = FALSE) library(readxl) library(corrplot) sesame <- read_excel("sesame.xlsx") sesame$viewcat <- as.factor(sesame$viewcat) sesame$site <- as.factor(sesame$site) sesame[sesame$sex == 2, ]$sex <- 0 #male = 1, female = 0# sesame <- sesame[-c(5, 163),] #deleting the miscoded rows# ``` # Project Description *A client named Dr. Education has recently began working for the production company called Children's Television Workshop, and was tasked with presenting the findings of a study that he was not involved with. Prior to Dr. Education's hiring, the production company hired Educational Testing Services to collect data on the effectiveness of their show Sesame Street. Not much information was given regarding the data collection techniques, besides the fact that the subjects were children from the ages of 3 to 5, and were sampled from 5 different sites around the United States. This sampling technique suggests that the study was observational, meaning no causation can be inferred. Since the data has been collected, an analysis of the data can be done. An analysis of the efficiency of Sesame Street and particularly effective subject areas being taught by Sesame Street will be analyzed. The analysis will hopefully be able to generate an association between the amount of Sesame Street watched and the effectiveness of learning on children. Since the data collected was not randomly sampled from the entire population but rather from specific sites, it will be difficult to generalize it to the entire United States population. * ## Research Questions **Question 1:** Does Sesame Street improve children’s knowledge of letters, numbers, and body parts? **Question 2:** What, if any, area should the client focus on for improvement? E.g., is the client better at teaching letters than they are at numbers? ## Statistical Questions **Question 1:** Is the population slope of test improvement vs Sesame Street viewing non-zero and significant for each test? i.e. $$\beta_{letters test} \stackrel{?}{=} 0$$ $$\beta_{numb test} \stackrel{?}{=} 0$$ $$\beta_{body test} \stackrel{?}{=} 0$$ **Question 2:** Are the mean test improvements different between the three tests? $$\mu_{letters test} \stackrel{?}{=} \mu_{numb test} \stackrel{?}{=} \mu_{body test} $$ ## Variables ### Response variable Post-test - Pre-test (Improvement Score) : This is done in order to quantify the improvement over the 6 month period of watching Sesame Street for each of the three tests to be observed: letters, body parts, and numbers. Scales are different for each test, so scores should be standardized for comparison relating to the second research question. The unit of measure for this variable will be the number of points on the test ### Possible explanatory variables Site: Coded as 1, 2, 3, 4, or 5 based on the aforementioned sampling sites Viewcat: Viewing categories which are coded as 1 if the children rarely watched the show to 4 if the children watched the show an average of more than 5 times a week. Peabody: The mental age scores of the children from the administration of the Peabody Picture Vocabulary Test in order to measure vocabulary maturity # Exploratory Data Analysis (EDA) ```{r, include=FALSE} # normalize by max score # read in dat data = sesame data$prebody <- data$prebody / 32 data$postbody <- data$postbody / 32 data$prelet <- data$prelet / 58 data$postlet <- data$postlet / 58 data$prenumb <- data$prenumb / 54 data$postnumb <- data$postnumb / 54 ``` ```{r, echo=FALSE} # define ys data$ybody <- data$postbody - data$prebody data$ynumb <- data$postnumb - data$prenumb data$ylet <- data$postlet - data$prelet ys <- list("body"=data$ybody, "number"=data$ynumb, "letter"=data$ylet) ``` Before any analysis, we remove two observations with observed maximum scores greater than the reported maximum score of the test. No bonus points were given in these tests, so we assume that this was an error in data entry. We also normalize all test scores by their maximums, bringing them into the range of $[0, 1]$. We normalize the test scores in order to accurately compare the effectivness of sesame street for each focus area. For ease of analysis, we define: $$y_{body} = postbody - prebody$$ $$y_{num} = postnumb - prenumb$$ $$y_{let} = postlet - prelet$$ We start by examining a plot of the relationship between Sesame Street viewing frequency (`viewcat`) and each $y$ value. On first glance, we see little to no improvement across `viewcat` on the body test, some improvement in the number test, and larger improvement in the letter test. ```{r, echo=FALSE} x <- data$viewcat m <- mapply(function(name, y){ boxplot(y ~ x, xlab="Viewing Category", ylab=paste("Difference in", name, "score"), main=paste("Viewing category vs. change in", name, "score")) }, names(ys), ys) ``` We also examine $y$ values in isolation of `viewcat`. This time we see that the improvement on the body test is lower overall, even without consideration of Sesame Street viewing. Additionally, we see that at least 75% of students improved their scores in each test subject. ```{r, echo=FALSE} # numerical summaries mapply(function(name, y){ summary(y) }, names(ys), ys) ``` ```{r, echo=FALSE} m <- mapply(function(name, y){ hist(y, main=paste("Histogram of", name, "improvements"), xlab=paste(name, "improvement (percentage points)"), xlim=c(-.8, .8), ylim=c(0,60)) }, names(ys), ys) ``` Two variables were undocumented: `regular` and `encour`. We found that $$ regular = \begin{cases} 0, & viewcat = 1 \\ 1, & viewcat > 1 \end{cases} $$ and $$ encour = \begin{cases} 0, & viewenc = 2 \\ 1, & viewenc = 1 \end{cases} $$ # Statistical Analysis 3 multi-linear regression models: 1 for each of our focus variables (letters, numbers, body parts) ```{r, echo = FALSE} numbMod = lm(ynumb ~ site + age + setting + prenumb + viewcat, data = data) bodyMod = lm(ybody ~ site + sex + age + prebody + viewcat, data = data) letMod = lm(ylet ~ site + age + prelet + viewcat, data = data) ``` ## Assumptions of multi-linear regression models ### Linearity - a scatterplot of the residuals and y values which help determine whether the data itself can be linearly related If the scatterplot of the residuals and fitted y values shows the average of the residuals is close to 0, then we can meet the assumption of linearity. Refer to Figure 1. ### Independence - An assumption to check whether each sample/row in the data is taken separately from the other samples/rows The details of the sampling method of this data is not known, therefore we can confirm that data is independently sampled. However, because residuals are mostly random and unpredictable, we see little evidence that data are not independent. ### Normality - Checks to make sure that the sample data has been drawn from a normally distributed population The residuals when plotted, should exhibit a normal distribution. A Q-Q plot can be drawn to observe whether the assumption is met or not. Points should be close to the line on the Q-Q plot. Refer to Figure 2. ### Equivariance - Variance in a sample should be homogenuous so that distribution isn't skewed and within variance is homogenuous Scatter plot of residuals and y-values shouldn't have any pattern to it. Should be randomly scattered around to meet the equivariance assumption. Refer to Figure 3. ## Interpretation of results For interpretation purposes, we can multiply the coefficient of each variable by how much we normalized in the beginning. For body, we can multiply the coefficients by 32 for a more accurate interpretation and then similarly for letters and numbers. Multiply coefficients of letters by 58 and multiply coefficients of numbers by 54. ### Numbers ```{r, echo = FALSE} summary(numbMod) ``` From the p-value of `site2`, it appears a child from site 2 is significantly different from a child in site 1 by about 5 points for the difference between pre-test and post-test for numbers. The other significant explanatory variables are pre-test score for numbers and view category. For each point that a child scored in the pre-test for numbers lowered the difference between their post-test minus pre-test by 0.4 points. Lastly, it appears the more that a child watched sesame street, the higher their score improvement was. Comparing a child who watched with the least (view category 1) with children who watched with view category 2, 3, and 4 improved less by approximately 5, 8, and 9 fewer points, respectively. ### Body ```{r, echo = FALSE} summary(bodyMod) ``` The statistically significant predictors for the improvement in score for body parts is site, age, pre-test for body, and view category. For sites 2, 4, and 5, they all appear to be better than site 1 with having a higher score improvement by approximately 2. For age, since the p-value is less than $\alpha=0.05$, we can reject the null that the coefficient of age is 0 and conclude that age is significant predictor. An increase of one month of age increases the improvement score by 0.09. For the pretest for body, a point increase in how well a child scored in the pre-test for body is associated with a decrease of 0.62 in the improvement score. Lastly, view categories of 2, 3, and 4 are significantly different than view category 1. A child with a view category of 2 will have about 2.66 points more for improvement score than a child with view category of 1. Similarly, a child with view category of 3 and 4 will have approximately 4 and 4.5 more points, respectively in improvement score compared to a child in category of 1. ### Letters ```{r, echo = FALSE} summary(letMod) ``` Similar to the other areas, the significant predictors for the improvement score of letters are site, age, prelet, and view category. The significant predictors within site are site 2 and site 3. Site 2 is significantly different from site 1 with about a 7.714 increase in improvement score if it's a child from site 2 rather than site 1. A child in site 3, however, will have about a 5 point decrease in improvement score compared to a child in site 1. In addition, an increase in 1 month for a child will result in about 0.255 increase in improvement score. For every point that a child scores for the pre-test for letters, the child will score about 0.4 points lower for the improvement. Lastly, all levels of view category are significant. For a view category of 2 compared to view category 1, a child in view category 2 will score about 6 points higher for improvement. Then, for a child who has a view category of 3 or 4 will score about 12 points higher than a child who had a view category of 1. # Conclusion ## Research Questions revisited ### 1) Does Sesame Street improve children’s knowledge of letters, numbers, and body parts? All models have a significant viewing category slope at an alpha level of 0.01. We conclude that increased watching of sesame street is significantly associated with an improvement in children's knowledge of letters, numbers, and body parts without being able to account for potential confounding factors. These limitations are further discussed below. ### 2) What, if any, area should the client focus on for improvement? E.g., is the client better at teaching letters than they are at numbers? Based on the normalized improvement scores, the area the client should focus on is body parts. A child with a viewing category of 4 has approximately 0.14 increase in the normalized improvement score for body parts while a child with a viewing category of 4 has approximately 0.17 and 0.21 increase in normalized improvement score for numbers and letters, respectively. Thus, the client should look to focus their improvement on teaching body parts in Sesame Street. ## Recommendations **Question 1:** Our client should conduct further studies attempting to account for potential confounding factors such as income level before making a causal connection between Sesame Street and learning. **Question 2:** Our client should focus on body parts for improvement because it had the lowest improvement among the focus subject areas. ## Scope of inference and Study limitations We note that because the data was observed rather than derived from an experiment, a causal link cannot be established. For example, there may be differences between children who view Sesame Street regularly and those who do not, such as household income. Socioeconomic status was controlled for in this study, however because the sampling method of these groups is unknown, there may be bias in the groups. We recommend that more investigation is performed on the subject. # Technical Appendix ## Model Assumptions ### Linearity #### Figure 1 ```{r, include = TRUE} plot(numbMod, which = c(1), main = "Residuals vs. Fitted for Numbers") plot(bodyMod, which = c(1), main = "Residuals vs. Fitted for Body Parts") plot(letMod, which = c(1), main = "Residuals vs. Fitted for Letters") ``` Since all of the three plots have the average of the residuals as close to 0, we can say that the assumption of linearity is met. ### Normality #### Figure 2 ```{r, include = TRUE} plot(numbMod, which = c(2), main = "Q-Q plot for Numbers") plot(bodyMod, which = c(2), main = "Q-Q plot for Body Parts") plot(letMod, which = c(2), main = "Q-Q plot for Letters") ``` Since none of the plots show a line that strays too far out from the normal line, we meet the assumption of normality. ### Equivariance #### Figure 3 ```{r, include = TRUE} plot(numbMod, which = c(1), main = "Residuals vs. Fitted for Numbers") plot(bodyMod, which = c(1), main = "Residuals vs. Fitted for Body Parts") plot(letMod, which = c(1), main = "Residuals vs. Fitted for Letters") ``` Since all of the plots don't show any kind of pattern in the residuals vs fitted values plot, we can meet the assumption of equivariance. ### R Script ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) library(car) ``` ```{r, eval = FALSE} library(readxl) library(corrplot) sesame <- read_excel("sesame.xlsx") sesame$viewcat <- as.factor(sesame$viewcat) sesame$site <- as.factor(sesame$site) sesame[sesame$sex == 2, ]$sex <- 0 #male = 1, female = 0# sesame <- sesame[-c(5, 163),] #deleting the miscoded rows# ``` ## Creating a subset matrix with only the variables that we care about ```{r, eval = FALSE} new_sesame <- sesame[, !colnames(sesame) %in% c("agecat","viewenc","viewcat","preform","prerelat","preclasf","postform","postrelat","postclasf")] summary(new_sesame) ``` Correlation matrix of our subset matrix. ```{r, eval = FALSE} NewSES <- cor(new_sesame) corrplot(NewSES) ``` ## Histograms ```{r, eval = FALSE} hist(ynumb) #slight left skew# hist(ybody) #fairly normal# hist(ylet) #not normal, slight right skew# hist(sesame$peabody) #right skew with a very low outlier# ``` ## QQ-Plots ```{r, eval = FALSE} qqnorm(sesame$ynumb); qqline(sesame$ynumb) #fairly normal with some outliers in the left tail# qqnorm(sesame$ybody); qqline(sesame$ybody) #fairly normal with some outliers in both tails# qqnorm(sesame$ylet); qqline(sesame$ylet) #not really normal# ``` ## Preliminary Regression Models ### Model for ynumb ```{r, eval = FALSE} lm_ynumb <- lm(ynumb ~ sesame$site + sesame$sex + sesame$age + sesame$setting + sesame$prenumb + sesame$peabody + sesame$encour + sesame$viewcat) #summary(lm_ynumb)# lm_ynumb1 <- lm(ynumb ~ sesame$sex + sesame$age + sesame$setting + sesame$prenumb + sesame$peabody + sesame$encour + sesame$viewcat) #summary(lm_ynumb1)# lm_ynumb2 <- lm(ynumb ~ sesame$sex + sesame$age + sesame$setting + sesame$prenumb + sesame$peabody + sesame$viewcat) #summary(lm_ynumb2)# lm_ynumb3 <- lm(ynumb ~ sesame$age + sesame$setting + sesame$prenumb + sesame$peabody + sesame$viewcat) #summary(lm_ynumb3)# lm_ynumb4 <- lm(ynumb ~ sesame$setting + sesame$prenumb + sesame$peabody + sesame$viewcat) #all predictors here are significant at alpha = 0.1, setting is the only one not at an alpha of 0.05. It's p-value is 0.0666# #summary(lm_ynumb4)# lm_ynumb5 <- lm(ynumb ~ sesame$prenumb + sesame$peabody + sesame$viewcat) #all predictors are significant here at 0.05 level# summary(lm_ynumb5) #Model with interaction# lm_ynumb_int <- lm(ynumb ~ (sesame$site + sesame$sex + sesame$age + sesame$setting + sesame$prenumb + sesame$peabody + sesame$encour + sesame$viewcat)*sesame$sex) #summary(lm_ynumb_int)# lm_ynumb_int1 <- lm(ynumb ~ (sesame$sex + sesame$age + sesame$setting + sesame$prenumb + sesame$peabody + sesame$encour + sesame$viewcat)*sesame$sex + sesame$site) #summary(lm_ynumb_int1)# lm_ynumb_int2 <- lm(ynumb ~ (sesame$sex + sesame$age + sesame$setting + sesame$prenumb + sesame$peabody + sesame$viewcat)*sesame$sex + sesame$site + sesame$encour) #summary(lm_ynumb_int2)# lm_ynumb_int3 <- lm(ynumb ~ (sesame$sex + sesame$setting + sesame$prenumb + sesame$peabody + sesame$viewcat)*sesame$sex + sesame$site + sesame$encour + sesame$age) #summary(lm_ynumb_int3)# lm_ynumb_int4 <- lm(ynumb ~ (sesame$sex + sesame$setting + sesame$prenumb + sesame$viewcat)*sesame$sex + sesame$site + sesame$encour + sesame$age + sesame$peabody) #summary(lm_ynumb_int4)# lm_ynumb_int5 <- lm(ynumb ~ (sesame$sex + sesame$prenumb + sesame$viewcat)*sesame$sex + sesame$site + sesame$encour + sesame$age + sesame$peabody + sesame$setting) #summary(lm_ynumb_int5)# lm_ynumb_int6 <- lm(ynumb ~ (sesame$sex + sesame$prenumb)*sesame$sex + sesame$site + sesame$encour + sesame$age + sesame$peabody + sesame$setting + sesame$viewcat) #summary(lm_ynumb_int6)# lm_ynumb_int7 <- lm(ynumb ~ (sesame$sex + sesame$prenumb)*sesame$sex + sesame$site + sesame$age + sesame$peabody + sesame$setting + sesame$viewcat) #summary(lm_ynumb_int7)# lm_ynumb_int8 <- lm(ynumb ~ (sesame$sex + sesame$prenumb)*sesame$sex + sesame$age + sesame$peabody + sesame$setting + sesame$viewcat) #summary(lm_ynumb_int8)# lm_ynumb_int9 <- lm(ynumb ~ (sesame$sex + sesame$prenumb)*sesame$sex + sesame$peabody + sesame$setting + sesame$viewcat) #summary(lm_ynumb_int9)# lm_ynumb_int10 <- lm(ynumb ~ (sesame$sex + sesame$prenumb)*sesame$sex + sesame$peabody + sesame$viewcat) summary(lm_ynumb_int10)#every predictor is significant except for sex, but it's interaction is significant# #Comparing AIC AIC(lm_ynumb_int10, lm_ynumb5) #Using step function to select a model select_ynumb <- step(lm_ynumb) summary(select_ynumb) select_ynumbint <- step(lm_ynumb_int) summary(select_ynumbint) #comparing all AICs# AIC(select_ynumb, lm_ynumb_int10, select_ynumbint) #Looking at VIFs# vif(lm_ynumb_int10) vif(select_ynumb) vif(select_ynumbint) #Plotting selected model# plot(select_ynumb) ``` select_ynumbint has the smallest AIC, but some of its VIF values are greater than 4. This means that there are some multicollinearity issues with the model. Model select_ynumb has a similar AIC, with all VIF values being smaller than 2. So, we choose model select_ynumb. ### Model for ylet ```{r, eval = FALSE} lm_ylet <- lm(ylet ~ sesame$site + sesame$sex + sesame$age + sesame$setting + sesame$prelet + sesame$peabody + sesame$encour + sesame$viewcat) #summary(lm_ylet)# lm_ylet1 <- lm(ylet ~ sesame$site + sesame$sex + sesame$age + sesame$prelet + sesame$peabody + sesame$encour + sesame$regular) #summary(lm_ylet1)# lm_ylet2 <- lm(ylet ~ sesame$site + sesame$sex + sesame$age + sesame$prelet + sesame$peabody + sesame$regular) #summary(lm_ylet2)# lm_ylet3 <- lm(ylet ~ sesame$site + sesame$age + sesame$prelet + sesame$peabody + sesame$regular) #summary(lm_ylet3)# lm_ylet4 <- lm(ylet ~ sesame$site + sesame$prelet + sesame$peabody + sesame$regular)#all predictors are significant at a level of 0.1, except site) #summary(lm_ylet4)# lm_ylet5 <- lm(ylet ~ sesame$prelet + sesame$peabody + sesame$regular) #all predictors are significant at an alpha level of 0.05# summary(lm_ylet5) #Model with interaction# lm_ylet_int <- lm(ylet ~ (sesame$site + sesame$sex + sesame$age + sesame$setting + sesame$prelet + sesame$peabody + sesame$encour + sesame$regular)*sesame$sex) #summary(lm_ylet_int)# lm_ylet_int1 <- lm(ylet ~ (sesame$site + sesame$sex + sesame$age + sesame$setting + sesame$prelet + sesame$peabody + sesame$regular)*sesame$sex + sesame$encour) #summary(lm_ylet_int1)# lm_ylet_int2 <- lm(ylet ~ (sesame$site + sesame$sex + sesame$age + sesame$setting + sesame$prelet + sesame$peabody)*sesame$sex + sesame$encour + sesame$regular) #summary(lm_ylet_int2)# lm_ylet_int3 <- lm(ylet ~ (sesame$site + sesame$sex + sesame$age + sesame$setting + sesame$peabody)*sesame$sex + sesame$encour + sesame$regular + sesame$prelet) #summary(lm_ylet_int3)# lm_ylet_int4 <- lm(ylet ~ (sesame$site + sesame$sex + sesame$age + sesame$setting)*sesame$sex + sesame$encour + sesame$regular + sesame$prelet + sesame$peabody) #summary(lm_ylet_int4)# lm_ylet_int5 <- lm(ylet ~ (sesame$sex + sesame$age + sesame$setting)*sesame$sex + sesame$encour + sesame$regular + sesame$prelet + sesame$peabody + sesame$site) #summary(lm_ylet_int5)# lm_ylet_int6 <- lm(ylet ~ (sesame$sex + sesame$age + sesame$setting)*sesame$sex + sesame$regular + sesame$prelet + sesame$peabody + sesame$site) #summary(lm_ylet_int6)# lm_ylet_int7 <- lm(ylet ~ (sesame$sex + sesame$age)*sesame$sex + sesame$regular + sesame$prelet + sesame$peabody + sesame$site + sesame$setting) #summary(lm_ylet_int7)# lm_ylet_int8 <- lm(ylet ~ (sesame$sex + sesame$age)*sesame$sex + sesame$regular + sesame$prelet + sesame$peabody + sesame$site) #summary(lm_ylet_int8)# lm_ylet_int9 <- lm(ylet ~ sesame$sex + sesame$regular + sesame$prelet + sesame$peabody + sesame$site) #summary(lm_ylet_int9)# lm_ylet_int10 <- lm(ylet ~ sesame$regular + sesame$prelet + sesame$peabody + sesame$site) #summary(lm_ylet_int10)# lm_ylet_int11 <- lm(ylet ~ sesame$regular + sesame$prelet + sesame$peabody) summary(lm_ylet_int11) #checking for interaction turned out to be unnecessary since all interaction terms were insignificant.# #Using step() function to find best models# select_ylet <- step(lm_ylet) summary(select_ylet) select_yletint <- step(lm_ylet_int) #checking if step() function found any interactions to be significant, it did not# summary(select_yletint) #plotting best model# plot(select_ylet) #Comparing AICs# AIC(select_ylet, lm_ylet5) #Checking VIFs vif(select_ylet) ``` By comparing the AICs of the selected model throught the step() function and the one we found through backwards stepwise regression, we find our best model to be select_ylet. ### Model for ybody ```{r, eval = FALSE} lm_ybody <- lm(ybody ~ sesame$site + sesame$sex + sesame$age + sesame$setting + sesame$prebody + sesame$peabody + sesame$encour + sesame$viewcat) #summary(lm_ybody)# lm_ybody1 <- lm(ybody ~ sesame$site + sesame$sex + sesame$age + sesame$prebody + sesame$peabody + sesame$encour + sesame$viewcat) #summary(lm_ybody1)# lm_ybody2 <- lm(ybody ~ sesame$site + sesame$sex + sesame$age + sesame$prebody + sesame$peabody + sesame$viewcat) #summary(lm_ybody2)# lm_ybody3 <- lm(ybody ~ sesame$site + sesame$sex + sesame$age + sesame$prebody + sesame$viewcat) #summary(lm_ybody3)# lm_ybody4 <- lm(ybody ~ sesame$site + sesame$sex + sesame$prebody + sesame$viewcat) #summary(lm_ybody4)# lm_ybody5 <- lm(ybody ~ sesame$site + sesame$prebody + sesame$viewcat)#all predictors are significant at an alpha level of 0.05# summary(lm_ybody5) #Model with interaction# lm_ybody_int <- lm(ybody ~ (sesame$site + sesame$sex + sesame$age + sesame$setting + sesame$prebody + sesame$peabody + sesame$encour + sesame$viewcat)*sesame$sex) #summary(lm_ybody_int)# lm_ybody_int1 <- lm(ybody ~ (sesame$sex + sesame$age + sesame$setting + sesame$prebody + sesame$peabody + sesame$encour + sesame$viewcat)*sesame$sex + sesame$site) #summary(lm_ybody_int1)# lm_ybody_int2 <- lm(ybody ~ (sesame$sex + sesame$age + sesame$setting + sesame$prebody + sesame$encour + sesame$viewcat)*sesame$sex + sesame$site + sesame$peabody) #summary(lm_ybody_int2)# lm_ybody_int3 <- lm(ybody ~ (sesame$sex + sesame$age + sesame$setting + sesame$encour + sesame$viewcat)*sesame$sex + sesame$site + sesame$peabody + sesame$prebody) #summary(lm_ybody_int3)# lm_ybody_int4 <- lm(ybody ~ (sesame$sex + sesame$age + sesame$setting + sesame$encour)*sesame$sex + sesame$site + sesame$peabody + sesame$prebody + sesame$viewcat) #summary(lm_ybody_int4)# lm_ybody_int5 <- lm(ybody ~ (sesame$sex + sesame$setting + sesame$encour)*sesame$sex + sesame$site + sesame$peabody + sesame$prebody + sesame$viewcat + sesame$age) #summary(lm_ybody_int5)# lm_ybody_int6 <- lm(ybody ~ (sesame$sex + sesame$encour)*sesame$sex + sesame$site + sesame$peabody + sesame$prebody + sesame$viewcat + sesame$age + sesame$setting) #summary(lm_ybody_int6)# lm_ybody_int7 <- lm(ybody ~ (sesame$sex + sesame$encour)*sesame$sex + sesame$site + sesame$peabody + sesame$prebody + sesame$viewcat + sesame$age) #summary(lm_ybody_int7)# lm_ybody_int8 <- lm(ybody ~ sesame$sex + sesame$site + sesame$peabody + sesame$prebody + sesame$viewcat + sesame$age) #summary(lm_ybody_int8)# lm_ybody_int9 <- lm(ybody ~ sesame$sex + sesame$site + sesame$prebody + sesame$viewcat + sesame$age) #summary(lm_ybody_int9)# lm_ybody_int10 <- lm(ybody ~ sesame$sex + sesame$site + sesame$prebody + sesame$viewcat) #summary(lm_ybody_int10)# lm_ybody_int11 <- lm(ybody ~ sesame$site + sesame$prebody + sesame$viewcat) summary(lm_ybody_int11) #No interactions were significant, so we arrived at the same model without any interaction terms# #Using step() function to find model# select_ybody <- step(lm_ybody) #step() function found the same model I did# summary(select_ybody) select_ybodyint <- step(lm_ybody_int) #step() function did not find any significant interactions either# summary(select_ybodyint) #Checking AICs# AIC(select_ybody, lm_ybody5) #Checking VIFs# vif(select_ybody) #Plotting our selected model# plot(select_ybody) ``` The best model we found was select_ybody.

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully