
---
tags: ISLR
---

# ISLR hw5

Range: Textbook Chapter 3 exercises 7, 9, 10, and 12; Chapter 4 exercise 2.

## Q7

Skipped.

## Q9

```r
library(ISLR)
data(Auto)
```

### a

```r
pairs(Auto)
```

![](https://i.imgur.com/prydFFM.png)

### b

```r
# exclude the name column (column 9), which is not numeric
cor(Auto[, -9])
#                     mpg  cylinders displacement horsepower
# mpg           1.0000000 -0.7776175   -0.8051269 -0.7784268
# cylinders    -0.7776175  1.0000000    0.9508233  0.8429834
# displacement -0.8051269  0.9508233    1.0000000  0.8972570
# horsepower   -0.7784268  0.8429834    0.8972570  1.0000000
# weight       -0.8322442  0.8975273    0.9329944  0.8645377
# acceleration  0.4233285 -0.5046834   -0.5438005 -0.6891955
# year          0.5805410 -0.3456474   -0.3698552 -0.4163615
# origin        0.5652088 -0.5689316   -0.6145351 -0.4551715
#                  weight acceleration       year     origin
# mpg          -0.8322442    0.4233285  0.5805410  0.5652088
# cylinders     0.8975273   -0.5046834 -0.3456474 -0.5689316
# displacement  0.9329944   -0.5438005 -0.3698552 -0.6145351
# horsepower    0.8645377   -0.6891955 -0.4163615 -0.4551715
# weight        1.0000000   -0.4168392 -0.3091199 -0.5850054
# acceleration -0.4168392    1.0000000  0.2903161  0.2127458
# year         -0.3091199    0.2903161  1.0000000  0.1815277
# origin       -0.5850054    0.2127458  0.1815277  1.0000000
```

### c

```r
# regress mpg on every variable except name
mpg_lm <- lm(mpg ~ . - name, data = Auto)
summary(mpg_lm)
# Call:
# lm(formula = mpg ~ . - name, data = Auto)
# Residuals:
#     Min      1Q  Median      3Q     Max
# -9.5903 -2.1565 -0.1169  1.8690 13.0604
# Coefficients:
#                Estimate Std. Error t value Pr(>|t|)
# (Intercept)  -17.218435   4.644294  -3.707  0.00024 ***
# cylinders     -0.493376   0.323282  -1.526  0.12780
# displacement   0.019896   0.007515   2.647  0.00844 **
# horsepower    -0.016951   0.013787  -1.230  0.21963
# weight        -0.006474   0.000652  -9.929  < 2e-16 ***
# acceleration   0.080576   0.098845   0.815  0.41548
# year           0.750773   0.050973  14.729  < 2e-16 ***
# origin         1.426141   0.278136   5.127 4.67e-07 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Residual standard error: 3.328 on 384 degrees of freedom
# Multiple R-squared: 0.8215, Adjusted R-squared: 0.8182
# F-statistic: 252.4 on 7 and 384 DF, p-value: < 2.2e-16
```

#### i

Yes. The F-statistic is 252.4 with a p-value below 2.2e-16 (well under 0.05), so there is a relationship between the predictors and mpg.

#### ii

displacement, weight, year and origin have a statistically significant relationship with mpg (p < 0.05). cylinders, horsepower and acceleration do not.

#### iii

The coefficient for year is 0.751: holding the other predictors fixed, mpg increases by about 0.75 for each additional model year.

### d

![](https://i.imgur.com/zXdsnHB.png)

The residual plot shows some outliers. The leverage plot shows that observation 14 has very high leverage.

### e

From the earlier correlation plot, displacement:cylinders and displacement:weight have quite high correlation, and mpg:weight has the strongest negative correlation. We check these three interactions in lm.

```r
# note: mpg * weight puts the response on the right-hand side of its own model
cor_lm <- lm(mpg ~ displacement*cylinders + displacement*weight + mpg*weight, data = Auto)
summary(cor_lm)
# Call:
# lm(formula = mpg ~ displacement * cylinders + displacement *
#     weight + mpg * weight, data = Auto)
# Residuals:
#     Min      1Q  Median      3Q     Max
# -3.8503 -0.6281 -0.0573  0.5452  4.7186
# Coefficients:
#                          Estimate Std. Error t value Pr(>|t|)
# (Intercept)             3.162e+01  6.493e-01  48.696  < 2e-16 ***
# displacement           -7.004e-02  4.342e-03 -16.129  < 2e-16 ***
# cylinders               1.283e+00  1.996e-01   6.425 3.89e-10 ***
# weight                 -1.280e-02  3.481e-04 -36.786  < 2e-16 ***
# displacement:cylinders -2.823e-03  8.910e-04  -3.169  0.00165 **
# displacement:weight     2.463e-05  1.302e-06  18.921  < 2e-16 ***
# mpg:weight              3.691e-04  5.060e-06  72.945  < 2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Residual standard error: 1.067 on 385 degrees of freedom
# Multiple R-squared: 0.9816, Adjusted R-squared: 0.9813
# F-statistic: 3422 on 6 and 385 DF, p-value: < 2.2e-16
```

Among the interaction terms, displacement:weight and mpg:weight are the most strongly significant, while displacement:cylinders is significant with a larger p-value (0.00165). Note that mpg:weight uses the response itself as a predictor, so its significance and the R-squared of 0.98 are artifacts rather than evidence of a real interaction.
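
The model in part e includes `mpg * weight`, which places the response on the right-hand side of its own formula. A minimal sketch of the same fit with that term left out is shown here; the names `inter_lm` and `main_lm` are made up for illustration, and comparing against a main-effects-only model with `anova()` is one possible way to judge the interactions, not something the original answer did.

```r
library(ISLR)
data(Auto)

# Same interactions as in part e, minus mpg:weight
# (assumption: the response should not be its own predictor)
inter_lm <- lm(mpg ~ displacement * cylinders + displacement * weight, data = Auto)
summary(inter_lm)

# Main-effects-only model on the same three predictors, for comparison
main_lm <- lm(mpg ~ displacement + cylinders + weight, data = Auto)

# F-test for whether the two interaction terms improve the fit
anova(main_lm, inter_lm)
```

The interaction coefficients from this fit are the ones to interpret; they will differ from those printed above because the mpg:weight term is no longer absorbing most of the variance.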

### f

Apply four kinds of transformation to weight and plot each against mpg.

![](https://i.imgur.com/HDbcp3J.png)

In my opinion, log and sqrt give the better shape: their relationship with mpg looks closest to linear.

## Q10

```r
library(ISLR)
data(Carseats)
```

### a

```r
carseat_lm <- lm(Sales ~ Price + Urban + US, data = Carseats)
summary(carseat_lm)
# Call:
# lm(formula = Sales ~ Price + Urban + US, data = Carseats)
# Residuals:
#     Min      1Q  Median      3Q     Max
# -6.9206 -1.6220 -0.0564  1.5786  7.0581
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)
# (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
# Price       -0.054459   0.005242 -10.389  < 2e-16 ***
# UrbanYes    -0.021916   0.271650  -0.081    0.936
# USYes        1.200573   0.259042   4.635 4.86e-06 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Residual standard error: 2.472 on 396 degrees of freedom
# Multiple R-squared: 0.2393, Adjusted R-squared: 0.2335
# F-statistic: 41.52 on 3 and 396 DF, p-value: < 2.2e-16
```

### b

Price: there is a relationship, because the p-value is very low. A higher Price is associated with lower Sales.

UrbanYes: there is no evidence of a relationship, because the p-value is high (0.936).

USYes: there is a relationship. If the store is located in the US, Sales are higher by about 1.2, holding the other predictors fixed.

### c

$Sales = 13.043 - 0.054 \times Price - 0.022 \times UrbanYes + 1.201 \times USYes + \epsilon$

UrbanYes is 1 if the store is in an urban location and 0 otherwise. USYes is 1 if the store is located in the US and 0 otherwise.

### d

We can reject the null hypothesis for Price and USYes, because their p-values are low (< 0.05).

### e

```r
# refit using only the predictors with evidence of association
evidence_lm <- lm(Sales ~ Price + US, data = Carseats)
summary(evidence_lm)
# Call:
# lm(formula = Sales ~ Price + US, data = Carseats)
# Residuals:
#     Min      1Q  Median      3Q     Max
# -6.9269 -1.6286 -0.0574  1.5766  7.0515
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
# Price       -0.05448    0.00523 -10.416  < 2e-16 ***
# USYes        1.19964    0.25846   4.641 4.71e-06 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Residual standard error: 2.469 on 397 degrees of freedom
# Multiple R-squared: 0.2393, Adjusted R-squared: 0.2354
# F-statistic: 62.43 on 2 and 397 DF, p-value: < 2.2e-16
```

### f

The two models fit almost equally well. The model from (e) is slightly better than the model from (a): the R-squared is the same (0.2393), but the adjusted R-squared is higher (0.2354 vs 0.2335) and the residual standard error is lower (2.469 vs 2.472).

### g

```r
confint(evidence_lm)
#                   2.5 %      97.5 %
# (Intercept) 11.79032020 14.27126531
# Price       -0.06475984 -0.04419543
# USYes        0.69151957  1.70776632
```

### h

```r
par(mfrow = c(2, 2))
plot(evidence_lm)
```

![](https://i.imgur.com/acs00iK.png)

The leverage plot shows some outliers and high-leverage observations.

## Q12

### a

Coefficient for the regression of y onto x without an intercept:

$\beta_x = \sum_{i=1}^{n} x_i y_i / \sum_{i'=1}^{n} x_{i'}^2$

Coefficient for the regression of x onto y without an intercept:

$\beta_y = \sum_{i=1}^{n} x_i y_i / \sum_{i'=1}^{n} y_{i'}^2$

Setting $\beta_y = \beta_x$:

$\sum_{i=1}^{n} x_i y_i / \sum_{i'=1}^{n} x_{i'}^2 = \sum_{i=1}^{n} x_i y_i / \sum_{i'=1}^{n} y_{i'}^2$

$\sum_{i'=1}^{n} x_{i'}^2 = \sum_{i'=1}^{n} y_{i'}^2$

The two coefficient estimates are equal when the sum of $x^2$ equals the sum of $y^2$.

### b

```r
x <- rnorm(100)
y <- x^2
yx_lm <- lm(y ~ x + 0)
xy_lm <- lm(x ~ y + 0)
summary(yx_lm)
# Call:
# lm(formula = y ~ x + 0)
# Residuals:
#     Min      1Q  Median      3Q     Max
# -0.0053  0.1184  0.4420  1.3340 10.6699
# Coefficients:
#   Estimate Std. Error t value Pr(>|t|)
# x   0.1456     0.1828   0.797    0.428
# Residual standard error: 1.886 on 99 degrees of freedom
# Multiple R-squared: 0.00637, Adjusted R-squared: -0.003666
# F-statistic: 0.6347 on 1 and 99 DF, p-value: 0.4275
summary(xy_lm)
# Call:
# lm(formula = x ~ y + 0)
# Residuals:
#      Min       1Q   Median       3Q      Max
# -2.81406 -0.57183  0.07921  0.74082  2.85206
# Coefficients:
#   Estimate Std. Error t value Pr(>|t|)
# y  0.04375    0.05491   0.797    0.428
# Residual standard error: 1.034 on 99 degrees of freedom
# Multiple R-squared: 0.00637, Adjusted R-squared: -0.003666
# F-statistic: 0.6347 on 1 and 99 DF, p-value: 0.4275
```

The coefficient estimates differ (0.1456 vs 0.04375).

### c

```r
x <- rnorm(100)
y <- -rnorm(100)
# sum(x^2) should be very close to sum(y^2)
sum(x^2)
# [1] 86.55331
sum(y^2)
# [1] 87.43572
yx_lm <- lm(y ~ x + 0)
xy_lm <- lm(x ~ y + 0)
summary(yx_lm)
# Call:
# lm(formula = y ~ x + 0)
# Residuals:
#     Min      1Q  Median      3Q     Max
# -2.6074 -0.5135  0.1752  0.7049  2.0976
# Coefficients:
#   Estimate Std. Error t value Pr(>|t|)
# x   0.0339     0.1010   0.336    0.738
# Residual standard error: 0.9392 on 99 degrees of freedom
# Multiple R-squared: 0.001138, Adjusted R-squared: -0.008952
# F-statistic: 0.1128 on 1 and 99 DF, p-value: 0.7377
summary(xy_lm)
# Call:
# lm(formula = x ~ y + 0)
# Residuals:
#      Min       1Q   Median       3Q      Max
# -2.90954 -0.69906 -0.09423  0.51060  2.00552
# Coefficients:
#   Estimate Std. Error t value Pr(>|t|)
# y  0.03356    0.09994   0.336    0.738
# Residual standard error: 0.9345 on 99 degrees of freedom
# Multiple R-squared: 0.001138, Adjusted R-squared: -0.008952
# F-statistic: 0.1128 on 1 and 99 DF, p-value: 0.7377
```

The estimates are nearly the same (0.0339 vs 0.03356). They are not identical because the two sums of squares are close but not exactly equal.

## ch4 Q2

Skipped.
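
Returning to Q12 c: the condition from Q12 a asks for the two sums of squares to be exactly equal, which two independent `rnorm` draws only approximate. A minimal sketch of one way to get exact equality is to make `y` a permutation of `x`. The seed and the names `beta_yx` and `beta_xy` are arbitrary choices for illustration, not part of the original answer.

```r
set.seed(1)     # arbitrary seed, only for reproducibility
x <- rnorm(100)
y <- sample(x)  # a permutation of x, so sum(y^2) equals sum(x^2)

# No-intercept regressions in both directions
beta_yx <- coef(lm(y ~ x + 0))  # y onto x
beta_xy <- coef(lm(x ~ y + 0))  # x onto y

# Both checks should return TRUE (up to floating-point tolerance)
all.equal(sum(x^2), sum(y^2))
all.equal(unname(beta_yx), unname(beta_xy))
```

Any other construction with equal sums of squares would serve as well, for example `y <- -sample(x)`.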