---
tags: ISLR
---
# ISLR hw5
Range: Textbook Chapter 3 exercises 7, 9, 10, and 12; Chapter 4 exercise 2.
## Q7
pass
## Q9
```r
library(ISLR)
data(Auto)
```
### a
```r
pairs(Auto)
```

### b
```r
# drop the name column (column 9), which is not numeric
cor(Auto[, -9])
# mpg cylinders displacement horsepower
# mpg 1.0000000 -0.7776175 -0.8051269 -0.7784268
# cylinders -0.7776175 1.0000000 0.9508233 0.8429834
# displacement -0.8051269 0.9508233 1.0000000 0.8972570
# horsepower -0.7784268 0.8429834 0.8972570 1.0000000
# weight -0.8322442 0.8975273 0.9329944 0.8645377
# acceleration 0.4233285 -0.5046834 -0.5438005 -0.6891955
# year 0.5805410 -0.3456474 -0.3698552 -0.4163615
# origin 0.5652088 -0.5689316 -0.6145351 -0.4551715
# weight acceleration year origin
# mpg -0.8322442 0.4233285 0.5805410 0.5652088
# cylinders 0.8975273 -0.5046834 -0.3456474 -0.5689316
# displacement 0.9329944 -0.5438005 -0.3698552 -0.6145351
# horsepower 0.8645377 -0.6891955 -0.4163615 -0.4551715
# weight 1.0000000 -0.4168392 -0.3091199 -0.5850054
# acceleration -0.4168392 1.0000000 0.2903161 0.2127458
# year -0.3091199 0.2903161 1.0000000 0.1815277
# origin -0.5850054 0.2127458 0.1815277 1.0000000
```
### c
```r
mpg_lm <- lm(mpg ~ . - name, data = Auto)
summary(mpg_lm)
# Call:
# lm(formula = mpg ~ . - name, data = Auto)
# Residuals:
# Min 1Q Median 3Q Max
# -9.5903 -2.1565 -0.1169 1.8690 13.0604
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -17.218435 4.644294 -3.707 0.00024 ***
# cylinders -0.493376 0.323282 -1.526 0.12780
# displacement 0.019896 0.007515 2.647 0.00844 **
# horsepower -0.016951 0.013787 -1.230 0.21963
# weight -0.006474 0.000652 -9.929 < 2e-16 ***
# acceleration 0.080576 0.098845 0.815 0.41548
# year 0.750773 0.050973 14.729 < 2e-16 ***
# origin 1.426141 0.278136 5.127 4.67e-07 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Residual standard error: 3.328 on 384 degrees of freedom
# Multiple R-squared: 0.8215, Adjusted R-squared: 0.8182
# F-statistic: 252.4 on 7 and 384 DF, p-value: < 2.2e-16
```
#### i
Yes. The F-statistic is 252.4 with a p-value below 2.2e-16, so at least one predictor is related to mpg.
#### ii
displacement, weight, year, and origin are statistically significant (p < 0.05). cylinders, horsepower, and acceleration are not.
#### iii
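The model estimates a coefficient of 0.751 for year: holding the other predictors fixed, mpg rises by about 0.75 for each additional model year.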
### d

The residual plot shows some outliers.
The leverage plot shows one observation (14) with very high leverage.
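
The plots discussed above can be reproduced with a sketch like the following. It refits the model from (c) so that it runs on its own; the name `hat_vals` is introduced here for illustration and is not from the original notes.

```r
library(ISLR)
data(Auto)

# Refit the model from (c) so this block is self-contained
mpg_lm <- lm(mpg ~ . - name, data = Auto)

# Four standard diagnostic plots: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage
par(mfrow = c(2, 2))
plot(mpg_lm)

# Leverage (hat) value of every observation
hat_vals <- hatvalues(mpg_lm)

# Row with the highest leverage
which.max(hat_vals)
```

`which.max` gives a numeric cross-check on what the residuals vs. leverage panel shows visually.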
### e
From last week's correlation plot:
displacement and cylinders, and displacement and weight, are quite highly correlated.
mpg and weight have the strongest negative correlation.
We check these three interactions in a linear model.
```r
cor_lm <- lm(mpg ~ displacement * cylinders + displacement * weight + mpg * weight, data = Auto)
summary(cor_lm)
# Call:
# lm(formula = mpg ~ displacement * cylinders + displacement *
# weight + mpg * weight, data = Auto)
# Residuals:
# Min 1Q Median 3Q Max
# -3.8503 -0.6281 -0.0573 0.5452 4.7186
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 3.162e+01 6.493e-01 48.696 < 2e-16 ***
# displacement -7.004e-02 4.342e-03 -16.129 < 2e-16 ***
# cylinders 1.283e+00 1.996e-01 6.425 3.89e-10 ***
# weight -1.280e-02 3.481e-04 -36.786 < 2e-16 ***
# displacement:cylinders -2.823e-03 8.910e-04 -3.169 0.00165 **
# displacement:weight 2.463e-05 1.302e-06 18.921 < 2e-16 ***
# mpg:weight 3.691e-04 5.060e-06 72.945 < 2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Residual standard error: 1.067 on 385 degrees of freedom
# Multiple R-squared: 0.9816, Adjusted R-squared: 0.9813
# F-statistic: 3422 on 6 and 385 DF, p-value: < 2.2e-16
```
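Among the interaction terms, displacement:weight and mpg:weight are highly significant (p < 2e-16), while displacement:cylinders is significant at a weaker level (p = 0.00165). Because mpg is also the response, the mpg:weight term and the resulting R-squared of 0.98 should not be read as evidence of a real interaction.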
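
A variant that uses only interactions between predictors can be sketched as follows. The names `inter_lm` and `main_lm` are introduced here for illustration, and no output is shown because these models were not part of the original run.

```r
library(ISLR)
data(Auto)

# Interactions between predictors only; mpg appears only as the response
inter_lm <- lm(mpg ~ displacement * cylinders + displacement * weight, data = Auto)
summary(inter_lm)

# Additive model on the same three predictors, for comparison
main_lm <- lm(mpg ~ displacement + cylinders + weight, data = Auto)

# Partial F-test: do the two interaction terms improve the fit?
anova(main_lm, inter_lm)
```

The two models are nested, so `anova` tests the two interaction terms jointly.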
### f
Apply four kinds of transformation to weight and plot each against mpg.

In my opinion, log and sqrt give the better shape: the relation with mpg looks closer to linear.
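
A minimal sketch of such a comparison is below. The notes do not say which four transformations were used, so the set here (log, square root, square, inverse) is an assumption, as are the names `transforms` and `r_squared`.

```r
library(ISLR)
data(Auto)

# Assumed set of transformations of weight (the original does not name them)
transforms <- list(
  "log(weight)"  = log,
  "sqrt(weight)" = sqrt,
  "weight^2"     = function(w) w^2,
  "1 / weight"   = function(w) 1 / w
)

# One scatter plot of mpg against each transformed weight
par(mfrow = c(2, 2))
for (label in names(transforms)) {
  plot(transforms[[label]](Auto$weight), Auto$mpg, xlab = label, ylab = "mpg")
}

# Rough numeric check: R-squared of mpg regressed on each transformed weight
r_squared <- sapply(transforms, function(f) {
  summary(lm(mpg ~ f(weight), data = Auto))$r.squared
})
r_squared
```

A higher R-squared in a simple regression is one way to back up the visual impression of which transformation looks most linear.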
## Q10
```r
library(ISLR)
data(Carseats)
```
### a
```r
carseat_lm <- lm(Sales ~ Price + Urban + US, data=Carseats)
summary(carseat_lm)
# Call:
# lm(formula = Sales ~ Price + Urban + US, data = Carseats)
# Residuals:
# Min 1Q Median 3Q Max
# -6.9206 -1.6220 -0.0564 1.5786 7.0581
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 13.043469 0.651012 20.036 < 2e-16 ***
# Price -0.054459 0.005242 -10.389 < 2e-16 ***
# UrbanYes -0.021916 0.271650 -0.081 0.936
# USYes 1.200573 0.259042 4.635 4.86e-06 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Residual standard error: 2.472 on 396 degrees of freedom
# Multiple R-squared: 0.2393, Adjusted R-squared: 0.2335
# F-statistic: 41.52 on 3 and 396 DF, p-value: < 2.2e-16
```
### b
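Price: related to Sales, because its p-value is very low.
A higher Price is associated with lower Sales (about 0.054 lower per unit increase in Price).
UrbanYes: no evidence of a relation, because its p-value is very high (0.936).
USYes: related to Sales.
If the store is located in the US, Sales are higher by about 1.2, holding the other predictors fixed.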
### c
$Sales = 13.043 - 0.054 \times Price - 0.022 \times UrbanYes + 1.201 \times USYes + \epsilon$
If the shop is located in the US, USYes is 1; otherwise it is 0. UrbanYes is coded the same way for urban locations.
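
As a worked example of reading the fitted equation, the sketch below plugs in a made-up store (Price of 100, urban, in the US) using the coefficient estimates copied from the summary in (a). The store and the helper name `predict_sales` are hypothetical and only for illustration.

```r
# Coefficient estimates copied from the summary output in (a)
coefs <- c(intercept = 13.043469, price = -0.054459,
           urban_yes = -0.021916, us_yes = 1.200573)

# Hypothetical helper: predicted Sales for a given price and store location
predict_sales <- function(price, urban, us) {
  unname(coefs["intercept"] +
           coefs["price"] * price +
           coefs["urban_yes"] * as.numeric(urban) +
           coefs["us_yes"] * as.numeric(us))
}

# Predicted Sales for the made-up store: about 8.78
predict_sales(price = 100, urban = TRUE, us = TRUE)

# Same store outside the US: lower by the USYes coefficient
predict_sales(price = 100, urban = TRUE, us = FALSE)
```

With a fitted model object at hand, `predict()` on a new data frame would do the same job; the hand-written version just makes each term of the equation visible.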
### d
We can reject the null hypothesis $H_0: \beta_j = 0$ for Price and USYes, because their p-values are below 0.05.
### e
```r
evidence_lm <- lm(Sales ~ Price + US, data=Carseats)
summary(evidence_lm)
# Call:
# lm(formula = Sales ~ Price + US, data = Carseats)
# Residuals:
# Min 1Q Median 3Q Max
# -6.9269 -1.6286 -0.0574 1.5766 7.0515
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 13.03079 0.63098 20.652 < 2e-16 ***
# Price -0.05448 0.00523 -10.416 < 2e-16 ***
# USYes 1.19964 0.25846 4.641 4.71e-06 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Residual standard error: 2.469 on 397 degrees of freedom
# Multiple R-squared: 0.2393, Adjusted R-squared: 0.2354
# F-statistic: 62.43 on 2 and 397 DF, p-value: < 2.2e-16
```
### f
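The two models fit almost equally well: both have a multiple R-squared of 0.2393.
The model from (e) is slightly better than the one from (a), with adjusted R-squared 0.2354 vs. 0.2335 and residual standard error 2.469 vs. 2.472.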
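
The comparison can also be done in code. The sketch below refits both models so it runs on its own and collects the fit statistics side by side; the name `fit_stats` is introduced here for illustration.

```r
library(ISLR)
data(Carseats)

# Models from (a) and (e)
carseat_lm  <- lm(Sales ~ Price + Urban + US, data = Carseats)
evidence_lm <- lm(Sales ~ Price + US, data = Carseats)

# R-squared, adjusted R-squared, and residual standard error for each model
fit_stats <- sapply(list(a = carseat_lm, e = evidence_lm), function(m) {
  s <- summary(m)
  c(r_squared = s$r.squared, adj_r_squared = s$adj.r.squared, rse = s$sigma)
})
fit_stats

# Partial F-test for dropping Urban from the model in (a)
anova(evidence_lm, carseat_lm)
```

Since the two models differ by a single term, the F-test here is equivalent to the t-test on UrbanYes in (a).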
### g
```r
confint(evidence_lm)
# 2.5 % 97.5 %
# (Intercept) 11.79032020 14.27126531
# Price -0.06475984 -0.04419543
# USYes 0.69151957 1.70776632
```
### h
```r
par(mfrow = c(2, 2))
plot(evidence_lm)
```

The residuals vs. leverage plot suggests there are some outliers.
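
A numeric check can complement the plots. The sketch below uses studentized residuals and hat values; the cutoffs (absolute studentized residual above 3, leverage above twice the average) are common rules of thumb assumed here, not values from the original notes.

```r
library(ISLR)
data(Carseats)

# Refit the model from (e) so this block is self-contained
evidence_lm <- lm(Sales ~ Price + US, data = Carseats)

# Studentized residuals; |value| > 3 is the assumed outlier flag
stud_res <- rstudent(evidence_lm)
outliers <- which(abs(stud_res) > 3)

# Leverage; the average hat value is (number of coefficients) / n
hat_vals <- hatvalues(evidence_lm)
avg_leverage <- length(coef(evidence_lm)) / nrow(Carseats)

# Assumed cutoff: leverage above twice the average
high_leverage <- which(hat_vals > 2 * avg_leverage)

outliers
head(sort(hat_vals, decreasing = TRUE))
```

Outliers and high-leverage points are different things: the first is about an unusual response, the second about an unusual predictor value, so the two lists need not overlap.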
## Q12
### a
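Regression of $y$ onto $x$ without an intercept: $\beta_x = \sum_{i=1}^{n} x_i y_i / \sum_{i'=1}^{n} x_{i'}^2$
Regression of $x$ onto $y$ without an intercept: $\beta_y = \sum_{i=1}^{n} x_i y_i / \sum_{i'=1}^{n} y_{i'}^2$
Setting $\beta_y = \beta_x$:
$\sum_{i=1}^{n} x_i y_i / \sum_{i'=1}^{n} x_{i'}^2 = \sum_{i=1}^{n} x_i y_i / \sum_{i'=1}^{n} y_{i'}^2$
Assuming $\sum_{i=1}^{n} x_i y_i \neq 0$, this reduces to
$\sum_{i'=1}^{n} x_{i'}^2 = \sum_{i'=1}^{n} y_{i'}^2$
So the two coefficients are equal when the sum of $x_i^2$ equals the sum of $y_i^2$.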
### b
```r
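# y is a nonlinear function of x, so in general sum(x^2) != sum(y^2)
x <- rnorm(100)
y <- x^2
# regressions through the origin in both directions
yx_lm <- lm(y ~ x + 0)
xy_lm <- lm(x ~ y + 0)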
summary(yx_lm)
# Call:
# lm(formula = y ~ x + 0)
# Residuals:
# Min 1Q Median 3Q Max
# -0.0053 0.1184 0.4420 1.3340 10.6699
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# x 0.1456 0.1828 0.797 0.428
# Residual standard error: 1.886 on 99 degrees of freedom
# Multiple R-squared: 0.00637, Adjusted R-squared: -0.003666
# F-statistic: 0.6347 on 1 and 99 DF, p-value: 0.4275
summary(xy_lm)
# Call:
# lm(formula = x ~ y + 0)
# Residuals:
# Min 1Q Median 3Q Max
# -2.81406 -0.57183 0.07921 0.74082 2.85206
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# y 0.04375 0.05491 0.797 0.428
# Residual standard error: 1.034 on 99 degrees of freedom
# Multiple R-squared: 0.00637, Adjusted R-squared: -0.003666
# F-statistic: 0.6347 on 1 and 99 DF, p-value: 0.4275
```
### c
```r
x <- rnorm(100)
y <- -rnorm(100)
# sum(x^2) should be very close to sum(y^2)
sum(x^2)
# [1] 86.55331
sum(y^2)
# [1] 87.43572
yx_lm <- lm(y ~ x + 0)
xy_lm <- lm(x ~ y + 0)
summary(yx_lm)
# Call:
# lm(formula = y ~ x + 0)
# Residuals:
# Min 1Q Median 3Q Max
# -2.6074 -0.5135 0.1752 0.7049 2.0976
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# x 0.0339 0.1010 0.336 0.738
# Residual standard error: 0.9392 on 99 degrees of freedom
# Multiple R-squared: 0.001138, Adjusted R-squared: -0.008952
# F-statistic: 0.1128 on 1 and 99 DF, p-value: 0.7377
summary(xy_lm)
# Call:
# lm(formula = x ~ y + 0)
# Residuals:
# Min 1Q Median 3Q Max
# -2.90954 -0.69906 -0.09423 0.51060 2.00552
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# y 0.03356 0.09994 0.336 0.738
# Residual standard error: 0.9345 on 99 degrees of freedom
# Multiple R-squared: 0.001138, Adjusted R-squared: -0.008952
# F-statistic: 0.1128 on 1 and 99 DF, p-value: 0.7377
```
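Nearly the same: the two coefficient estimates are 0.0339 and 0.0336, because sum(x^2) and sum(y^2) are close but not exactly equal.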
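
To get coefficients that match up to floating-point rounding rather than approximately, one option is to make `y` a permutation of `x`, which forces the two sums of squares to agree. This is an alternative construction, not the one used above; the seed and the names `yx_coef` and `xy_coef` are arbitrary choices for the sketch.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

x <- rnorm(100)
y <- sample(x)  # same values in a different order, so sum(y^2) matches sum(x^2)

# Regression through the origin in both directions
yx_coef <- coef(lm(y ~ x + 0))
xy_coef <- coef(lm(x ~ y + 0))

# Both equal sum(x * y) / sum(x^2), up to floating-point rounding
all.equal(unname(yx_coef), unname(xy_coef))
all.equal(unname(yx_coef), sum(x * y) / sum(x^2))
```

Any other construction with equal sums of squares works as well, for example `y <- -x` or `y <- abs(x)`.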
## ch4 Q2
pass