---
tags: ISLR
---
# ISLR hw10
**M094020055 陳耀融**
Textbook exercises: Chapter 6, Questions 4, 8, and 9; Chapter 7, Questions 1 and 2.
## ch06 Q4
### a
(iii) Training RSS steadily increases: as the penalty grows the model becomes less flexible, so it fits the training data less closely.
### b
(ii) Test RSS initially decreases: reducing flexibility improves generalization at first. Once flexibility drops too far the model underfits and the test RSS rises again, giving a U shape.
### c
(iv) Variance steadily decreases as flexibility decreases.
### d
(iii) The trade-off with (c): squared bias steadily increases as flexibility decreases.
### e
(v) The irreducible error does not depend on the fitted model, so it remains constant.
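The behaviour claimed in (a)–(e) can be illustrated with a small ridge-regression simulation (a minimal sketch on made-up data; the data-generating model, sample sizes, and lambda grid below are arbitrary choices, not part of the exercise):
```r
library(glmnet)
set.seed(1)
n <- 100; p <- 40
X <- matrix(rnorm(n * p), n, p)
beta_true <- c(rep(1, 5), rep(0, p - 5))   # only 5 informative predictors
y_sim <- X %*% beta_true + rnorm(n, sd = 2)
train <- 1:50; test <- 51:100

lambdas <- 10^seq(3, -3, length.out = 60)  # decreasing grid for glmnet
fit <- glmnet(X[train, ], y_sim[train], alpha = 0, lambda = lambdas)

train_mse <- colMeans((predict(fit, X[train, ]) - y_sim[train])^2)
test_mse  <- colMeans((predict(fit, X[test, ])  - y_sim[test])^2)

# training MSE rises monotonically as lambda grows (less flexibility),
# while test MSE typically falls first and then rises (the U shape)
matplot(log10(fit$lambda), cbind(train_mse, test_mse), type = 'l',
        lty = 1, col = c('blue', 'red'),
        xlab = 'log10(lambda)', ylab = 'MSE')
legend('topleft', legend = c('train', 'test'), col = c('blue', 'red'), lty = 1)
```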
## Q8
### a
```r
set.seed(1340)
x <- rnorm(100)        # predictor
epsilon <- rnorm(100)  # noise term
```
### b
```r
y <- 3 + 0.5 * x + 1.2 * x^2 + 9.9 * x^3 + epsilon  # true beta = (3, 0.5, 1.2, 9.9)
```
### c
```r
# candidate predictors X1..X10 = x, x^2, ..., x^10
new_x <- data.frame(sapply(1:10, FUN = function(n) x ^ n))
library(leaps)
# exhaustive best-subset search (default nvmax = 8 model sizes)
subset <- regsubsets(new_x, y)
summary_subset <- summary(subset)
summary_subset
# Subset selection object
# 10 Variables (and intercept)
# Forced in Forced out
# X1 FALSE FALSE
# X2 FALSE FALSE
# X3 FALSE FALSE
# X4 FALSE FALSE
# X5 FALSE FALSE
# X6 FALSE FALSE
# X7 FALSE FALSE
# X8 FALSE FALSE
# X9 FALSE FALSE
# X10 FALSE FALSE
# 1 subsets of each size up to 8
# Selection Algorithm: exhaustive
# X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
# 1 ( 1 ) " " " " "*" " " " " " " " " " " " " " "
# 2 ( 1 ) " " "*" "*" " " " " " " " " " " " " " "
# 3 ( 1 ) "*" "*" "*" " " " " " " " " " " " " " "
# 4 ( 1 ) " " "*" "*" " " "*" " " "*" " " " " " "
# 5 ( 1 ) " " "*" "*" " " "*" " " "*" " " "*" " "
# 6 ( 1 ) " " "*" "*" " " "*" " " "*" " " "*" "*"
# 7 ( 1 ) " " "*" "*" " " "*" " " "*" "*" "*" "*"
# 8 ( 1 ) " " "*" "*" "*" "*" "*" "*" "*" " " "*"
```
```r
# choose the model size by adjusted R^2 (max), Cp (min) and BIC (min)
plot(summary_subset[['adjr2']])
which.max(summary_subset[['adjr2']])
# 3
plot(summary_subset[['cp']])
which.min(summary_subset[['cp']])
# 3
plot(summary_subset[['bic']])
which.min(summary_subset[['bic']])
# 3
```
All three criteria (adjusted R², Cp, BIC) select the 3-variable model.
```r
coef(subset, 3)
# (Intercept) X1 X2 X3
# 3.0772642 0.4423009 1.1333846 9.8906446
```
### d
Forward stepwise selection:
```r
subset <- regsubsets(new_x, y, method = 'forward')
summary_subset <- summary(subset)
summary_subset
# Subset selection object
# 10 Variables (and intercept)
# Forced in Forced out
# X1 FALSE FALSE
# X2 FALSE FALSE
# X3 FALSE FALSE
# X4 FALSE FALSE
# X5 FALSE FALSE
# X6 FALSE FALSE
# X7 FALSE FALSE
# X8 FALSE FALSE
# X9 FALSE FALSE
# X10 FALSE FALSE
# 1 subsets of each size up to 8
# Selection Algorithm: forward
# X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
# 1 ( 1 ) " " " " "*" " " " " " " " " " " " " " "
# 2 ( 1 ) " " "*" "*" " " " " " " " " " " " " " "
# 3 ( 1 ) "*" "*" "*" " " " " " " " " " " " " " "
# 4 ( 1 ) "*" "*" "*" " " "*" " " " " " " " " " "
# 5 ( 1 ) "*" "*" "*" " " "*" " " " " "*" " " " "
# 6 ( 1 ) "*" "*" "*" "*" "*" " " " " "*" " " " "
# 7 ( 1 ) "*" "*" "*" "*" "*" " " " " "*" " " "*"
# 8 ( 1 ) "*" "*" "*" "*" "*" "*" " " "*" " " "*"
```
```r
plot(summary_subset[['adjr2']])
which.max(summary_subset[['adjr2']])
# 3
plot(summary_subset[['cp']])
which.min(summary_subset[['cp']])
# 3
plot(summary_subset[['bic']])
which.min(summary_subset[['bic']])
# 3
```
Backward stepwise selection:
```r
subset <- regsubsets(new_x, y, method = 'backward')
summary_subset <- summary(subset)
summary_subset
# Subset selection object
# 10 Variables (and intercept)
# Forced in Forced out
# X1 FALSE FALSE
# X2 FALSE FALSE
# X3 FALSE FALSE
# X4 FALSE FALSE
# X5 FALSE FALSE
# X6 FALSE FALSE
# X7 FALSE FALSE
# X8 FALSE FALSE
# X9 FALSE FALSE
# X10 FALSE FALSE
# 1 subsets of each size up to 8
# Selection Algorithm: backward
# X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
# 1 ( 1 ) " " " " "*" " " " " " " " " " " " " " "
# 2 ( 1 ) " " "*" "*" " " " " " " " " " " " " " "
# 3 ( 1 ) " " "*" "*" " " "*" " " " " " " " " " "
# 4 ( 1 ) " " "*" "*" " " "*" " " "*" " " " " " "
# 5 ( 1 ) " " "*" "*" " " "*" " " "*" " " "*" " "
# 6 ( 1 ) " " "*" "*" " " "*" " " "*" "*" "*" " "
# 7 ( 1 ) " " "*" "*" " " "*" "*" "*" "*" "*" " "
# 8 ( 1 ) " " "*" "*" "*" "*" "*" "*" "*" "*" " "
```
```r
plot(summary_subset[['adjr2']])
which.max(summary_subset[['adjr2']])
# 4
plot(summary_subset[['cp']])
which.min(summary_subset[['cp']])
# 4
plot(summary_subset[['bic']])
which.min(summary_subset[['bic']])
# 2
```
Backward selection performs the worst: its criteria disagree (adjusted R² and Cp pick 4 variables, BIC picks 2) and none of its candidate models ever include X1. The exhaustive and forward searches both settle on the same 3-variable model (X1, X2, X3).
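To see how the selected models differ, one can pull out the coefficients of each method's preferred model (a small comparison sketch; `fwd` and `bwd` are new names introduced here so the refits do not clobber `subset`):
```r
# refit the forward and backward searches under separate names
fwd <- regsubsets(new_x, y, method = 'forward')
bwd <- regsubsets(new_x, y, method = 'backward')
coef(fwd, 3)   # forward picks X1, X2, X3, same as the exhaustive search
coef(bwd, 4)   # backward's adjusted R^2 / Cp choice: X2, X3, X5, X7
```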
### e
```r
library(glmnet)
lasso_lm <- cv.glmnet(as.matrix(new_x), as.matrix(y), alpha = 1)
lasso_lm[['lambda.min']]
# [1] 0.3407743
```
```r
plot(lasso_lm$lambda, lasso_lm$cvm)
```

```r
# refit with lambda = 0 (no penalty) to inspect the unshrunk coefficient estimates
glm_model <- glmnet(as.matrix(new_x), as.matrix(y),
                    alpha = 1, lambda = 0)
coef(glm_model)
# 11 x 1 sparse Matrix of class "dgCMatrix"
# s0
# (Intercept) 3.0951876071
# X1 0.6500841398
# X2 1.0459451470
# X3 9.6054947224
# X4 0.0488754302
# X5 0.0926816198
# X6 -0.0007318723
# X7 -0.0056685459
# X8 -0.0010033607
# X9 -0.0007174021
# X10 -0.0001169911
```
The coefficient estimates for X1, X2, X3 are very similar to those from the exhaustive best-subset model, and the higher-order terms are close to zero.
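For reference, the coefficients of the cross-validated lasso fit itself can be read directly off `lasso_lm`; at `lambda.min` the penalty should shrink most of the higher-order terms to exactly zero (a quick sketch, output omitted since it depends on the CV folds):
```r
# lasso coefficients at the CV-selected penalty
coef(lasso_lm, s = "lambda.min")
```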
### f
Skipped.
## Q9
### a
```r
library(ISLR)  # College data set
data('College')
# 70/30 train/test split
train_idx <- sample(1:nrow(College), nrow(College) * 0.7)
test_idx <- setdiff(1:nrow(College), train_idx)
train_data <- College[train_idx, ]
test_data <- College[test_idx, ]
```
### b
```r
lm_model <- lm(Apps ~ ., data = train_data)
preds <- predict(lm_model, test_data, type = 'response')
loss_lm <- mean((preds - test_data$Apps) ^ 2)
loss_lm
# [1] 1234714
```
### c
```r
# ridge regression (alpha = 0); lambda chosen by 10-fold CV
lambda_glm <- cv.glmnet(model.matrix(Apps ~ ., train_data),
                        as.matrix(train_data$Apps), alpha = 0)
lambda_min <- lambda_glm[['lambda.min']]
lambda_min
# [1] 348.1503
```
```r
ridge_model <- glmnet(model.matrix(Apps ~ ., train_data),
                      as.matrix(train_data$Apps), alpha = 0,
                      lambda = lambda_min, thresh = 1e-12)
preds <- predict(ridge_model, model.matrix(Apps ~ ., test_data),
                 s = lambda_min)
loss_ridge <- mean((preds - test_data$Apps) ^ 2)
loss_ridge
# [1] 1280739
predict(ridge_model, type = 'coefficients', s = lambda_min)
# 19 x 1 sparse Matrix of class "dgCMatrix"
# 1
# (Intercept) -1.479267e+03
# (Intercept) .
# PrivateYes -7.010494e+02
# Accept 7.945396e-01
# Enroll 6.371419e-01
# Top10perc 2.661257e+01
# Top25perc -1.032271e+00
# F.Undergrad 1.133181e-01
# P.Undergrad -6.013832e-02
# Outstate -1.064216e-02
# Room.Board 2.304939e-01
# Books 2.154057e-02
# Personal -1.573161e-02
# PhD -3.868305e+00
# Terminal -4.378327e+00
# S.F.Ratio 1.250775e+01
# perc.alumni -1.584248e+01
# Expend 9.119914e-02
# Grad.Rate 1.397667e+01
```
### d
```r
# lasso (alpha = 1); lambda chosen by CV
lambda_lasso <- cv.glmnet(model.matrix(Apps ~ ., train_data),
                          as.matrix(train_data$Apps), alpha = 1)
lambda_min <- lambda_lasso[['lambda.min']]
lasso_model <- glmnet(model.matrix(Apps ~ ., train_data),
                      as.matrix(train_data$Apps), alpha = 1,
                      lambda = lambda_min, thresh = 1e-12)
preds <- predict(lasso_model, model.matrix(Apps ~ ., test_data),
                 s = lambda_min)
loss_lasso <- mean((preds - test_data$Apps) ^ 2)
loss_lasso
# [1] 1223643
predict(lasso_model, type = 'coefficients', s = lambda_min)
# 19 x 1 sparse Matrix of class "dgCMatrix"
# 1
# (Intercept) -838.72180000
# (Intercept) .
# PrivateYes -608.45369290
# Accept 1.23043705
# Enroll .
# Top10perc 41.94218353
# Top25perc -10.80256196
# F.Undergrad 0.04796578
# P.Undergrad -0.03533189
# Outstate -0.05456206
# Room.Board 0.17363867
# Books .
# Personal .
# PhD -7.17461991
# Terminal -1.40297222
# S.F.Ratio 11.24224597
# perc.alumni -8.86953835
# Expend 0.09115303
# Grad.Rate 11.18055676
```
### e
```r
library(pls)
pcr_model <- pcr(Apps ~ ., data = train_data, scale = TRUE,
                 validation = 'CV')
summary(pcr_model)
# Data: X dimension: 543 17
# Y dimension: 543 1
# Fit method: svdpc
# Number of components considered: 17
# VALIDATION: RMSEP
# Cross-validated using 10 random segments.
# (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
# CV 3729 3713 1858 1874 1452 1394 1387 1385 1380
# adjCV 3729 3713 1856 1877 1418 1386 1383 1383 1376
# 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps 16 comps 17 comps
# CV 1235 1236 1237 1244 1254 1252 1257 1109 1098
# adjCV 1232 1234 1235 1242 1250 1249 1255 1104 1093
# TRAINING: % variance explained
# 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps 9 comps 10 comps
# X 32.176 58.24 65.32 71.13 76.47 81.23 84.98 88.16 91.01 93.37
# Apps 2.058 75.98 76.10 86.80 87.14 87.18 87.23 87.50 89.56 89.63
# 11 comps 12 comps 13 comps 14 comps 15 comps 16 comps 17 comps
# X 95.43 97.10 98.18 98.96 99.47 99.87 100.00
# Apps 89.72 89.75 89.76 89.78 89.85 92.34 92.68
```
```r
validationplot(pcr_model, val.type = 'MSEP')
# CV error is lowest with all 17 components
```

```r
preds <- predict(pcr_model, test_data, ncomp = 17)
loss_pcr <- mean((preds - test_data$Apps)^2)
loss_pcr
# [1] 1234714
```
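Because all 17 components are kept, PCR uses the full predictor space and therefore reproduces the least-squares fit from (b), which is why `loss_pcr` equals `loss_lm`. A quick check (assumes `lm_model` from (b) is still in the workspace):
```r
# 17-component PCR predictions should match the OLS predictions, up to numerical error
all.equal(as.numeric(preds), unname(predict(lm_model, test_data)),
          tolerance = 1e-6)
```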
### f
```r
plsr_model <- plsr(Apps ~ ., data = train_data, scale = TRUE,
                   validation = 'CV')
summary(plsr_model)
# Data: X dimension: 543 17
# Y dimension: 543 1
# Fit method: kernelpls
# Number of components considered: 17
# VALIDATION: RMSEP
# Cross-validated using 10 random segments.
# (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
# CV 3729 1644 1332 1219 1171 1134 1080 1080 1074
# adjCV 3729 1643 1332 1217 1167 1125 1075 1075 1071
# 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps 16 comps 17 comps
# CV 1068 1065 1065 1066 1066 1066 1067 1067 1067
# adjCV 1065 1062 1062 1063 1063 1063 1063 1064 1064
# TRAINING: % variance explained
# 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps 9 comps 10 comps
# X 26.16 40.15 63.74 66.82 70.42 74.37 78.19 81.57 84.23 86.38
# Apps 80.95 87.84 89.86 90.96 91.88 92.49 92.55 92.58 92.61 92.65
# 11 comps 12 comps 13 comps 14 comps 15 comps 16 comps 17 comps
# X 89.42 91.67 94.02 95.98 97.35 99.05 100.00
# Apps 92.67 92.68 92.68 92.68 92.68 92.68 92.68
```
```r
validationplot(plsr_model, val.type = 'MSEP')
```

```r
preds <- predict(plsr_model, test_data, ncomp = 10)  # CV RMSEP flattens out around 10 components
loss_plsr <- mean((preds - test_data$Apps)^2)
loss_plsr
# [1] 1239535
```
```r
losses <- c(loss_lm, loss_ridge, loss_lasso, loss_pcr, loss_plsr)
names(losses) <- c('lm', 'ridge', 'lasso', 'pcr', 'plsr')
losses
# lm ridge lasso pcr plsr
# 1234714 1280739 1223643 1234714 1239535
barplot(losses, main = 'test MSE', yaxs = 'i')
```

The lasso gives a slightly lower test MSE than the other methods, but all five perform similarly.
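For a more direct comparison, the test MSEs can be expressed as a percentage above the best one (a small sketch reusing the `losses` vector computed above):
```r
# percentage increase in test MSE relative to the best (lasso) fit
round(100 * (losses / min(losses) - 1), 2)
```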
## ch07 Q1
### a
$$
f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x - \epsilon)_+^3 \\
f_1(x) = a_1 + b_1 x + c_1 x^2 + d_1 x^3
$$
For $x \le \epsilon$ the truncated term $(x - \epsilon)_+^3$ is zero, so $f(x) = f_1(x)$ with
$$
a_1 = \beta_0, \quad b_1 = \beta_1, \quad c_1 = \beta_2, \quad d_1 = \beta_3
$$
### b
$$
f_2(x) = a_2 + b_2 x + c_2 x^2 + d_2 x^3
$$
For $x > \epsilon$ we have $(x - \epsilon)_+^3 = (x - \epsilon)^3$, so
$$
f_2(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x^3 - 3 x^2 \epsilon + 3 x \epsilon^2 - \epsilon^3) \\
= (\beta_0 - \beta_4 \epsilon^3) + (\beta_1 + 3 \beta_4 \epsilon^2) x + (\beta_2 - 3 \beta_4 \epsilon) x^2 + (\beta_3 + \beta_4) x^3
$$
so
$$
a_2 = \beta_0 - \beta_4 \epsilon^3, \quad
b_2 = \beta_1 + 3 \beta_4 \epsilon^2, \quad
c_2 = \beta_2 - 3 \beta_4 \epsilon, \quad
d_2 = \beta_3 + \beta_4
$$
### c
$$
f_1(\epsilon) = \beta_0 + \beta_1 \epsilon + \beta_2 \epsilon^2 + \beta_3 \epsilon^3 \\
f_2(\epsilon) = (\beta_0 - \beta_4 \epsilon^3) + (\beta_1 + 3 \beta_4 \epsilon^2) \epsilon + (\beta_2 - 3 \beta_4 \epsilon) \epsilon^2 + (\beta_3 + \beta_4) \epsilon^3 \\
= \beta_0 + \beta_1 \epsilon + \beta_2 \epsilon^2 + \beta_3 \epsilon^3 = f_1(\epsilon)
$$
The $\beta_4$ terms cancel ($-\epsilon^3 + 3\epsilon^3 - 3\epsilon^3 + \epsilon^3 = 0$), so $f(x)$ is continuous at the knot.
### d
$$
f_1'(x) = \beta_1 + 2 \beta_2 x + 3 \beta_3 x^2 \\
f_2'(x) = b_2 + 2 c_2 x + 3 d_2 x^2 \\
f_2'(\epsilon) = (\beta_1 + 3 \beta_4 \epsilon^2) + 2 (\beta_2 - 3 \beta_4 \epsilon) \epsilon + 3 (\beta_3 + \beta_4) \epsilon^2
= \beta_1 + 2 \beta_2 \epsilon + 3 \beta_3 \epsilon^2 = f_1'(\epsilon)
$$
### e
$$
f_1''(x) = 2 \beta_2 + 6 \beta_3 x \\
f_2''(x) = 2 c_2 + 6 d_2 x \\
f_2''(\epsilon) = 2 (\beta_2 - 3 \beta_4 \epsilon) + 6 (\beta_3 + \beta_4) \epsilon = 2 \beta_2 + 6 \beta_3 \epsilon = f_1''(\epsilon)
$$
So $f(x)$ is continuous and has continuous first and second derivatives at $\epsilon$; it is indeed a cubic spline.
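As a quick numerical sanity check of (c)–(e), one can plug arbitrary made-up values for the $\beta$'s and the knot into the formulas above and verify that $f_1$ and $f_2$ agree in value, first derivative, and second derivative at the knot:
```r
# made-up coefficients beta_0..beta_4 and knot; any values work
b <- c(1, -2, 0.5, 3, 4)
knot <- 1.3
a2 <- b[1] - b[5] * knot^3
b2 <- b[2] + 3 * b[5] * knot^2
c2 <- b[3] - 3 * b[5] * knot
d2 <- b[4] + b[5]
c(value        = (b[1] + b[2]*knot + b[3]*knot^2 + b[4]*knot^3) -
                 (a2 + b2*knot + c2*knot^2 + d2*knot^3),
  first_deriv  = (b[2] + 2*b[3]*knot + 3*b[4]*knot^2) -
                 (b2 + 2*c2*knot + 3*d2*knot^2),
  second_deriv = (2*b[3] + 6*b[4]*knot) - (2*c2 + 6*d2*knot))
# all three differences are (numerically) zero
```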
## Q2


---