---
tags: ISLR
---
# ISLR hw8
**M094020055 陳耀融**
Range: Textbook Chapter 4 exercises: Q8, Q13; Chapter 5 exercises: Q1, Q8; Chapter 6 exercises: Q2
## ch04 Q8
If KNN performs consistently well on both the training and the test split, we would probably prefer it for classifying new observations.
Otherwise, if KNN's performance diverges between the two splits (low training error but much higher test error, a sign of overfitting), we would prefer logistic regression instead of KNN.
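As a rough illustration of this criterion, here is a minimal sketch on simulated data (the data set and all variable names below are hypothetical, not part of the exercise): fit both classifiers on the same split and compare training and test error rates. With 1-NN the training error is 0 by construction, so a small average of training and test error can still hide a relatively large test error.
```r
# Hypothetical sketch: compare train/test error of logistic regression and 1-NN
library(class)
set.seed(1)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- ifelse(x1 + x2 + rnorm(n) > 0, 1, 0)
df <- data.frame(x1, x2, y)
train <- sample(n, n / 2)

# logistic regression: error on both splits
glm_fit <- glm(y ~ x1 + x2, data = df[train, ], family = binomial)
glm_lab <- ifelse(predict(glm_fit, df, type = "response") > 0.5, 1, 0)
c(train = mean(glm_lab[train] != df$y[train]),
  test  = mean(glm_lab[-train] != df$y[-train]))

# 1-NN: training error is 0 by construction, so compare the test error directly
knn_train <- knn(df[train, 1:2], df[train, 1:2], df$y[train], k = 1)
knn_test  <- knn(df[train, 1:2], df[-train, 1:2], df$y[train], k = 1)
c(train = mean(knn_train != df$y[train]),
  test  = mean(knn_test  != df$y[-train]))
```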
## Q13
```r
# load data and preprocessing
library(MASS)
summary(Boston)
# binary response: 1 if crim is above its median, 0 otherwise
Boston$crimRate <- 0
Boston$crimRate[Boston$crim > median(Boston$crim)] <- 1
table(Boston$crimRate)
# 0 1
# 253 253
# drop the original crime rate so it cannot leak into the models
Boston$crim <- NULL
```
```r
# 50/50 train/test split (no seed is set, so the exact error rates below may vary)
n <- nrow(Boston)
idx <- sample(1:n, n * 0.5)
train_data <- Boston[idx,]
test_data <- Boston[-idx,]
```
```r
# build model
glm_model <- glm(crimRate ~ . - crimRate, data = train_data, family = binomial)
summary(glm_model)
# Call:
# glm(formula = crimRate ~ . - crimRate, family = binomial, data = train_data)
# Deviance Residuals:
# Min 1Q Median 3Q Max
# -1.84717 -0.21344 -0.00199 0.00422 3.14453
# Coefficients:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) -31.238480 8.902203 -3.509 0.00045 ***
# zn -0.078754 0.044329 -1.777 0.07564 .
# indus 0.038417 0.083462 0.460 0.64530
# chas 0.081782 1.042176 0.078 0.93745
# nox 42.286604 9.787384 4.321 1.56e-05 ***
# rm -0.114958 1.054339 -0.109 0.91318
# age 0.015487 0.016257 0.953 0.34078
# dis 0.644028 0.316409 2.035 0.04181 *
# rad 0.580803 0.223351 2.600 0.00931 **
# tax -0.010127 0.005592 -1.811 0.07015 .
# ptratio 0.297757 0.163614 1.820 0.06878 .
# black -0.007408 0.005843 -1.268 0.20491
# lstat 0.047060 0.060768 0.774 0.43868
# medv 0.113831 0.093266 1.220 0.22228
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# (Dispersion parameter for binomial family taken to be 1)
# Null deviance: 350.54 on 252 degrees of freedom
# Residual deviance: 104.50 on 239 degrees of freedom
# AIC: 132.5
# Number of Fisher Scoring iterations: 9
```
```r
glm_pred <- predict(glm_model, train_data, type = 'response')
glm_pred <- ifelse(glm_pred > 0.5, 1, 0)
table(glm_pred, train_data$crimRate)
# train error
1 - mean(glm_pred == train_data$crimRate)
# [1] 0.08300395
glm_pred <- predict(glm_model, test_data, type = 'response')
glm_pred <- ifelse(glm_pred > 0.5, 1, 0)
table(glm_pred, test_data$crimRate)
# test error
1 - mean(glm_pred == test_data$crimRate)
# [1] 0.1067194
```
The result looks decent, with only slight overfitting: the training error rate is about 8.3% and the test error rate is about 10.7%.
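As a small helper sketch (the `err_rate` function is my own, not from the text), the error rates above are just the off-diagonal share of the confusion matrix:
```r
# Hypothetical helper: error rate = share of off-diagonal cells in the confusion matrix
err_rate <- function(pred, truth) {
  cm <- table(pred, truth)
  1 - sum(diag(cm)) / sum(cm)
}
err_rate(glm_pred, test_data$crimRate)  # same value as 1 - mean(glm_pred == ...)
```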
```r
# lda
lda_model <- lda(crimRate ~ . - crimRate, data = train_data)
lda_pred <- predict(lda_model, test_data)
table(lda_pred$class, test_data$crimRate)
# error
1 - mean(lda_pred$class == test_data$crimRate)
# [1] 0.1699605
```
For LDA, the test error rate is about 17%.
```r
# knn (requires the class package)
library(class)
k <- 1:15
knn_m <- function(k){
  knn_pred <- knn(as.matrix(train_data), as.matrix(test_data),
                  train_data$crimRate, k = k)
  return(c(k, 1 - mean(knn_pred == test_data$crimRate)))
}
sapply(k, knn_m)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1.00000000 2.00000000 3.00000000 4.00000000 5.00000000
# [2,] 0.09090909 0.09486166 0.09486166 0.09486166 0.09881423
# [,6] [,7] [,8] [,9] [,10]
# [1,] 6.0000000 7.0000000 8.0000000 9.0000000 10.0000000
# [2,] 0.1185771 0.1422925 0.1383399 0.1383399 0.1422925
# [,11] [,12] [,13] [,14] [,15]
# [1,] 11.0000000 12.0000000 13.0000000 14.0000000 15.0000000
# [2,] 0.1501976 0.1422925 0.1501976 0.1383399 0.1462451
```
For KNN, the best result is k = 1, with a test error rate of about 9.1%.
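One caveat (my own note, not required by the exercise): `as.matrix(train_data)` above still contains the `crimRate` column, and the predictors are on very different scales. A hedged sketch of how one might drop the response and standardize the predictors before running KNN (results will differ from the table above):
```r
# Sketch: drop the response column and standardize predictors before KNN
library(class)
train_X <- scale(train_data[, names(train_data) != "crimRate"])
test_X  <- scale(test_data[, names(test_data) != "crimRate"],
                 center = attr(train_X, "scaled:center"),
                 scale  = attr(train_X, "scaled:scale"))
knn_pred <- knn(train_X, test_X, train_data$crimRate, k = 1)
1 - mean(knn_pred == test_data$crimRate)
```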
## ch05 Q1
$$
\begin{aligned}
Var(\alpha X + (1 - \alpha)Y)
&= Cov(\alpha X, \alpha X) + 2\,Cov(\alpha X, (1 - \alpha)Y) + Cov((1 - \alpha)Y, (1 - \alpha)Y) \\
&= Var(\alpha X) + 2\alpha(1 - \alpha)Cov(X, Y) + Var((1 - \alpha)Y) \\
&= \alpha^2 Var(X) + 2\alpha(1 - \alpha)Cov(X, Y) + (1 - \alpha)^2 Var(Y) \\
&= \alpha^2\sigma_X^2 + 2\alpha(1 - \alpha)\sigma_{XY} + (1 - \alpha)^2\sigma_Y^2 \\
&= \alpha^2(\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}) - \alpha(2\sigma_Y^2 - 2\sigma_{XY}) + \sigma_Y^2
\end{aligned}
$$
Setting the derivative of this quadratic in $\alpha$ to zero, the variance is minimized at $\alpha = \frac{-b}{2a} = \frac{\sigma_Y^2-\sigma_{XY}}{\sigma_X^2+\sigma_Y^2-2\sigma_{XY}}$
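Explicitly, differentiating with respect to $\alpha$:
$$
\frac{d}{d\alpha}\,Var(\alpha X + (1-\alpha)Y) = 2\alpha(\sigma_X^2+\sigma_Y^2-2\sigma_{XY}) - (2\sigma_Y^2-2\sigma_{XY}) = 0
$$
The second derivative is $2(\sigma_X^2+\sigma_Y^2-2\sigma_{XY}) = 2\,Var(X-Y) \ge 0$, so this critical point is indeed a minimum.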
## Q8
### a
```r
# simulate 100 observations from y = x - 2x^2 + eps (no seed set before sampling)
x <- rnorm(100)
y <- x - 2 * x^2 + rnorm(100)
data <- data.frame(x = x, y = y)
head(data)
# x y
# 1 0.6867733 -0.23264900
# 2 -0.1604189 0.42413244
# 3 -0.9613197 -2.71510107
# 4 -1.2128508 -4.07083136
# 5 -0.1477261 -0.04665136
# 6 -0.3405169 0.03702584
```
n = 100, p = 1
$y = -2x^2 + x + \epsilon$
### b
```r
plot(x, y)
```
The scatter plot shows an inverted, roughly parabolic relationship between $x$ and $y$, as expected from the quadratic data-generating process.
### c
```r
# fit polynomial regressions of degree 1 to 4
models <- list(
  glm(y ~ poly(x, 1), data = data),
  glm(y ~ poly(x, 2), data = data),
  glm(y ~ poly(x, 3), data = data),
  glm(y ~ poly(x, 4), data = data)
)
library(boot)
set.seed(1340)
# LOOCV error for each model: delta = (raw estimate, bias-corrected estimate)
mse <- lapply(models,
              FUN = function(m) cv.glm(data, m)$delta)
mse
# [[1]]
# [1] 9.860545 9.855251
# [[2]]
# [1] 0.9724206 0.9722070
# [[3]]
# [1] 0.9810579 0.9807978
# [[4]]
# [1] 1.004417 1.003989
```
### d
```r
set.seed(1005)  # different seed from part (c)
mse <- lapply(models,
              FUN = function(m) cv.glm(data, m)$delta)
mse
# [[1]]
# [1] 9.860545 9.855251
# [[2]]
# [1] 0.9724206 0.9722070
# [[3]]
# [1] 0.9810579 0.9807978
# [[4]]
# [1] 1.004417 1.003989
```
Yes, the results are exactly the same.
LOOCV is deterministic: each observation is left out exactly once, so there is no random fold assignment for the seed to affect.
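A minimal sketch of the contrast (reusing `models` and `data` from above; output not shown): only random k-fold CV depends on the seed.
```r
# k-fold CV (K = 10) shuffles observations into random folds, so the seed matters;
# LOOCV (the default, K = n) is deterministic.
set.seed(1); cv.glm(data, models[[2]], K = 10)$delta
set.seed(2); cv.glm(data, models[[2]], K = 10)$delta
cv.glm(data, models[[2]])$delta  # identical on every run
```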
### e
The quadratic model (`x^2`) has the smallest LOOCV error.
Since the data were generated from $y = -2x^2 + x + \epsilon$, this is exactly what we would expect.
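As a quick check (a one-liner sketch reusing the `mse` list computed above):
```r
# index of the model with the smallest raw LOOCV estimate (first delta component)
which.min(sapply(mse, function(d) d[1]))
# expected: 2, given the values printed above
```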
### f
```r
lapply(models, summary)
# [[1]]
# Call:
# glm(formula = y ~ poly(x, 1), data = data)
# Deviance Residuals:
# Min 1Q Median 3Q Max
# -14.7442 -0.5664 0.6819 1.7668 4.6297
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -1.9401 0.3003 -6.46 4.07e-09 ***
# poly(x, 1) 11.9534 3.0034 3.98 0.000132 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# (Dispersion parameter for gaussian family taken to be 9.020402)
# Null deviance: 1026.9 on 99 degrees of freedom
# Residual deviance: 884.0 on 98 degrees of freedom
# AIC: 507.72
# Number of Fisher Scoring iterations: 2
# [[2]]
# Call:
# glm(formula = y ~ poly(x, 2), data = data)
# Deviance Residuals:
# Min 1Q Median 3Q Max
# -2.57632 -0.60346 -0.01275 0.54628 2.65671
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -1.94011 0.09794 -19.81 <2e-16 ***
# poly(x, 2)1 11.95338 0.97942 12.21 <2e-16 ***
# poly(x, 2)2 -28.12384 0.97942 -28.71 <2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# (Dispersion parameter for gaussian family taken to be 0.9592658)
# Null deviance: 1026.883 on 99 degrees of freedom
# Residual deviance: 93.049 on 97 degrees of freedom
# AIC: 284.58
# Number of Fisher Scoring iterations: 2
# [[3]]
# Call:
# glm(formula = y ~ poly(x, 3), data = data)
# Deviance Residuals:
# Min 1Q Median 3Q Max
# -2.55848 -0.59502 -0.01382 0.54843 2.65147
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -1.94011 0.09843 -19.710 <2e-16 ***
# poly(x, 3)1 11.95338 0.98431 12.144 <2e-16 ***
# poly(x, 3)2 -28.12384 0.98431 -28.572 <2e-16 ***
# poly(x, 3)3 -0.19317 0.98431 -0.196 0.845
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# (Dispersion parameter for gaussian family taken to be 0.9688695)
# Null deviance: 1026.883 on 99 degrees of freedom
# Residual deviance: 93.011 on 96 degrees of freedom
# AIC: 286.54
# Number of Fisher Scoring iterations: 2
# [[4]]
# Call:
# glm(formula = y ~ poly(x, 4), data = data)
# Deviance Residuals:
# Min 1Q Median 3Q Max
# -2.5605 -0.6119 -0.0190 0.5324 2.6617
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -1.94011 0.09894 -19.610 <2e-16 ***
# poly(x, 4)1 11.95338 0.98936 12.082 <2e-16 ***
# poly(x, 4)2 -28.12384 0.98936 -28.426 <2e-16 ***
# poly(x, 4)3 -0.19317 0.98936 -0.195 0.846
# poly(x, 4)4 -0.15244 0.98936 -0.154 0.878
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# (Dispersion parameter for gaussian family taken to be 0.9788235)
# Null deviance: 1026.883 on 99 degrees of freedom
# Residual deviance: 92.988 on 95 degrees of freedom
# AIC: 288.52
# Number of Fisher Scoring iterations: 2
```
Yes: in every model the linear and quadratic terms are highly statistically significant, while the `x^3` and `x^4` terms are not.
This agrees with the cross-validation results: adding `x^3` and `x^4` neither improves nor noticeably harms the fit.
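As a complementary check (a one-line sketch reusing the `models` list), AIC points to the same conclusion as LOOCV:
```r
# AIC across the four fits; the degree-2 model has the lowest value
sapply(models, AIC)
# from the summaries above: about 507.7, 284.6, 286.5, 288.5
```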
## ch06 Q2
### a
The lasso, relative to least squares:
iii. The lasso penalty restricts the flexibility of the linear model,
so the variance decreases while the bias increases; prediction accuracy improves when the decrease in variance outweighs the increase in bias, as the decomposition below makes explicit.
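This trade-off follows from the usual decomposition of the expected test error:
$$
E\big[(y_0 - \hat{f}(x_0))^2\big] = Var(\hat{f}(x_0)) + \big[Bias(\hat{f}(x_0))\big]^2 + Var(\epsilon)
$$
A less flexible fit shrinks the variance term at the cost of a larger squared-bias term, so prediction improves exactly when the drop in variance exceeds the rise in squared bias.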
### b
Ridge regression, relative to least squares:
iii. Same reasoning as in (a): the ridge penalty reduces variance at the cost of some additional bias.
### c
Non-linear methods, relative to least squares:
ii. Non-linear methods are more flexible than linear models, so they give better predictions when the decrease in bias outweighs the increase in variance.