---
tags: ISLR
---
# ISLR hw3
Range: ch2 ex3, 7, ch3 1, 3, 8
1. ch02 Q3
a. pass
b.
`training error`: decreases monotonically as model flexibility increases, because a more flexible model has more parameters and can fit ever more complicated patterns in the training data (eventually overfitting it).
`test error`: decreases at first, while the model is picking up the true pattern, then increases once the model starts overfitting and learning noise specific to the training set, giving the characteristic U shape.
`bias`: decreases as flexibility increases, because a more flexible model makes fewer assumptions about the true form of f.
`variance`: increases with flexibility, because a model with more parameters changes more when it is fit to a different training set.
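These four curves come from the bias-variance decomposition of the expected test error at a point x_0 (ISLR eq. 2.7):

```latex
E\big[(y_0 - \hat f(x_0))^2\big]
  = \operatorname{Var}\!\big(\hat f(x_0)\big)
  + \big[\operatorname{Bias}\big(\hat f(x_0)\big)\big]^2
  + \operatorname{Var}(\varepsilon)
```

As flexibility increases, the variance term rises and the squared bias falls; Var(ε) is the irreducible error, a floor that the test error can never go below.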
2. ch02 Q7
a. Euclidean distance from the test point (0, 0, 0) to obs1 through obs6:
3.00, 2.00, 3.16, 2.24, 1.41, 1.73
b. K = 1
the nearest neighbor is obs5
output is **Green**
c. K = 3
the three nearest neighbors are obs2, obs5, and obs6
output is **Red**
d. non-linear
smaller K is better.
A large K averages over many points and produces a smoother, nearly linear decision boundary; a small K keeps the boundary flexible enough to follow a highly non-linear Bayes decision boundary.
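The distances and KNN votes above can be checked with a short script (a sketch in Python; the six observations are the ones given in the exercise table, with the test point at the origin):

```python
import math

# Observations from ch02 Q7: (X1, X2, X3, Y)
obs = [
    (0, 3, 0, "Red"),
    (2, 0, 0, "Red"),
    (0, 1, 3, "Red"),
    (0, 1, 2, "Green"),
    (-1, 0, 1, "Green"),
    (1, 1, 1, "Red"),
]
test_point = (0, 0, 0)

# Euclidean distance from the test point to each observation
dists = [math.dist(test_point, o[:3]) for o in obs]
print([round(d, 2) for d in dists])  # [3.0, 2.0, 3.16, 2.24, 1.41, 1.73]

def knn(k):
    # Majority label among the K observations nearest to the test point
    nearest = sorted(zip(dists, obs))[:k]
    labels = [o[-1] for _, o in nearest]
    return max(set(labels), key=labels.count)

print(knn(1))  # Green (obs5 is the single nearest neighbor)
print(knn(3))  # Red  (obs5, obs6, obs2 -> Green, Red, Red)
```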
3. ch03 Q1
a. intercept
Null hypothesis: average sales are zero when spending on all 3 media is 0.
Result: p < 0.0001, so we reject it; sales **will not** be zero when all 3 variables are 0.
b. TV
c. radio
Null hypothesis: holding the other media fixed, a change in TV (or radio) spending has no effect on sales.
Result: p < 0.0001 for both, so we reject it; TV and radio advertising **do** affect sales.
d. newspaper
Null hypothesis: holding TV and radio fixed, a change in newspaper spending has no effect on sales.
Result: p = 0.8599, so we **cannot** reject it; there is no evidence that newspaper advertising affects sales.
4. ch03 Q3
Y = 50 + 20 * GPA + 0.07 * IQ + 35 * Gender + 0.01 * GPA * IQ + (-10) * GPA * Gender
a.
i.
Holding GPA and IQ fixed, only the Gender terms differ (Gender = 1 for female):
Y(Female) - Y(Male) = 35 - 10 * GPA
So Y(Male) > Y(Female) when GPA > 3.5, and Y(Male) < Y(Female) when GPA < 3.5.
i. is false: males do not always earn more; it depends on GPA.
ii.
False for the same reason: females earn more on average only when GPA < 3.5.
iii. and iv.
iii. says males earn more provided GPA is high enough, which matches GPA > 3.5; iv. claims the opposite for females.
**iii. is correct**
b.
Y = 50 + 20 * 4 + 0.07 * 110 + 35 * 1 + 0.01 * 4 * 110 - 10 * 4 * 1
= 50 + 80 + 7.7 + 35 + 4.4 - 40
= 137.1
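The arithmetic in (b) can be verified with a quick check (a sketch; Gender = 1 codes female, as in the exercise):

```python
def salary(gpa, iq, gender):
    # Fitted model from ch03 Q3 (response in thousands of dollars)
    return 50 + 20*gpa + 0.07*iq + 35*gender + 0.01*gpa*iq - 10*gpa*gender

print(round(salary(gpa=4.0, iq=110, gender=1), 1))  # 137.1
```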
c.
False. The size of a coefficient says little by itself, because it depends on the scale of the variables (IQ is on a much larger scale than GPA). To judge whether there is an interaction effect we should look at the p-value of the interaction term, not the magnitude of its coefficient.
5. ch03 Q8
```r
require(ISLR)
data(Auto)
auto_lm <- lm(mpg ~ horsepower, data=Auto)
summary(auto_lm)
```
a.
i.
p < 0.001, so there is strong evidence of a relationship between horsepower and mpg
ii.
the relationship is fairly strong: the p-value is essentially 0, and R^2 is about 0.61, so horsepower alone explains roughly 60% of the variance in mpg
iii.
negative; the summary shows an estimate of -0.157845 for horsepower, so mpg decreases as horsepower increases
iv.
```r
new_data <- data.frame(horsepower = 98)
predict(auto_lm, new_data, interval = 'confidence')
predict(auto_lm, new_data, interval = 'prediction')
```
b.
```r
plot(Auto$horsepower, Auto$mpg)
abline(auto_lm, col='green')
```
c.
```r
par(mfrow=c(2,2))
plot(auto_lm)
```
The Residuals vs Fitted plot shows a clear U-shaped pattern in the residuals, which suggests the relationship is non-linear and that a non-linear model (e.g. adding a quadratic term in horsepower) would fit better.