---
tags: ISLR
---

# ISLR hw9

**M094020055 陳耀融**

Textbook exercises: Chapter 5, Questions 2, 3, 9; Chapter 6, Questions 1, 5.

## ch05 Q2

### a

$p = \frac{n - 1}{n}$

Each of the $n$ observations is equally likely to be drawn, so the probability that the first bootstrap observation is *not* the $j$-th observation is $\frac{n - 1}{n}$.

### b

$p = \frac{n - 1}{n}$

Same as (a): the second draw is also made with replacement from the same $n$ observations.

### c

Each draw is independent, and for every draw $P(\text{not the } j\text{-th observation}) = \frac{n - 1}{n}$.

Since a bootstrap sample consists of $n$ independent draws, the probability that the $j$-th observation is not in the bootstrap sample is $\left(\frac{n - 1}{n}\right)^n$.

### d

$p = 1 - \left(\frac{5 - 1}{5}\right)^{5} \approx 0.672$

### e

$p = 1 - \left(\frac{100 - 1}{100}\right)^{100} \approx 0.634$

### f

$p = 1 - \left(\frac{10000 - 1}{10000}\right)^{10000} \approx 0.632$

![](https://i.imgur.com/gsOVxWE.png)

### g

```python
import matplotlib.pyplot as plt

# Probability that the j-th observation appears in a bootstrap sample of size n
x = range(1, 100000 + 1)
y = [1 - ((n - 1) / n) ** n for n in x]

plt.figure(figsize=(10, 5))
plt.plot(x, y)
plt.show()
```

![](https://i.imgur.com/PSeYOBx.png)

Re-plotting with the x-axis on a log scale:

```python
import math

log_x = [math.log(n) for n in x]  # natural-log scale for the x-axis
```

![](https://i.imgur.com/46zkeEE.png)

The probability equals 1 at $n = 1$ and quickly converges to about 0.632, since $\left(1 - \frac{1}{n}\right)^n \to e^{-1} \approx 0.368$ and therefore $1 - \left(1 - \frac{1}{n}\right)^n \to 1 - e^{-1} \approx 0.632$.

### h

The simulated probability is about 0.634. However the simulation is run, it converges to around 0.63, consistent with the limit $1 - e^{-1}$ above. (A small simulation sketch is given in the appendix at the end of this write-up.)

## Q3

### a

The dataset is shuffled and split into $k$ folds of roughly equal size. The model is then fit $k$ times: each time, $k - 1$ folds are used for training and the remaining fold is used for validation. The $k$ validation errors are averaged to give the cross-validation estimate of the test error. (See the `cv.glm` sketch in the appendix.)

### b

#### i

Compared with the validation set approach:

pros: less bias, since almost all of the data is used for training in each fit

cons: costs more computational resources, because the model must be fit $k$ times

#### ii

Compared with LOOCV:

pros: requires less computation ($k$ fits instead of $n$)

cons: more bias, since each training set contains fewer observations

## Q9

### a

```r
library(MASS)

mean(Boston$medv)
# [1] 22.53281
```

### b

```r
sd(Boston$medv) / sqrt(nrow(Boston) - 1)
# [1] 0.4092658
```

### c

```r
library(boot)

boot_mu <- function(data, idx) mean(data$medv[idx])
boot(Boston, boot_mu, 1000)
# ORDINARY NONPARAMETRIC BOOTSTRAP
# Call:
# boot(data = Boston, statistic = boot_mu, R = 1000)
# Bootstrap Statistics :
#     original        bias    std. error
# t1* 22.53281  0.004718182    0.415501
```

Quite similar: the bootstrap standard error (0.4155) is close to the formula-based estimate from (b) (0.4093).

### d

```r
t.test(Boston$medv)
# 95 percent confidence interval:
#  21.72953 23.33608

boston_mean <- mean(Boston$medv)
se <- sd(Boston$medv) / sqrt(nrow(Boston) - 1)
c(boston_mean - 2 * se, boston_mean + 2 * se)
# [1] 21.71427 23.35134
```

The two intervals are very close.

### e

```r
median(Boston$medv)
# [1] 21.2
```

### f

```r
boot_median <- function(data, idx) median(data$medv[idx])
boot(Boston, boot_median, 1000)
```

The bootstrap provides an estimated standard error for the median, for which there is no simple formula.

### g

```r
quantile(Boston$medv, 0.1)
#   10%
# 12.75
```

### h

```r
boot_quan <- function(data, idx) quantile(data$medv[idx], 0.1)
boot(Boston, boot_quan, 1000)
# ORDINARY NONPARAMETRIC BOOTSTRAP
# Call:
# boot(data = Boston, statistic = boot_quan, R = 1000)
# Bootstrap Statistics :
#     original   bias    std. error
# t1*    12.75  0.0026   0.4988157
```

Very close to the point estimate in (g); the bootstrap standard error of the tenth percentile is about 0.50.

## ch06 Q1

### a

Best subset selection, since for each $k$ it considers every possible model with $k$ predictors and picks the one with the smallest training RSS. (A small `regsubsets` sketch is given in the appendix.)

### b

Any of the three could: a smaller training RSS does not guarantee a smaller test RSS, so without seeing the test data we cannot say which procedure will win.

### c

#### i

True

#### ii

True

#### iii

False: forward and backward stepwise selection follow different search paths, so their models need not be nested in each other.

#### iv

False

#### v

False

## Q5

pass
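## Appendix: code sketches

**ch05 Q2 (h).** A minimal simulation sketch for the result quoted in Q2 (h), written in R to match the code used in Q9. It assumes bootstrap samples of size $n = 100$ and tracks whether the 4th observation appears; both choices are illustrative.

```r
# Draw 10,000 bootstrap samples of size 100 from the indices 1:100 and
# record whether observation 4 shows up in each sample.
store <- rep(NA, 10000)
for (i in 1:10000) {
  store[i] <- sum(sample(1:100, replace = TRUE) == 4) > 0
}
mean(store)  # expected to be close to 1 - (99/100)^100, i.e. about 0.634
```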
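**ch05 Q3 (a).** A minimal $k$-fold cross-validation sketch using `cv.glm` from the `boot` package. The model (`medv ~ lstat` on the `Boston` data) and $K = 10$ are illustrative assumptions, not part of the original answer.

```r
library(MASS)  # Boston data
library(boot)  # cv.glm

set.seed(1)
# Fit a linear model via glm() so that cv.glm() can be applied to it.
glm_fit <- glm(medv ~ lstat, data = Boston)  # illustrative model choice
cv_err  <- cv.glm(Boston, glm_fit, K = 10)   # 10-fold cross-validation
cv_err$delta[1]  # cross-validation estimate of the test MSE
```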
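**ch06 Q1 (a).** A small best subset selection sketch using `regsubsets` from the `leaps` package, showing the exhaustive search over models of each size. The `Boston` data and `nvmax = 13` are illustrative assumptions.

```r
library(MASS)   # Boston data
library(leaps)  # regsubsets

# Best subset selection: for each model size k, fit every model with k
# predictors and keep the one with the smallest training RSS.
best_fit <- regsubsets(medv ~ ., data = Boston, nvmax = 13)
best_sum <- summary(best_fit)
best_sum$rss  # training RSS of the best model of each size (decreases with size)
```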