# 1
## List all parameters that should be set before running Adaboost
Adaboost requires the parameters `train`, `train_label`, and `cycles`.

## Explain the meanings of those parameters.
* `train`: the training data set used in this run, corresponding to $D$ in the lecture slides
* `train_label`: the labels (answers) of that training data set, also part of $D$ in the lecture slides
* `cycles`: the total number of iterations, corresponding to $\text{for } t = 1, 2, \dots, T$ in the lecture slides

![image](https://hackmd.io/_uploads/rJ3CXMBIA.png =70%x)

# 2
## How each weak learner is decided and trained in each iteration?
How a weak learner is decided is explained in the code comments of weakLearner.m below; it corresponds to $A()$ in the lecture slides.

![image](https://hackmd.io/_uploads/rJ3CXMBIA.png =70%x)
![image](https://hackmd.io/_uploads/rymguGr8R.png)

How a weak learner is trained and how the weights are updated is explained in the code comments of adaBoost.m below; it mainly corresponds to steps ② and ③ in the lecture slides.

![image](https://hackmd.io/_uploads/rJ3CXMBIA.png =70%x)
![image](https://hackmd.io/_uploads/ryRRoVBIR.png)

## What is the learning algorithm A?
This program uses a decision stump as the learning algorithm $A$.

![image](https://hackmd.io/_uploads/ryJiTEBUR.png =70%x)

## Does it use bootstrapped dataset?
No, this program does not use a bootstrapped dataset.

## If not, how D_t is obtained for each iteration?
At line 21 of adaBoost.m, $D_t$ is obtained with the following update, which upweights each misclassified example by a factor of $\frac{1-\varepsilon_t}{\varepsilon_t}$ ($I()$ is the indicator function):

$$D_{t+1}(i) = D_t(i) \cdot \exp\left(-\ln\left(\frac{\varepsilon_t}{1-\varepsilon_t}\right) \cdot I\big(y_i \neq h_t(x_i)\big)\right)$$

![image](https://hackmd.io/_uploads/SkS7xmHLR.png)

# 3
## List the first three weak learners when the learning iteration stops
In adaboost.m, I added the following code to print the parameters of the first three weak learners; these parameters fully determine a weak learner.

![image](https://hackmd.io/_uploads/B1QLrVrLC.png)

The program output is shown below:

```
run adaboost with cycles=100
100
Cycle 1: i = 170, θ = 160.000000, s = 1
Cycle 2: i = 26, θ = 192.000000, s = 1
Cycle 3: i = 74, θ = 144.000000, s = 1
200
Cycle 1: i = 170, θ = 128.000000, s = 1
Cycle 2: i = 10, θ = 64.000000, s = 1
Cycle 3: i = 29, θ = 192.000000, s = 1
300
Cycle 1: i = 11, θ = 16.000000, s = 1
Cycle 2: i = 170, θ = 128.000000, s = 1
Cycle 3: i = 26, θ = 160.000000, s = 1
400
Cycle 1: i = 11, θ = 16.000000, s = 1
Cycle 2: i = 170, θ = 128.000000, s = 1
Cycle 3: i = 74, θ = 16.000000, s = 1
500
Cycle 1: i = 11, θ = 16.000000, s = 1
Cycle 2: i = 170, θ = 128.000000, s = 1
Cycle 3: i = 74, θ = 16.000000, s = 1
600
Cycle 1: i = 11, θ = 16.000000, s = 1
Cycle 2: i = 170, θ = 80.000000, s = 1
Cycle 3: i = 74, θ = 16.000000, s = 1
700
Cycle 1: i = 11, θ = 16.000000, s = 1
Cycle 2: i = 170, θ = 80.000000, s = 1
Cycle 3: i = 74, θ = 16.000000, s = 1
800
Cycle 1: i = 11, θ = 80.000000, s = 1
Cycle 2: i = 170, θ = 80.000000, s = 1
Cycle 3: i = 74, θ = 16.000000, s = 1
900
Cycle 1: i = 11, θ = 80.000000, s = 1
Cycle 2: i = 170, θ = 80.000000, s = 1
Cycle 3: i = 58, θ = 16.000000, s = 1
1000
Cycle 1: i = 11, θ = 80.000000, s = 1
Cycle 2: i = 170, θ = 80.000000, s = 1
Cycle 3: i = 58, θ = 16.000000, s = 1
run adaboost algorothm with different boosting
20
Cycle 1: i = 11, θ = 80.000000, s = 1
Cycle 2: i = 170, θ = 80.000000, s = 1
Cycle 3: i = 58, θ = 16.000000, s = 1
40
Cycle 1: i = 11, θ = 80.000000, s = 1
Cycle 2: i = 170, θ = 80.000000, s = 1
Cycle 3: i = 58, θ = 16.000000, s = 1
60
Cycle 1: i = 11, θ = 80.000000, s = 1
Cycle 2: i = 170, θ = 80.000000, s = 1
Cycle 3: i = 58, θ = 16.000000, s = 1
80
Cycle 1: i = 11, θ = 80.000000, s = 1
Cycle 2: i = 170, θ = 80.000000, s = 1
Cycle 3: i = 58, θ = 16.000000, s = 1
100
Cycle 1: i = 11, θ = 80.000000, s = 1
Cycle 2: i = 170, θ = 80.000000, s = 1
Cycle 3: i = 58, θ = 16.000000, s = 1
```

## Explain these decision stumps by their three parameters i, θ and s.
According to the formula in the lecture slides,

![image](https://hackmd.io/_uploads/BJepBVSIC.png)

* i: the index of the selected feature (i.e., which feature the stump splits on)
* θ: the threshold, which sets the cut point on the i-th feature (for example, using a temperature of 35 degrees as the cut point)
* s: the direction; in this example, s is always 1

# 4
## Following (3), list the blending weights of these three decision stumps
As shown in the program output below:

```
run adaboost with cycles=100
100
Cycle 1: i = 170, θ = 160.000000, s = 1, blending weights = -1.098612
Cycle 2: i = 26, θ = 192.000000, s = 1, blending weights = -1.227230
Cycle 3: i = 74, θ = 144.000000, s = 1, blending weights = 1.007370
200
Cycle 1: i = 170, θ = 128.000000, s = 1, blending weights = -0.969401
Cycle 2: i = 10, θ = 64.000000, s = 1, blending weights = -1.143421
Cycle 3: i = 29, θ = 192.000000, s = 1, blending weights = -0.826497
300
Cycle 1: i = 11, θ = 16.000000, s = 1, blending weights = -1.028715
Cycle 2: i = 170, θ = 128.000000, s = 1, blending weights = -0.935677
Cycle 3: i = 26, θ = 160.000000, s = 1, blending weights = -0.716969
400
Cycle 1: i = 11, θ = 16.000000, s = 1, blending weights = -0.944462
Cycle 2: i = 170, θ = 128.000000, s = 1, blending weights = -0.832696
Cycle 3: i = 74, θ = 16.000000, s = 1, blending weights = 0.739329
500
Cycle 1: i = 11, θ = 16.000000, s = 1, blending weights = -0.914891
Cycle 2: i = 170, θ = 128.000000, s = 1, blending weights = -0.790367
Cycle 3: i = 74, θ = 16.000000, s = 1, blending weights = 0.823993
600
Cycle 1: i = 11, θ = 16.000000, s = 1, blending weights = -0.952744
Cycle 2: i = 170, θ = 80.000000, s = 1, blending weights = -0.822025
Cycle 3: i = 74, θ = 16.000000, s = 1, blending weights = 0.819820
700
Cycle 1: i = 11, θ = 16.000000, s = 1, blending weights = -0.930333
Cycle 2: i = 170, θ = 80.000000, s = 1, blending weights = -0.742270
Cycle 3: i = 74, θ = 16.000000, s = 1, blending weights = 0.774582
800
Cycle 1: i = 11, θ = 80.000000, s = 1, blending weights = -0.950670
Cycle 2: i = 170, θ = 80.000000, s = 1, blending weights = -0.716032
Cycle 3: i = 74, θ = 16.000000, s = 1, blending weights = 0.712811
900
Cycle 1: i = 11, θ = 80.000000, s = 1, blending weights = -0.944462
Cycle 2: i = 170, θ = 80.000000, s = 1, blending weights = -0.764451
Cycle 3: i = 58, θ = 16.000000, s = 1, blending weights = 0.706830
1000
Cycle 1: i = 11, θ = 80.000000, s = 1, blending weights = -0.974422
Cycle 2: i = 170, θ = 80.000000, s = 1, blending weights = -0.732568
Cycle 3: i = 58, θ = 16.000000, s = 1, blending weights = 0.701601
run adaboost algorothm with different boosting
20
Cycle 1: i = 11, θ = 80.000000, s = 1, blending weights = -0.974422
Cycle 2: i = 170, θ = 80.000000, s = 1, blending weights = -0.732568
Cycle 3: i = 58, θ = 16.000000, s = 1, blending weights = 0.701601
40
Cycle 1: i = 11, θ = 80.000000, s = 1, blending weights = -0.974422
Cycle 2: i = 170, θ = 80.000000, s = 1, blending weights = -0.732568
Cycle 3: i = 58, θ = 16.000000, s = 1, blending weights = 0.701601
60
Cycle 1: i = 11, θ = 80.000000, s = 1, blending weights = -0.974422
Cycle 2: i = 170, θ = 80.000000, s = 1, blending weights = -0.732568
Cycle 3: i = 58, θ = 16.000000, s = 1, blending weights = 0.701601
80
Cycle 1: i = 11, θ = 80.000000, s = 1, blending weights = -0.974422
Cycle 2: i = 170, θ = 80.000000, s = 1, blending weights = -0.732568
Cycle 3: i = 58, θ = 16.000000, s = 1, blending weights = 0.701601
100
Cycle 1: i = 11, θ = 80.000000, s = 1, blending weights = -0.974422
Cycle 2: i = 170, θ = 80.000000, s = 1, blending weights = -0.732568
Cycle 3: i = 58, θ = 16.000000, s = 1, blending weights = 0.701601
```

## Explain how their blending weights are decided and what are their actual values in the program?
The blending weight is defined exactly as $\alpha_t = \ln\left(\frac{\varepsilon_t}{1-\varepsilon_t}\right)$ in the lecture slides:

![image](https://hackmd.io/_uploads/ByQuK4BUR.png)

In adaboost.m, this corresponds to taking the square root of `beta(j)`:

![image](https://hackmd.io/_uploads/BkzRKNBIC.png)
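As a consistency check (an inference from the formula above, not stated in the write-up), inverting $\alpha_t$ at the first reported blending weight recovers that stump's weighted error:

$$\alpha_1 = \ln\frac{\varepsilon_1}{1-\varepsilon_1} = -1.098612 \;\Rightarrow\; \frac{\varepsilon_1}{1-\varepsilon_1} = e^{-1.098612} \approx \frac{1}{3} \;\Rightarrow\; \varepsilon_1 \approx 0.25$$

Under this sign convention, a negative blending weight simply indicates $\varepsilon_t < 0.5$, i.e., a weak learner that is better than random.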
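The whole pipeline described above (decision stumps as algorithm $A$, the $D_t$ reweighting, and the blending weights) can be sketched outside MATLAB. The following is a hypothetical Python sketch, not the course's adaBoost.m/weakLearner.m: the function names are made up, and it uses the common positive convention $\alpha_t = \frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t}$ rather than the slides' $\ln\frac{\varepsilon_t}{1-\varepsilon_t}$ (same quantity up to sign and a factor of ½).

```python
import numpy as np

def train_stump(X, y, D):
    """Exhaustively pick (i, theta, s) minimizing the weighted 0/1 error,
    playing the role of the decision-stump learner A (cf. weakLearner.m)."""
    n, d = X.shape
    best, best_err = (0, 0.0, 1), np.inf
    for i in range(d):                       # candidate feature index
        for theta in np.unique(X[:, i]):     # candidate thresholds
            for s in (1, -1):                # candidate direction
                pred = np.where(X[:, i] > theta, s, -s)
                err = D[pred != y].sum()     # weighted error under D_t
                if err < best_err:
                    best_err, best = err, (i, theta, s)
    return best, best_err

def adaboost(X, y, cycles):
    """Run `cycles` boosting rounds; return a list of (i, theta, s, alpha)."""
    n = len(y)
    D = np.full(n, 1.0 / n)                  # D_1: uniform weights
    learners = []
    for _ in range(cycles):
        (i, theta, s), eps = train_stump(X, y, D)
        eps = max(eps, 1e-12)                # avoid log(0) for a perfect stump
        alpha = 0.5 * np.log((1 - eps) / eps)
        pred = np.where(X[:, i] > theta, s, -s)
        # Reweight: misclassified points get larger weight, then renormalize.
        D *= np.exp(-alpha * y * pred)
        D /= D.sum()
        learners.append((i, theta, s, alpha))
    return learners

def predict(learners, X):
    """Final classifier: sign of the alpha-weighted vote of all stumps."""
    score = np.zeros(len(X))
    for i, theta, s, alpha in learners:
        score += alpha * np.where(X[:, i] > theta, s, -s)
    return np.sign(score)
```

With the slides' $\ln\frac{\varepsilon_t}{1-\varepsilon_t}$ convention, the weights come out negative for better-than-random stumps (as in the section 4 output), and the sign convention of the final vote absorbs the difference.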