機器學習期末考

410410054 王宥愷 # 1 ## a Auto encoder 的模型架構包含一個 Encoder 模型，以及一個 Decoder 模型，兩者成對且一起訓練架構: 輸入 -> Encoder -> feature representation -> Decoder -> 輸出 * 希望輸入與輸出越接近越好 * Encoder 與 Decoder 是可被訓練的 ## b 若 Decoder 的輸出跟原始輸入越像，則 loss 越低通常選擇 mean square error 作為 Loss Function Loss: $\sum_{i=1}^n(x_i-\hat{x_i})^2$ ## c Auto Encoder 最初是作為取影像特徵的模型。該模型能將輸入轉成低維度的向量表示法，且確保低維度的向量表示法能充分表示原始圖片的特徵，以及被還原。 # 2 ## a * Recurrent Neural Network * 神經元間的連結會形成環 * 具備記憶功能 * Deep Neural Network * 神經元間的連結不會形成環 * 不具備記憶功能 RNN 能處理時序資料預測的原因是因為，該網路維護了一個 state vector，用來儲存過去歷史紀錄的資訊，使得該網路在進行當前時間點的決策時，同時會參考過去所獲得的資訊 ## b RNN 模型最大的缺點之一便是無法記得過於久遠的資訊，而 LSTM 模型在架構設計上緩解了這一問題。LSTM 能透過 forget gate 決定哪些資訊需要保留，而哪些資訊需要遺忘，進而使需要記住的資訊能存活更長時間 ## c ![image](https://hackmd.io/_uploads/r1ssNbgL0.png) * 圖中的 $W_{xh}$、$W_{hh}$、$W_{hy}$ 即為所求 ![image](https://hackmd.io/_uploads/B1b_DxgI0.png) * 圖中的 $W_f$、$W_i$、$W_C$、$W_O$ 以及 bias $b$ 即為所求 # 3 ## a Attention 的計算涉及 Q、K、V 三個向量 $Attention(Q, K, V) = softmax(\frac{QK^T}{\sqrt{d_k}})V$ * Q 代表 Query * K 代表 Key * V 代表 Value * $d_k$ 代表 Query和Key的向量長度 ## b 相較於一般的 Attention 機制，Multi-Head Attention 做了多次 Attention 函數的計算，具體而言，是先將 Q, K, V 投影到低維向量，再作 h 次的 Attention 函數；結合多通道的結果後，再作一次投影 $MultiHead(Q, K, V) = Concat(head_1, ..., head_h)W^O$ $where$ $head_1=Attention(QW_i^Q, KW_i^K, VW_i^V)$ ## c Multi-Head Attention 的原因是因為可以同時看到全域的資料，同時模擬多通道輸出的效果而使用 Mask的原因是因為避免 $Q_t$ 看到 $t$ 時間點後的資訊 ## d 由於 Transformer 使用了 Attention 的機制，將原本的 sequential 資料一次看完，故 Transformer 不是一種 sequential model ## e Transformer 使用了 Positional Encoding 的機制，為每一個位置提供一個對應的編碼 (編碼長度為 512)，並將位置編碼與每一個位置的 input embedding直接相加，便可將位置訊息直接加入資料中 # 4 ## a 這樣做的好處為，想要生成新的資料時，由於已經知道高斯分布的 mean 跟 variance，故我們可以很輕鬆的生成任意數量的向量，使模型生成出不同結果 ## b 生成對抗網路模型藉由一組生成器與判別器來實現。生成器將會接受一個向量作為輸入，並以生成目標作為輸出；而判別器則負責分辨輸入是生成的還是真實資料。透過反覆分別訓練生成器及判別器，便可以使模型成為將高斯分布轉成一個資料集資料分布的轉換函數 ## c Diffusion Model 透過 Forward 以及 Reverse 兩個過程，有效地訓練模型去除圖像中的雜訊。在 training 階段中，拿一張未受雜訊污染的原始圖像，與一張源自 Gussian distribution 的雜訊，以隨機比例混合作為輸入，使模型嘗試預測雜訊；接著拿模型的預測結果跟真實加入的雜訊來計算 loss，使模型朝著梯度下降直至收斂。在 sampling 階段中，則是逐步對目標圖像進行去噪，以還原成模型所認定的初始圖像；透過這種機制，便可以使模型成為將高斯分布轉成一個資料集資料分布的轉換函數 # 5 ## 決定根節點分類的條件 b(x) 左子樹類別 A 的數量 = 3 左子樹類別 B 的數量 = 1 左子樹 gini index = 1-( 0.75 ^2 + 0.25 ^2) 右子樹類別 A 的數量= 1 右子樹類別 B 的數量= 1 右子樹 gini index = 1-( 0.5 ^2 + 0.5 ^2) total gini_index=( 4 / 6 )* 0.375 +( 2 / 6 )* 0.5 選擇性別作為分類特徵，total gini index 為 0.41666666666666663 左子樹類別 A 的數量 = 1 左子樹類別 B 的數量 = 0 左子樹 gini index = 1-( 1.0 ^2 + 0.0 ^2) 右子樹類別 A 的數量= 3 右子樹類別 B 的數量= 2 右子樹 gini index = 1-( 0.6 ^2 + 0.4 ^2) total gini_index=( 1 / 6 )* 0.0 +( 5 / 6 )* 0.48 選擇年齡作為分類特徵，boundary 數值為 24.5 時，total gini index 為 0.4 左子樹類別 A 的數量 = 2 左子樹類別 B 的數量 = 0 左子樹 gini index = 1-( 1.0 ^2 + 0.0 ^2) 右子樹類別 A 的數量= 2 右子樹類別 B 的數量= 2 右子樹 gini index = 1-( 0.5 ^2 + 0.5 ^2) total gini_index=( 2 / 6 )* 0.0 +( 4 / 6 )* 0.5 選擇年齡作為分類特徵，boundary 數值為 31.0 時，total gini index 為 0.3333333333333333 左子樹類別 A 的數量 = 2 左子樹類別 B 的數量 = 1 左子樹 gini index = 1-( 0.6666666666666666 ^2 + 0.3333333333333333 ^2) 右子樹類別 A 的數量= 2 右子樹類別 B 的數量= 1 右子樹 gini index = 1-( 0.6666666666666666 ^2 + 0.3333333333333333 ^2) total gini_index=( 3 / 6 )* 0.4444444444444444 +( 3 / 6 )* 0.4444444444444444 選擇年齡作為分類特徵，boundary 數值為 35.0 時，total gini index 為 0.4444444444444444 左子樹類別 A 的數量 = 3 左子樹類別 B 的數量 = 1 左子樹 gini index = 1-( 0.75 ^2 + 0.25 ^2) 右子樹類別 A 的數量= 1 右子樹類別 B 的數量= 1 右子樹 gini index = 1-( 0.5 ^2 + 0.5 ^2) total gini_index=( 4 / 6 )* 0.375 +( 2 / 6 )* 0.5 選擇年齡作為分類特徵，boundary 數值為 42.5 時，total gini index 為 0.41666666666666663 左子樹類別 A 的數量 = 4 左子樹類別 B 的數量 = 1 左子樹 gini index = 1-( 0.8 ^2 + 0.2 ^2) 右子樹類別 A 的數量= 0 右子樹類別 B 的數量= 1 右子樹 gini index = 1-( 0.0 ^2 + 1.0 ^2) total gini_index=( 5 / 6 )* 0.31999999999999984 +( 1 / 6 )* 0.0 選擇年齡作為分類特徵，boundary 數值為 60.5 時，total gini index 為 0.26666666666666655 左子樹類別 A 的數量 = 0 左子樹類別 B 的數量 = 1 左子樹 gini index = 1-( 0.0 ^2 + 1.0 ^2) 右子樹類別 A 的數量= 4 右子樹類別 B 的數量= 1 右子樹 gini index = 1-( 0.8 ^2 + 0.2 ^2) total gini_index=( 1 / 6 )* 0.0 +( 5 / 6 )* 0.31999999999999984 選擇血壓作為分類特徵，boundary 數值為 0.5 時，total gini index 為 0.26666666666666655 (將血壓數值化，{'低': 0, '正常': 1, '高': 2}) 左子樹類別 A 的數量 = 2 左子樹類別 B 的數量 = 2 左子樹 gini index = 1-( 0.5 ^2 + 0.5 ^2) 右子樹類別 A 的數量= 2 右子樹類別 B 的數量= 0 右子樹 gini index = 1-( 1.0 ^2 + 0.0 ^2) total gini_index=( 4 / 6 )* 0.5 +( 2 / 6 )* 0.0 選擇血壓作為分類特徵，boundary 數值為 1.5 時，total gini index 為 0.3333333333333333 (將血壓數值化，{'低': 0, '正常': 1, '高': 2}) 最終使用年齡<=60.5 這個條件 ![image](https://hackmd.io/_uploads/SkXetDxIC.png =30%x) ## 決定左子樹分類的條件 b(x) 左子樹類別 A 的數量 = 3 左子樹類別 B 的數量 = 1 左子樹 gini index = 1-( 0.75 ^2 + 0.25 ^2) 右子樹類別 A 的數量= 1 右子樹類別 B 的數量= 0 右子樹 gini index = 1-( 1.0 ^2 + 0.0 ^2) total gini_index=( 4 / 5 )* 0.375 +( 1 / 5 )* 0.0 選擇性別作為分類特徵，total gini index 為 0.30000000000000004 左子樹類別 A 的數量 = 1 左子樹類別 B 的數量 = 0 左子樹 gini index = 1-( 1.0 ^2 + 0.0 ^2) 右子樹類別 A 的數量= 3 右子樹類別 B 的數量= 1 右子樹 gini index = 1-( 0.75 ^2 + 0.25 ^2) total gini_index=( 1 / 5 )* 0.0 +( 4 / 5 )* 0.375 選擇年齡作為分類特徵，boundary 數值為 24.5 時，total gini index 為 0.30000000000000004 左子樹類別 A 的數量 = 2 左子樹類別 B 的數量 = 0 左子樹 gini index = 1-( 1.0 ^2 + 0.0 ^2) 右子樹類別 A 的數量= 2 右子樹類別 B 的數量= 1 右子樹 gini index = 1-( 0.6666666666666666 ^2 + 0.3333333333333333 ^2) total gini_index=( 2 / 5 )* 0.0 +( 3 / 5 )* 0.4444444444444444 選擇年齡作為分類特徵，boundary 數值為 31.0 時，total gini index 為 0.26666666666666666 左子樹類別 A 的數量 = 2 左子樹類別 B 的數量 = 1 左子樹 gini index = 1-( 0.6666666666666666 ^2 + 0.3333333333333333 ^2) 右子樹類別 A 的數量= 2 右子樹類別 B 的數量= 0 右子樹 gini index = 1-( 1.0 ^2 + 0.0 ^2) total gini_index=( 3 / 5 )* 0.4444444444444444 +( 2 / 5 )* 0.0 選擇年齡作為分類特徵，boundary 數值為 35.0 時，total gini index 為 0.26666666666666666 左子樹類別 A 的數量 = 3 左子樹類別 B 的數量 = 1 左子樹 gini index = 1-( 0.75 ^2 + 0.25 ^2) 右子樹類別 A 的數量= 1 右子樹類別 B 的數量= 0 右子樹 gini index = 1-( 1.0 ^2 + 0.0 ^2) total gini_index=( 4 / 5 )* 0.375 +( 1 / 5 )* 0.0 選擇年齡作為分類特徵，boundary 數值為 42.5 時，total gini index 為 0.30000000000000004 左子樹類別 A 的數量 = 0 左子樹類別 B 的數量 = 1 左子樹 gini index = 1-( 0.0 ^2 + 1.0 ^2) 右子樹類別 A 的數量= 4 右子樹類別 B 的數量= 0 右子樹 gini index = 1-( 1.0 ^2 + 0.0 ^2) total gini_index=( 1 / 5 )* 0.0 +( 4 / 5 )* 0.0 選擇血壓作為分類特徵，boundary 數值為 0.5 時，total gini index 為 0.0 (將血壓數值化，{'低': 0, '正常': 1, '高': 2}) 左子樹類別 A 的數量 = 2 左子樹類別 B 的數量 = 1 左子樹 gini index = 1-( 0.6666666666666666 ^2 + 0.3333333333333333 ^2) 右子樹類別 A 的數量= 2 右子樹類別 B 的數量= 0 右子樹 gini index = 1-( 1.0 ^2 + 0.0 ^2) total gini_index=( 3 / 5 )* 0.4444444444444444 +( 2 / 5 )* 0.0 選擇血壓作為分類特徵，boundary 數值為 1.5 時，total gini index 為 0.26666666666666666 (將血壓數值化，{'低': 0, '正常': 1, '高': 2}) 最終使用是否為低血壓這個條件，至此，演算法停止，決策樹已被建立出來 ![image](https://hackmd.io/_uploads/r1wGFweL0.png =30%x) # 6 本題我使用程式輔助計算 ```python import numpy as np # 設定數據集 X = np.array([[2, 1], [1, 3], [4, 7], [5, 6], [3, 3]]) # 特徵 y = np.array([1, -1, 1, -1, 1]) # 標籤 # 初始化權重 N = len(y) w = np.ones(N) / N # 初始權重均等，每個樣本的權重為 1/N class DecisionStump: def __init__(self): self.polarity = 1 # s self.feature_index = None # Xi self.threshold = None # theta def predict(self, X): n_samples = X.shape[0] X_column = X[:, self.feature_index] # 取得特定特徵的所有樣本值 predictions = np.ones(n_samples) if self.polarity == 1: predictions[X_column < self.threshold] = -1 else: predictions[X_column >= self.threshold] = -1 return predictions def fit(self, X, y, w): n_samples, n_features = X.shape min_error = float('inf') # 初始化最小錯誤率為無限大 for feature_index in range(n_features): # 遍歷所有特徵 X_column = X[:, feature_index] thresholds = np.unique(X_column) # 所有可能的閾值，thresholds 切分點為中間值 thresholds = (thresholds[:-1] + thresholds[1:]) / 2 for threshold in thresholds: # 遍歷所有閾值 for polarity in [1, -1]: # 遍歷所有極性 predictions = np.ones(n_samples) # 初始化預測結果 if polarity == 1: predictions[X_column < threshold] = -1 else: predictions[X_column >= threshold] = -1 error = sum(w * (predictions != y)) # 使用 u-weight 0/1 error 計算 if error < min_error: # 更新最小錯誤率 self.polarity = polarity # 更新極性 self.threshold = threshold # 更新閾值 self.feature_index = feature_index # 更新特徵 min_error = error # 更新最小錯誤率 return min_error class AdaBoost: def __init__(self, n_clf=3): self.n_clf = n_clf # 弱分類器數量 self.clfs = [] # 存放所有的弱分類器 def fit(self, X, y): n_samples, n_features = X.shape # 取得樣本數和特徵數 w = np.ones(n_samples) / n_samples # 初始化權重 self.alphas = [] # 存放每個弱分類器的 alpha 值 for t in range(self.n_clf): clf = DecisionStump() error = clf.fit(X, y, w) # 訓練弱分類器並獲得錯誤率，參數分別為特徵、標籤和權重 EPS = 1e-10 # 避免除以 0 factor = np.sqrt((1 - error) / (error + EPS)) alpha = np.log(factor) # 計算 alpha 值 predictions = clf.predict(X) # 弱分類器的預測結果 # 更新權重 w[y == predictions] /= factor # 正確樣本權重更新 w[y != predictions] *= factor # 錯誤樣本權重更新 self.clfs.append(clf) # 存放弱分類器 self.alphas.append(alpha) # 存放 alpha 值 # 印出所有的弱分類器以及弱分類器對應的權重 print(f"Iteration {t + 1}:") print(f"弱分類器 {t + 1}:") print(f" 特徵: {clf.feature_index}") print(f" 閾值: {clf.threshold}") print(f" 極性: {clf.polarity}") print(f"該弱分類器對應的權重: alpha: {alpha}") # 印出每一個 iteration 計算的過程 print(f"錯誤率: {error}") print(f"◆t: {factor}") print(f"資料權重: {w}") print("\n") # 所有弱分類器的預測加權總和 def predict(self, X): clf_preds = [alpha * clf.predict(X) for clf, alpha in zip(self.clfs, self.alphas)] y_pred = np.sum(clf_preds, axis=0) y_pred = np.sign(y_pred) # 取符號作為最終預測結果 return y_pred # 執行 Adaboost print("初始資料權重: ", w, "\n") clf = AdaBoost(n_clf=3) # n_clf 代表弱分類器的數量 clf.fit(X, y) print("(弱分類器特徵 = 0 代表第一個特徵, 特徵 = 1 代表第二個特徵)") # 測試數據集 predictions = clf.predict(X) print(f"Final predictions: {predictions}") ``` ``` 初始資料權重: [0.2 0.2 0.2 0.2 0.2] Iteration 1: 弱分類器 1: 特徵: 0 閾值: 1.5 極性: 1 該弱分類器對應的權重: alpha: 0.6931471803099453 錯誤率: 0.2 ◆t: 1.9999999995 資料權重: [0.1 0.1 0.1 0.4 0.1] Iteration 2: 弱分類器 2: 特徵: 0 閾值: 4.5 極性: -1 該弱分類器對應的權重: alpha: 1.0986122880292208 錯誤率: 0.10000000002500001 ◆t: 2.9999999980833336 資料權重: [0.03333333 0.3 0.03333333 0.13333333 0.03333333] Iteration 3: 弱分類器 3: 特徵: 1 閾值: 2.0 極性: -1 該弱分類器對應的權重: alpha: 1.3195286635814387 錯誤率: 0.06666666672592593 ◆t: 3.7416573821859567 資料權重: [0.00890871 0.08017837 0.12472191 0.03563483 0.12472191] (弱分類器特徵 = 0 代表第一個特徵, 特徵 = 1 代表第二個特徵) Final predictions: [ 1. -1. 1. -1. 1.] ``` # 7 ## Q $Q=\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ ## p $p=\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$ ## A $a_1^T=-1\begin{bmatrix} 1 & 0 & 0 \end{bmatrix}=\begin{bmatrix} -1 & 0 & 0 \end{bmatrix}$ $a_2^T=-1\begin{bmatrix} 1 & 2 & 2 \end{bmatrix}=\begin{bmatrix} -1 & -2 & -2 \end{bmatrix}$ $a_3^T=1\begin{bmatrix} 1 & 2 & 0 \end{bmatrix}=\begin{bmatrix} 1 & 2 & 0 \end{bmatrix}$ $a_4^T=1\begin{bmatrix} 1 & 3 & 0 \end{bmatrix}=\begin{bmatrix} 1 & 3 & 0 \end{bmatrix}$ $A=\begin{bmatrix} -1 & 0 & 0 \\ -1 & -2 & -2 \\ 1 & 2 & 0 \\ 1 & 3 & 0 \end{bmatrix}$ ## c $c_n=1$ $c=\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}$