NTU Machine Learning HW1
===

## 1. Logistic Regression

### 1-(a)

#### Question
![](https://i.imgur.com/qAHxah2.png)

#### Answer
\begin{equation}
σ(-1\cdot7 + 2\cdot0 + 3\cdot(-1) + 10\cdot5 + 1)\\
= σ(41)\\
= \dfrac{1}{1+e^{-41}} \approx 1
\end{equation}

### 1-(b)

#### Question
![](https://i.imgur.com/SbOsCcl.png)

#### Answer
\begin{equation}
L(w, b) = f_{w,b}(x^1)f_{w,b}(x^2)(1-f_{w,b}(x^3))\cdots f_{w,b}(x^N)\\
w^*, b^* = \arg\max L(w,b) \Rightarrow w^*, b^* = \arg\min -\ln L(w,b)\\
-\ln L(w, b) = -[\hat{y}^1\ln f_{w,b}(x^1) + \hat{y}^2\ln f_{w,b}(x^2) + (1-\hat{y}^3)\ln(1-f_{w,b}(x^3)) + \cdots + \hat{y}^N\ln f_{w,b}(x^N)]\\
= \sum_{n}-[\hat{y}^n\ln f_{w,b}(x^n) + (1-\hat{y}^n)\ln(1-f_{w,b}(x^n))]
\end{equation}
where $\hat{y}^n = 1$ if $x^n$ belongs to class 1 and $\hat{y}^n = 0$ otherwise (so $\hat{y}^3 = 0$ here).

### 1-\(c\)

#### Question
![](https://i.imgur.com/PCv8VAP.png)

#### Answer
\begin{equation}
\dfrac{∂(-\ln L(w,b))}{∂w_i} = \dfrac{∂\sum_{n}-[\hat{y}^n\ln f_{w,b}(x^n) + (1-\hat{y}^n)\ln(1-f_{w,b}(x^n))]}{∂w_i}\\
= \sum_{n}-[\hat{y}^n(1-f_{w,b}(x^n))x_i^n - (1-\hat{y}^n)f_{w,b}(x^n)x_i^n]\\
= \sum_{n}-(\hat{y}^n-f_{w,b}(x^n))x_i^n
\end{equation}
The gradient descent update is therefore
\begin{equation}
w_i\ \leftarrow\ w_i - η\sum_{n}-(\hat{y}^n - f_{w,b}(x^n))x_i^n
\end{equation}

## 2. Closed-Form Linear Regression Solution
![](https://i.imgur.com/EaHs6bn.png)

### 2-(a)

#### Question
![](https://i.imgur.com/LsG8J5F.png)

#### Answer
Let $w_0=b$ and $x_{i,0}=1$, so $w$ is a 2-dimensional vector and $x$ is an $N\times2$ (here $5\times2$) matrix.
\begin{equation}
L(w) = \dfrac{1}{10}\|xw - y\|^2\\
= \dfrac{1}{10}(w^Tx^Txw - 2w^Tx^Ty + y^Ty)
\end{equation}
Differentiate with respect to $w$ and set the result to zero.
\begin{equation}
\dfrac{∂L}{∂w} = \dfrac{1}{5}(x^Txw - x^Ty) = 0\\
w = (x^Tx)^{-1}x^Ty
\end{equation}
Now plug in the data.
\begin{equation}
\left(\left(
\begin{array}{ccccc}
1 & 1 & 1 & 1 & 1 \\
1 & 2 & 3 & 4 & 5 \\
\end{array}
\right)
\left(
\begin{array}{cc}
1 & 1 \\
1 & 2 \\
1 & 3 \\
1 & 4 \\
1 & 5 \\
\end{array}
\right)\right)^{-1}
\left(
\begin{array}{ccccc}
1 & 1 & 1 & 1 & 1 \\
1 & 2 & 3 & 4 & 5 \\
\end{array}
\right)
\left(
\begin{array}{c}
1.5 \\
2.4 \\
3.5 \\
4.1 \\
5.3 \\
\end{array}
\right)\\
= \left(
\begin{array}{cc}
\dfrac{11}{10} & \dfrac{-3}{10} \\
\dfrac{-3}{10} & \dfrac{1}{10} \\
\end{array}
\right)
\left(
\begin{array}{c}
16.8 \\
59.7 \\
\end{array}
\right)\\
= \left(
\begin{array}{c}
0.57 \\
0.93 \\
\end{array}
\right)
\end{equation}
\begin{equation}
w = 0.93,\ b = 0.57
\end{equation}
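For a quick numerical check of the result above, here is a small NumPy sketch of the closed-form solution. The five training points $(1, 1.5), (2, 2.4), (3, 3.5), (4, 4.1), (5, 5.3)$ are assumed from the calculation above; everything else is just illustration, not part of the original question.

```python
import numpy as np

# Training data assumed from the worked example in 2-(a).
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1.5, 2.4, 3.5, 4.1, 5.3])

# Design matrix with a leading column of ones, so w_0 plays the role of b.
X = np.column_stack([np.ones_like(x), x])      # shape (5, 2)

# Closed-form least-squares solution w = (X^T X)^{-1} X^T y.
b, w = np.linalg.solve(X.T @ X, X.T @ y)

print(b, w)   # ≈ 0.57 0.93, matching the hand calculation
```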
### 2-(b)

#### Question
![](https://i.imgur.com/JPUjSm6.png)

#### Answer
According to 2-(a),
\begin{equation}
w = (x^Tx)^{-1}x^Ty
\end{equation}
Substitute all of the $x$, $y$, and $N$ into the formula.
\begin{equation}
\left(\left(
\begin{array}{cccc}
1 & 1 & ... & 1 \\
x_1 & x_2 & ... & x_N \\
\end{array}
\right)
\left(
\begin{array}{cc}
1 & x_1 \\
1 & x_2 \\
... & ... \\
1 & x_N \\
\end{array}
\right)\right)^{-1}
\left(
\begin{array}{cccc}
1 & 1 & ... & 1 \\
x_1 & x_2 & ... & x_N \\
\end{array}
\right)
\left(
\begin{array}{c}
y_1 \\
y_2 \\
... \\
y_N \\
\end{array}
\right)\\
= \dfrac{1}{N(x_1^2 + ... + x_N^2) - (x_1 + ... + x_N)^2}
\left(
\begin{array}{cc}
x_1^2 + ... + x_N^2 & -(x_1 + ... + x_N) \\
-(x_1 + ... + x_N) & N \\
\end{array}
\right)
\left(
\begin{array}{c}
y_1 + ... + y_N \\
x_1y_1 + ... + x_Ny_N \\
\end{array}
\right)\\
= \dfrac{1}{N\sum_{i=1}^Nx_i^2 - (\sum_{i=1}^Nx_i)^2}
\left(
\begin{array}{c}
\sum_{i=1}^{N}x_i^2\sum_{i=1}^Ny_i - \sum_{i=1}^Nx_i\sum_{i=1}^Nx_iy_i \\
N\sum_{i=1}^{N}x_iy_i - \sum_{i=1}^{N}x_i\sum_{i=1}^{N}y_i \\
\end{array}
\right)
\end{equation}
The first component is $b$ and the second is $w$, so
\begin{equation}
w = \dfrac{N\sum_{i=1}^{N}x_iy_i - \sum_{i=1}^{N}x_i\sum_{i=1}^{N}y_i}{N\sum_{i=1}^Nx_i^2 - (\sum_{i=1}^Nx_i)^2},\ 
b = \dfrac{\sum_{i=1}^{N}x_i^2\sum_{i=1}^Ny_i - \sum_{i=1}^Nx_i\sum_{i=1}^Nx_iy_i}{N\sum_{i=1}^Nx_i^2 - (\sum_{i=1}^Nx_i)^2}
\end{equation}

### 2-\(c\)

#### Question
![](https://i.imgur.com/9S1IryR.png)

#### Answer
Let $w_0=b$, $x_{i,0}=1$, and $λ' = \left(\begin{array}{cc}0&0\\0&λ\end{array}\right)$ (so the bias term is not regularized); then $w$ is a 2-dimensional vector and $x$ is an $N\times2$ matrix.
\begin{equation}
L(w) = \dfrac{1}{2N}\|xw - y\|^2 + \dfrac{1}{2}w^Tλ'w\\
= \dfrac{1}{2N}(w^Tx^Txw - 2w^Tx^Ty + y^Ty) + \dfrac{1}{2}w^Tλ'w
\end{equation}
Differentiate with respect to $w$ and set the result to zero.
\begin{equation}
\dfrac{∂L}{∂w} = \dfrac{1}{N}(x^Txw - x^Ty) + λ'w = 0\\
w = (x^Tx + Nλ')^{-1}x^Ty
\end{equation}
Similar to 2-(b), we can substitute all of the $x$, $y$, $λ'$ and $N$ into the formula and get $w$ and $b$.
\begin{equation}
w = \dfrac{N\sum_{i=1}^{N}x_iy_i - \sum_{i=1}^{N}x_i\sum_{i=1}^{N}y_i}{N\sum_{i=1}^Nx_i^2 + N^2λ - (\sum_{i=1}^Nx_i)^2},\ 
b = \dfrac{(\sum_{i=1}^{N}x_i^2 + Nλ)\sum_{i=1}^Ny_i - \sum_{i=1}^Nx_i\sum_{i=1}^Nx_iy_i}{N\sum_{i=1}^Nx_i^2 + N^2λ - (\sum_{i=1}^Nx_i)^2}
\end{equation}

## 3. Noise and Regularization

#### Question
![](https://i.imgur.com/uBmpQlo.png)

#### Answer
Let $w_0=b$ and $x_{i,0}=1$. Expand $\hat{L}_{ssq}$:
\begin{equation}
(f_{w,b}(x_i + η_i) - y_i)^2
\\= w^T(x_i + η_i)^T(x_i + η_i)w - 2w^T(x_i + η_i)^Ty_i + y_i^2
\\= w^Tx_i^Tx_iw + w^Tx_i^Tη_iw + w^Tη_i^Tx_iw + w^Tη_i^Tη_iw - 2w^Tx_i^Ty_i - 2w^Tη_i^Ty_i + y_i^2
\\= (f_{w,b}(x_i) - y_i)^2 + w^Tx_i^Tη_iw + w^Tη_i^Tx_iw + w^Tη_i^Tη_iw - 2w^Tη_i^Ty_i
\end{equation}
Taking the expectation over the noise, the terms that are linear in $η_i$ vanish because $E[η_i] = 0$, and $E[w^Tη_i^Tη_iw] = σ^2\|w\|^2$, so
\begin{equation}
E\left[\dfrac{1}{2N}\sum_{i=1}^N(f_{w,b}(x_i + η_i) - y_i)^2\right]
= \dfrac{1}{2N}\sum_{i=1}^N(f_{w,b}(x_i) - y_i)^2 + \dfrac{σ^2}{2}\|w\|^2
\end{equation}
which is the ordinary sum-of-squares loss plus an L2 regularization term with weight $λ = σ^2$.

## 4. Kaggle Hacker
![](https://i.imgur.com/h6ZL5q0.png)

### 4-(a)

#### Question
![](https://i.imgur.com/Et3jBbU.png)

#### Answer
Expand $e_k$:
\begin{equation}
e_k = \dfrac{1}{N}\sum_{i=1}^{N}(g_k(x_i)-y_i)^2
\\= \dfrac{1}{N}\sum_{i=1}^{N}(g_k(x_i)^2 - 2g_k(x_i)y_i + y_i^2)
\\= s_k - \dfrac{2}{N}\sum_{i=1}^{N}g_k(x_i)y_i + e_0
\end{equation}
Rearranging, we can recover $\sum_{i=1}^{N}g_k(x_i)y_i$ from the scores alone:
\begin{equation}
\sum_{i=1}^{N}g_k(x_i)y_i = \dfrac{N(s_k + e_0 - e_k)}{2}
\end{equation}
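To double-check this identity, here is a minimal NumPy sketch with synthetic data; the arrays `g` and `y` and the score definitions below are made-up placeholders, not the actual Kaggle setup.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
y = rng.normal(size=N)            # hidden test labels (unknown to the hacker)
g = rng.normal(size=N)            # one submission's predictions g_k(x_i)

e_k = np.mean((g - y) ** 2)       # leaderboard score of this submission
s_k = np.mean(g ** 2)             # mean square of our own predictions
e_0 = np.mean(y ** 2)             # leaderboard score of the all-zero submission

lhs = np.sum(g * y)
rhs = N * (s_k + e_0 - e_k) / 2
print(np.isclose(lhs, rhs))       # True: sum_i g_k(x_i) y_i is recoverable from the scores
```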
### 4-(b)

#### Question
![](https://i.imgur.com/01i5lfD.png)

#### Answer
We can use gradient descent to get the optimal weights. Here we derive the update for $α_1$. First, take the partial derivative of $(\sum_{k=1}^Kα_kg_k(x_i)-y_i)^2$ with respect to $α_1$:
\begin{equation}
\dfrac{∂(α_1g_1(x_i) + α_2g_2(x_i) + ... + α_Kg_K(x_i) - y_i)^2}{∂α_1}
\\= 2(α_1g_1(x_i) + α_2g_2(x_i) + ... + α_Kg_K(x_i) - y_i)\,g_1(x_i)
\\= 2\left(\sum_{k=1}^{K}α_kg_k(x_i) - y_i\right)g_1(x_i)
\end{equation}
Then average over all of the $x_i$:
\begin{equation}
gradient = \dfrac{1}{N}\sum_{i=1}^N2\left(\sum_{k=1}^{K}α_kg_k(x_i) - y_i\right)g_1(x_i)
= \dfrac{2}{N}\left(\sum_{k=1}^{K}α_k\sum_{i=1}^Ng_k(x_i)g_1(x_i) - \sum_{i=1}^Ng_1(x_i)y_i\right)
\end{equation}
Every quantity here is available: $\sum_{i=1}^Ng_k(x_i)g_1(x_i)$ can be computed from our own submissions, and $\sum_{i=1}^Ng_1(x_i)y_i$ is recovered from the leaderboard scores by 4-(a), so the gradient can be evaluated without knowing $y$.
Last, we update $α_1$ by subtracting the gradient scaled by the learning rate $η$ (the other $α_k$ are updated in the same way); a minimal sketch of this loop is given after the references.
\begin{equation}
α_1 \leftarrow α_1 - η \cdot gradient
\end{equation}

## References
* [1: Hung-yi Lee's YouTube video](https://www.youtube.com/watch?v=hSXFuypLukA&ab_channel=Hung-yiLee)
* [2-a: Closed-Form Solution](https://www.cs.toronto.edu/~rgrosse/courses/csc311_f20/readings/notes_on_linear_regression.pdf)
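As mentioned in 4-(b), here is a minimal NumPy sketch of the gradient descent loop on the blending weights. The data `G`, `y` and the hyperparameters `eta`, `steps` are made-up placeholders for illustration; in the real setting `G.T @ y` would not be computed from `y` directly but recovered from the leaderboard scores via the identity in 4-(a).

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 200, 3
G = rng.normal(size=(N, K))                       # G[i, k] = g_k(x_i), our K submissions
y = G @ np.array([0.5, -0.2, 0.8]) + 0.1 * rng.normal(size=N)   # hidden labels (demo only)

# Sufficient statistics for the gradient.
GtG = G.T @ G      # computable from our own predictions
Gty = G.T @ y      # in the real attack this comes from the 4-(a) identity, not from y

alpha = np.zeros(K)
eta, steps = 0.01, 5000                           # assumed learning rate / iteration count
for _ in range(steps):
    grad = (2.0 / N) * (GtG @ alpha - Gty)        # gradient of (1/N) * sum_i (sum_k a_k g_k(x_i) - y_i)^2
    alpha -= eta * grad

print(alpha)       # converges to the least-squares blending weights
```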