NTU Machine Learning HW1
===
## 1. Logistic Regression
### 1-(a)
#### Question

#### Answer
\begin{equation}
σ(-1 \cdot 7 + 2 \cdot 0 + 3 \cdot (-1) + 10 \cdot 5 + 1)\\
= σ(41)\\
= \dfrac{1}{1+e^{-41}}
\end{equation}
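As a quick numerical check, a minimal Python sketch (assuming the weights $w=(-1,2,3,10)$, input $x=(7,0,-1,5)$, and bias $b=1$ read off the expression above):

```python
import math

# Weights, input, and bias as read off the expression above
w, x, b = [-1, 2, 3, 10], [7, 0, -1, 5], 1
z = sum(wi * xi for wi, xi in zip(w, x)) + b  # z = 41

sigmoid = 1 / (1 + math.exp(-z))
print(sigmoid)  # 1.0 in float64, since e^{-41} ≈ 1.6e-18 is below machine epsilon
```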
### 1-(b)
#### Question

#### Answer
\begin{equation}
L(w, b) = f_{w,b}(x^1)f_{w,b}(x^2)(1-f_{w,b}(x^3))\cdots f_{w,b}(x^N)\\
w^*, b^* = \arg\max_{w,b} L(w,b) \Rightarrow w^*, b^* = \arg\min_{w,b} -\ln L(w,b)\\
-\ln L(w, b) = -[\ln f_{w,b}(x^1) + \ln f_{w,b}(x^2) + \ln(1-f_{w,b}(x^3)) + \cdots + \ln f_{w,b}(x^N)]\\
= \sum_{n}-[\hat{y}^n \ln f_{w,b}(x^n) + (1-\hat{y}^n)\ln(1-f_{w,b}(x^n))]
\end{equation}
where $\hat{y}^n = 1$ when $x^n$ belongs to class 1 and $\hat{y}^n = 0$ otherwise, so every factor of $L(w,b)$ collapses into a single cross-entropy term.
### 1-(c)
#### Question

#### Answer
Since $σ'(z) = σ(z)(1-σ(z))$, the chain rule gives $\dfrac{∂\ln f_{w,b}(x^n)}{∂w_i} = (1-f_{w,b}(x^n))x_i^n$ and $\dfrac{∂\ln(1-f_{w,b}(x^n))}{∂w_i} = -f_{w,b}(x^n)x_i^n$. Therefore
\begin{equation}
\dfrac{∂(-\ln L(w,b))}{∂w_i} = \dfrac{∂}{∂w_i}\sum_{n}-[\hat{y}^n \ln f_{w,b}(x^n) + (1-\hat{y}^n)\ln(1-f_{w,b}(x^n))]\\
= \sum_{n}-[\hat{y}^n(1-f_{w,b}(x^n))x_i^n - (1-\hat{y}^n)f_{w,b}(x^n)x_i^n]\\
= \sum_{n}-(\hat{y}^n-f_{w,b}(x^n))x_i^n
\end{equation}
The gradient descent update is then
\begin{equation}
w_i \leftarrow w_i - η\sum_{n}-(\hat{y}^n - f_{w,b}(x^n))x_i^n
\end{equation}
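A minimal NumPy sketch of this update rule; the dataset below is made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Made-up toy data: X is N x d, y_hat holds the 0/1 class labels
X = np.array([[7.0, 0.0], [-1.0, 5.0], [2.0, -3.0], [0.5, 1.5]])
y_hat = np.array([1.0, 0.0, 1.0, 0.0])

w = np.zeros(X.shape[1])
b = 0.0
eta = 0.1  # learning rate

for _ in range(1000):
    f = sigmoid(X @ w + b)        # f_{w,b}(x^n) for every n
    grad_w = -(y_hat - f) @ X     # sum_n -(y_hat^n - f(x^n)) x_i^n
    grad_b = -np.sum(y_hat - f)   # same derivation with x_0^n = 1
    w -= eta * grad_w
    b -= eta * grad_b

print(w, b)
```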
## 2. Closed-Form Linear Regression Solution

### 2-(a)
#### Question

#### Answer
Let $w_0=b$ and $x_{i,0}=1$, so $w$ is a 2-dimensional vector and $x$ is an $N \times 2$ matrix (here $N = 5$).
\begin{equation}
L(w) = \dfrac{1}{10}\|xw - y\|^2\\
= \dfrac{1}{10}(w^Tx^Txw - 2w^Tx^Ty + y^Ty)
\end{equation}
Differentiate with respect to $w$ and set the result to zero.
\begin{equation}
\dfrac{∂L}{∂w} = \dfrac{1}{5}(x^Txw - x^Ty) = 0\\
w = (x^Tx)^{-1}x^Ty
\end{equation}
Now plug in the numerical values.
\begin{equation}
\left(\left(
\begin{array}{ccccc}
1 & 1 & 1 & 1 & 1 \\
1 & 2 & 3 & 4 & 5 \\
\end{array}
\right)
\left(
\begin{array}{cc}
1 & 1 \\
1 & 2 \\
1 & 3 \\
1 & 4 \\
1 & 5 \\
\end{array}
\right)\right)^{-1}
\left(
\begin{array}{ccccc}
1 & 1 & 1 & 1 & 1 \\
1 & 2 & 3 & 4 & 5 \\
\end{array}
\right)
\left(
\begin{array}{c}
1.5 \\
2.4 \\
3.5 \\
4.1 \\
5.3 \\
\end{array}
\right)\\ =
\left(
\begin{array}{cc}
\dfrac{11}{10} & -\dfrac{3}{10} \\
-\dfrac{3}{10} & \dfrac{1}{10} \\
\end{array}
\right)
\left(
\begin{array}{c}
16.8 \\
59.7 \\
\end{array}
\right)\\ =
\left(
\begin{array}{c}
0.57 \\
0.93 \\
\end{array}
\right)
\end{equation}
\begin{equation}
w = 0.93, b = 0.57
\end{equation}
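The result can be verified with NumPy (a sketch assuming the data above):

```python
import numpy as np

# Design matrix with a bias column of ones, and the targets from 2-(a)
X = np.array([[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]], dtype=float)
y = np.array([1.5, 2.4, 3.5, 4.1, 5.3])

# w = (X^T X)^{-1} X^T y
w = np.linalg.inv(X.T @ X) @ X.T @ y
print(w)  # [0.57 0.93] -> b = 0.57, w = 0.93
```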
### 2-(b)
#### Question

#### Answer
According to 2-(a):
\begin{equation}
w = (x^Tx)^{-1}x^Ty
\end{equation}
Substitute all of the $x_i$, $y_i$, and $N$ into the formula:
\begin{equation}
\left(\left(
\begin{array}{cccc}
1 & 1 & \cdots & 1 \\
x_1 & x_2 & \cdots & x_N\\
\end{array}
\right)
\left(
\begin{array}{cc}
1 & x_1 \\
1 & x_2 \\
\vdots & \vdots \\
1 & x_N \\
\end{array}
\right)\right)^{-1}
\left(
\begin{array}{cccc}
1 & 1 & \cdots & 1 \\
x_1 & x_2 & \cdots & x_N\\
\end{array}
\right)
\left(
\begin{array}{c}
y_1 \\
y_2 \\
\vdots \\
y_N \\
\end{array}
\right) \\ =
\dfrac{1}{N(x_1^2 + \cdots + x_N^2) - (x_1 + \cdots + x_N)^2}
\left(
\begin{array}{cc}
x_1^2 + \cdots + x_N^2 & -(x_1 + \cdots + x_N) \\
-(x_1 + \cdots + x_N) & N \\
\end{array}
\right)
\left(
\begin{array}{c}
y_1 + \cdots + y_N \\
x_1y_1 + \cdots + x_Ny_N
\end{array}
\right) \\ =
\dfrac{
\left(
\begin{array}{c}
\sum_{i=1}^{N}x_i^2 \sum_{i=1}^Ny_i - \sum_{i=1}^Nx_i\sum_{i=1}^Nx_iy_i \\
N\sum_{i=1}^{N}x_iy_i - \sum_{i=1}^{N}x_i\sum_{i=1}^{N}y_i
\end{array}
\right)
}{N\sum_{i=1}^Nx_i^2 - (\sum_{i=1}^Nx_i)^2}
\end{equation}
\begin{equation}
w = \dfrac{
N\sum_{i=1}^{N}(x_iy_i) - \sum_{i=1}^{N}x_i\sum_{i=1}^{N}y_i
}{N\sum_{i=1}^Nx_i^2 - (\sum_{i=1}^Nx_i)^2}
,\ b = \dfrac{
\sum_{i=1}^{N}x_i^2 \sum_{i=1}^Ny_i - \sum_{i=1}^Nx_i\sum_{i=1}^Nx_iy_i
}{N\sum_{i=1}^Nx_i^2 - (\sum_{i=1}^Nx_i)^2}
\end{equation}
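These scalar formulas can be sanity-checked in Python; reusing the 2-(a) data as a hypothetical example, they reproduce $w = 0.93$, $b = 0.57$:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.5, 2.4, 3.5, 4.1, 5.3])
N = len(x)

# Shared denominator: N * sum(x_i^2) - (sum x_i)^2
denom = N * np.sum(x**2) - np.sum(x)**2
w = (N * np.sum(x * y) - np.sum(x) * np.sum(y)) / denom
b = (np.sum(x**2) * np.sum(y) - np.sum(x) * np.sum(x * y)) / denom
print(w, b)  # 0.93 0.57, matching 2-(a)
```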
### 2-(c)
#### Question

#### Answer
Let $w_0=b$, $x_{i,0}=1$, and $λ' = \left(\begin{array}{cc}0&0\\0&λ\end{array}\right)$ so that the bias term is not regularized; then $w$ is a 2-dimensional vector and $x$ is an $N \times 2$ matrix.
\begin{equation}
L(w) = \dfrac{1}{2N}\|xw - y\|^2 + \dfrac{1}{2}w^Tλ'w\\
= \dfrac{1}{2N}(w^Tx^Txw - 2w^Tx^Ty + y^Ty) + \dfrac{1}{2}w^Tλ'w
\end{equation}
Differentiate with respect to $w$ and set the result to zero.
\begin{equation}
\dfrac{∂L}{∂w} = \dfrac{1}{N}(x^Txw - x^Ty) + λ'w = 0\\
w = (x^Tx + Nλ')^{-1}x^Ty
\end{equation}
Similar to 2-(b), we can substitute all of the $x_i$, $y_i$, $λ'$, and $N$ into the formula to get $w$ and $b$.
\begin{equation}
w = \dfrac{
N\sum_{i=1}^{N}(x_iy_i) - \sum_{i=1}^{N}x_i\sum_{i=1}^{N}y_i
}{N\sum_{i=1}^Nx_i^2 + N^2 λ - (\sum_{i=1}^Nx_i)^2}
,\ b = \dfrac{
(\sum_{i=1}^{N}x_i^2 + Nλ) \sum_{i=1}^Ny_i - \sum_{i=1}^Nx_i\sum_{i=1}^Nx_iy_i
}{N\sum_{i=1}^Nx_i^2 + N^2 λ - (\sum_{i=1}^Nx_i)^2}
\end{equation}
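A sketch of this regularized closed form, again with the 2-(a) data as a stand-in and an arbitrary $λ$:

```python
import numpy as np

X = np.array([[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]], dtype=float)
y = np.array([1.5, 2.4, 3.5, 4.1, 5.3])
N = len(y)
lam = 0.1  # arbitrary regularization strength

# lambda' = diag(0, lambda): the zero entry leaves the bias b unpenalized
lam_prime = np.diag([0.0, lam])

# w = (X^T X + N * lambda')^{-1} X^T y
w = np.linalg.inv(X.T @ X + N * lam_prime) @ X.T @ y
print(w)  # roughly [0.70 0.89]: the slope shrinks compared with 2-(a)
```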
## 3. Noise and Regularization
#### Question

#### Answer
Let $w_0=b$ and $x_{i,0}=1$.
Expand $\hat{L}_{ssq}$:
\begin{equation}
(f_{w,b}(x_i + η_i) - y_i)^2 \\=
w^T (x_i + η_i)^T (x_i + η_i) w - 2w^T (x_i + η_i)^T y_i + y_i^2 \\=
w^Tx_i^Tx_iw + w^Tx_i^Tη_iw + w^Tη_i^Tx_iw + w^Tη_i^Tη_iw - 2w^Tx_i^Ty_i - 2w^Tη_i^Ty_i + y_i^2 \\=
(w^Tx_i^Tx_iw - 2w^Tx_i^Ty_i + y_i^2) + (w^Tx_i^Tη_iw + w^Tη_i^Tx_iw - 2w^Tη_i^Ty_i) + w^Tη_i^Tη_iw
\end{equation}
Because the noise has zero mean, every term that is linear in $η_i$ vanishes in expectation, leaving
\begin{equation}
(f_{w,b}(x_i) - y_i)^2 + w^Tη_i^Tη_iw
\end{equation}
Note that since $η_i$ has zero mean and variance $σ^2$, $E[η_i^Tη_i] = σ^2I$, so in expectation
\begin{equation}
\dfrac{1}{2N}\sum_{i=1}^N\left((f_{w,b}(x_i) - y_i)^2 + w^Tη_i^Tη_iw\right) =
\dfrac{1}{2N}\sum_{i=1}^N(f_{w,b}(x_i) - y_i)^2 + \dfrac{σ^2}{2}\|w\|^2
\end{equation}
i.e. the noisy squared loss equals the clean squared loss plus an L2 regularization term of strength $σ^2$.
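A small Monte-Carlo check of this identity; the data, $w$, $b$, and $σ$ below are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up 1-feature dataset and a fixed (w, b)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.5, 2.4, 3.5, 4.1, 5.3])
w, b = 0.9, 0.6
sigma = 0.5

# Empirical noisy loss (1/2N) sum (f(x_i + eta_i) - y_i)^2, averaged over many draws
T = 200_000
noise = rng.normal(0.0, sigma, size=(T, len(x)))
noisy_loss = np.mean(((w * (x + noise) + b - y) ** 2).mean(axis=1)) / 2

# Clean loss plus the predicted L2 penalty (sigma^2 / 2) * w^2
clean_loss = np.mean((w * x + b - y) ** 2) / 2
predicted = clean_loss + (sigma**2 / 2) * w**2

print(noisy_loss, predicted)  # the two values agree up to sampling error
```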
## 4. Kaggle Hacker

### 4-(a)
#### Question

#### Answer
Expand $e_k$:
\begin{equation}
e_k = \dfrac{1}{N}\sum_{i=1}^{N}(g_k(x_i)-y_i)^2 \\=
\dfrac{1}{N}\sum_{i=1}^{N}(g_k(x_i)^2 - 2g_k(x_i)y_i + y_i^2) \\=
s_k - \dfrac{2}{N}\sum_{i=1}^{N}g_k(x_i)y_i + e_0
\end{equation}
where $s_k = \frac{1}{N}\sum_{i=1}^{N}g_k(x_i)^2$ and $e_0 = \frac{1}{N}\sum_{i=1}^{N}y_i^2$, the score of the all-zero submission.
Rearranging, we can recover the inner product between each submission and the hidden labels:
\begin{equation}
\sum_{i=1}^{N}g_k(x_i)y_i = \dfrac{N(s_k - e_k + e_0)}{2}
\end{equation}
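A randomized check of the recovery formula (synthetic labels and predictions; every value below is made up):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000
y = rng.normal(size=N)        # hidden test labels
g = rng.normal(size=N)        # one submission's predictions g_k

e_k = np.mean((g - y) ** 2)   # leaderboard score of the submission
s_k = np.mean(g ** 2)         # known from the predictions themselves
e_0 = np.mean(y ** 2)         # score of the all-zero submission

recovered = N * (s_k - e_k + e_0) / 2
print(recovered, np.sum(g * y))  # the two values match
```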
### 4-(b)
#### Question

#### Answer
We can use gradient descent to find the optimal weights. Here we derive the update for $α_1$; the other $α_k$ are analogous.
First, take the partial derivative of $(\sum_{k=1}^K α_kg_k(x_i)-y_i)^2$ with respect to $α_1$:
\begin{equation}
\dfrac{∂(α_1g_1(x_i) + α_2g_2(x_i) + \cdots + α_Kg_K(x_i) - y_i)^2}{∂α_1} \\=
2(α_1g_1(x_i) + α_2g_2(x_i) + \cdots + α_Kg_K(x_i) - y_i) \cdot g_1(x_i) \\=
2\left(\sum_{k=1}^{K}α_kg_k(x_i) - y_i\right)g_1(x_i)
\end{equation}
Then average over all of the $x_i$. Note that $\sum_{i}y_ig_1(x_i)$ is exactly the quantity recovered in 4-(a), so the gradient can be computed from the leaderboard scores alone, without knowing $y$ directly.
\begin{equation}
\text{gradient} = \dfrac{1}{N}\sum_{i=1}^N 2\left(\sum_{k=1}^{K}α_kg_k(x_i) - y_i\right)g_1(x_i)
\end{equation}
Finally, update $α_1$ by subtracting the gradient scaled by the learning rate $η$.
\begin{equation}
α_1 \leftarrow α_1 - η \cdot \text{gradient}
\end{equation}
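Putting 4-(a) and 4-(b) together, here is a sketch of the whole attack under the same assumptions: recover $\sum_i g_k(x_i)y_i$ from the scores, then run gradient descent on the $α_k$. The synthetic data is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 1000, 5
y = rng.normal(size=N)              # hidden labels
G = rng.normal(size=(N, K))         # K submissions, one per column

# What the hacker actually observes: leaderboard scores + own predictions
e = np.mean((G - y[:, None]) ** 2, axis=0)   # e_k for each submission
s = np.mean(G ** 2, axis=0)                  # s_k
e0 = np.mean(y ** 2)                         # score of the all-zero submission

gy = N * (s - e + e0) / 2           # recovered sum_i g_k(x_i) y_i, as in 4-(a)

alpha = np.zeros(K)
eta = 0.1
for _ in range(2000):
    # gradient_k = (2/N) (sum_i (sum_j a_j g_j(x_i)) g_k(x_i) - sum_i y_i g_k(x_i))
    grad = 2 * (G.T @ (G @ alpha) - gy) / N
    alpha -= eta * grad

blended_error = np.mean((G @ alpha - y) ** 2)
print(alpha, blended_error)  # the blend scores better than any single e_k
```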
## References
* [1: Hung-yi Lee's YouTube video](https://www.youtube.com/watch?v=hSXFuypLukA&ab_channel=Hung-yiLee)
* [2-a: Closed-Form Solution](https://www.cs.toronto.edu/~rgrosse/courses/csc311_f20/readings/notes_on_linear_regression.pdf)