NTU Machine Learning HW4
===
## 1.
### Question

### Answer
The probability of generating the document is
\begin{equation}
p(w) = p(w_1)^{c(w_1)}p(w_2)^{c(w_2)} \cdots p(w_n)^{c(w_n)}
\end{equation}
Taking the log of both sides gives
\begin{equation}
\log p(w) = c(w_1)\log p(w_1) + c(w_2)\log p(w_2) + \cdots + c(w_n)\log p(w_n)
\end{equation}
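The goal is to maximize $p(w)$ subject to the constraint $\sum_{i=1}^{n}p(w_i) = 1$. Introducing a Lagrange multiplier $λ$ turns this into finding a stationary point of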
\begin{equation}
L = \sum_{i=1}^{n}c(w_i)\log p(w_i) + λ\left(\sum_{i=1}^{n}p(w_i) - 1\right)
\end{equation}
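Take the partial derivative of $L$ with respect to $p(w_j)$ and set it to zero:
\begin{equation}
\dfrac{∂L}{∂p(w_j)} = \dfrac{c(w_j)}{p(w_j)} + λ = 0 \\
p(w_j) = -\dfrac{c(w_j)}{λ} \\
\sum_{i=1}^np(w_i) = -\sum_{i=1}^n\dfrac{c(w_i)}{λ} = 1 \\
λ = -\sum_{i=1}^n{c(w_i)}
\end{equation}
Substituting $λ = -\sum_{i=1}^nc(w_i)$ into $p(w_j)=-\dfrac{c(w_j)}{λ}$ gives $p(w_j)=\dfrac{c(w_j)}{\sum_{i=1}^nc(w_i)}$, i.e. the relative frequency of $w_j$ in the document.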
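
The closed form above can be checked numerically. The following is a minimal sketch, assuming a hypothetical toy document; the function names `unigram_mle` and `log_likelihood` are illustrative and not part of the assignment. It computes $p(w_j) = c(w_j) / \sum_i c(w_i)$ and compares the log-likelihood against a uniform distribution over the same vocabulary.

```python
from collections import Counter
import math


def unigram_mle(tokens):
    """Return p(w) = c(w) / (sum of all counts) for each distinct word."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def log_likelihood(tokens, probs):
    """Compute sum_i c(w_i) * log p(w_i) under the given distribution."""
    counts = Counter(tokens)
    return sum(c * math.log(probs[w]) for w, c in counts.items())


# Hypothetical toy document, used only for illustration.
doc = "a b a c a b".split()
mle = unigram_mle(doc)

# A uniform distribution over the same vocabulary, for comparison.
uniform = {w: 1 / len(mle) for w in mle}

print(mle)
print(log_likelihood(doc, mle), log_likelihood(doc, uniform))
```

For this toy document the relative-frequency estimate should give a higher log-likelihood than the uniform distribution, as the derivation predicts.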
## 2.
### Question

### Answer
#### t=1
\begin{equation}
z = 0 * 0 + 0 * 1 + 0 * 0 + 1 * 3 + 0 = 3 \\
z_i = 100 * 0 + 100 * 1 + 0 * 0 + 0 * 3 - 10 = 90 \\
z_f = (-100) * 0 + (-100) * 1 + 0 * 0 + 0 * 3 + 110 = 10 \\
z_o = 0 * 0 + 0 * 1 + 100 * 0 + 0 * 3 - 10 = -10 \\
f(z_i) = \dfrac{1}{1 + e^{-z_i}} ≈ 1 \\
f(z_f) = \dfrac{1}{1 + e^{-z_f}} ≈ 1 \\
f(z_o) = \dfrac{1}{1 + e^{-z_o}} ≈ 0 \\
c' = 1 * 3 + 0 * 1 = 3 \\
y = 0 * 3 = 0
\end{equation}
#### t=2
\begin{equation}
z = 0 * 1 + 0 * 0 + 0 * 1 + 1 * (-2) + 0 = -2 \\
z_i = 100 * 1 + 100 * 0 + 0 * 1 + 0 * (-2) - 10 = 90 \\
z_f = (-100) * 1 + (-100) * 0 + 0 * 1 + 0 * (-2) + 110 = 10 \\
z_o = 0 * 1 + 0 * 0 + 100 * 1 + 0 * (-2) - 10 = 90 \\
f(z_i) = \dfrac{1}{1 + e^{-z_i}} ≈ 1 \\
f(z_f) = \dfrac{1}{1 + e^{-z_f}} ≈ 1 \\
f(z_o) = \dfrac{1}{1 + e^{-z_o}} ≈ 1 \\
c' = 1 * (-2) + 3 * 1 = 1 \\
y = 1 * 1 = 1
\end{equation}
#### t=3
\begin{equation}
z = 0 * 1 + 0 * 1 + 0 * 1 + 1 * 4 + 0 = 4 \\
z_i = 100 * 1 + 100 * 1 + 0 * 1 + 0 * 4 - 10 = 190 \\
z_f = (-100) * 1 + (-100) * 1 + 0 * 1 + 0 * 4 + 110 = -90 \\
z_o = 0 * 1 + 0 * 1 + 100 * 1 + 0 * 4 - 10 = 90 \\
f(z_i) = \dfrac{1}{1 + e^{-z_i}} ≈ 1 \\
f(z_f) = \dfrac{1}{1 + e^{-z_f}} ≈ 0 \\
f(z_o) = \dfrac{1}{1 + e^{-z_o}} ≈ 1 \\
c' = 1 * 4 + 1 * 0 = 4 \\
y = 1 * 4 = 4
\end{equation}
#### t=4
\begin{equation}
z = 0 * 0 + 0 * 1 + 0 * 1 + 1 * 0 + 0 = 0 \\
z_i = 100 * 0 + 100 * 1 + 0 * 1 + 0 * 0 - 10 = 90 \\
z_f = (-100) * 0 + (-100) * 1 + 0 * 1 + 0 * 0 + 110 = 10 \\
z_o = 0 * 0 + 0 * 1 + 100 * 1 + 0 * 0 - 10 = 90 \\
f(z_i) = \dfrac{1}{1 + e^{-z_i}} ≈ 1 \\
f(z_f) = \dfrac{1}{1 + e^{-z_f}} ≈ 1 \\
f(z_o) = \dfrac{1}{1 + e^{-z_o}} ≈ 1 \\
c' = 1 * 0 + 4 * 1 = 4 \\
y = 1 * 4 = 4
\end{equation}
#### t=5
\begin{equation}
z = 0 * 0 + 0 * 1 + 0 * 0 + 1 * 2 + 0 = 2 \\
z_i = 100 * 0 + 100 * 1 + 0 * 0 + 0 * 2 - 10 = 90 \\
z_f = (-100) * 0 + (-100) * 1 + 0 * 0 + 0 * 2 + 110 = 10 \\
z_o = 0 * 0 + 0 * 1 + 100 * 0 + 0 * 2 - 10 = -10 \\
f(z_i) = \dfrac{1}{1 + e^{-z_i}} ≈ 1 \\
f(z_f) = \dfrac{1}{1 + e^{-z_f}} ≈ 1 \\
f(z_o) = \dfrac{1}{1 + e^{-z_o}} ≈ 0 \\
c' = 1 * 2 + 4 * 1 = 6 \\
y = 0 * 6 = 0
\end{equation}
#### t=6
\begin{equation}
z = 0 * 0 + 0 * 0 + 0 * 1 + 1 * (-4) + 0 = -4 \\
z_i = 100 * 0 + 100 * 0 + 0 * 1 + 0 * (-4) - 10 = -10 \\
z_f = (-100) * 0 + (-100) * 0 + 0 * 1 + 0 * (-4) + 110 = 110 \\
z_o = 0 * 0 + 0 * 0 + 100 * 1 + 0 * (-4) - 10 = 90 \\
f(z_i) = \dfrac{1}{1 + e^{-z_i}} ≈ 0 \\
f(z_f) = \dfrac{1}{1 + e^{-z_f}} ≈ 1 \\
f(z_o) = \dfrac{1}{1 + e^{-z_o}} ≈ 1 \\
c' = 0 * (-4) + 6 * 1 = 6 \\
y = 1 * 6 = 6
\end{equation}
#### t=7
\begin{equation}
z = 0 * 1 + 0 * 1 + 0 * 1 + 1 * 1 + 0 = 1 \\
z_i = 100 * 1 + 100 * 1 + 0 * 1 + 0 * 1 - 10 = 190 \\
z_f = (-100) * 1 + (-100) * 1 + 0 * 1 + 0 * 1 + 110 = -90 \\
z_o = 0 * 1 + 0 * 1 + 100 * 1 + 0 * 1 - 10 = 90 \\
f(z_i) = \dfrac{1}{1 + e^{-z_i}} ≈ 1 \\
f(z_f) = \dfrac{1}{1 + e^{-z_f}} ≈ 0 \\
f(z_o) = \dfrac{1}{1 + e^{-z_o}} ≈ 1 \\
c' = 1 * 1 + 6 * 0 = 1 \\
y = 1 * 1 = 1
\end{equation}
#### t=8
\begin{equation}
z = 0 * 1 + 0 * 0 + 0 * 1 + 1 * 2 + 0 = 2 \\
z_i = 100 * 1 + 100 * 0 + 0 * 1 + 0 * 2 - 10 = 90 \\
z_f = (-100) * 1 + (-100) * 0 + 0 * 1 + 0 * 2 + 110 = 10 \\
z_o = 0 * 1 + 0 * 0 + 100 * 1 + 0 * 2 - 10 = 90 \\
f(z_i) = \dfrac{1}{1 + e^{-z_i}} ≈ 1 \\
f(z_f) = \dfrac{1}{1 + e^{-z_f}} ≈ 1 \\
f(z_o) = \dfrac{1}{1 + e^{-z_o}} ≈ 1 \\
c' = 1 * 2 + 1 * 1 = 3 \\
y = 1 * 3 = 3
\end{equation}
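
The eight hand calculations above can be replayed with a short script. This is a minimal sketch, assuming the weights, biases, and inputs read off the arithmetic above, an initial cell state of 0, and identity activations on the cell input and cell output (which is how $c'$ and $y$ are computed above). The names `affine` and `run_lstm` are hypothetical helpers, not part of the assignment.

```python
import math


def sigmoid(v):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-v))


# (weights for x1..x4, bias), read off the calculations above.
W_INPUT = ([0, 0, 0, 1], 0)
W_IN_GATE = ([100, 100, 0, 0], -10)
W_FORGET_GATE = ([-100, -100, 0, 0], 110)
W_OUT_GATE = ([0, 0, 100, 0], -10)


def affine(params, x):
    """Weighted sum of the inputs plus the bias."""
    weights, bias = params
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def run_lstm(inputs):
    """Run the single-cell LSTM and return the output y at every step."""
    c = 0.0  # assumed initial cell state
    outputs = []
    for x in inputs:
        z = affine(W_INPUT, x)
        i = sigmoid(affine(W_IN_GATE, x))
        f = sigmoid(affine(W_FORGET_GATE, x))
        o = sigmoid(affine(W_OUT_GATE, x))
        c = z * i + c * f  # c' = z * f(z_i) + c * f(z_f)
        outputs.append(o * c)  # y = f(z_o) * c'
    return outputs


inputs = [
    (0, 1, 0, 3), (1, 0, 1, -2), (1, 1, 1, 4), (0, 1, 1, 0),
    (0, 1, 0, 2), (0, 0, 1, -4), (1, 1, 1, 1), (1, 0, 1, 2),
]
ys = run_lstm(inputs)
print([round(y) for y in ys])  # [0, 1, 4, 4, 0, 6, 1, 3]
```

The script uses the exact sigmoid rather than the rounded gate values 0 and 1, so the raw outputs differ from the hand-computed ones by small amounts and are rounded before printing.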
## 3.
### Question

### Answer
Let $z = W_oh_2$.
#### Computing $\dfrac{∂L(y,\hat{y})}{∂W_o}$
\begin{equation}
\dfrac{∂L(y,\hat{y})}{∂W_o} = \dfrac{∂L(y,\hat{y})}{∂z_j}\dfrac{∂z_j}{∂W_o} \\
= -\dfrac{∂\sum_{i}y_i\log(σ(W_oh_2)_i)}{∂z_j} * h_2 \\
= (\sum_{i}y_iσ(W_oh_2)_j - y_j) * h_2 \\
= (σ(W_oh_2) - y) * h_2
\end{equation}
#### Computing $\dfrac{∂L(y,\hat{y})}{∂W_i}$
\begin{equation}
\dfrac{∂L(y,\hat{y})}{∂W_i} = \dfrac{∂L(y,\hat{y})}{∂z}\dfrac{∂z}{∂h_2}\dfrac{∂h_2}{∂W_i}
\end{equation}
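Substituting the result above, $\dfrac{∂L(y,\hat{y})}{∂z}=(σ(W_oh_2) - y)$ and $\dfrac{∂z}{∂h_2} = W_o$, so only $\dfrac{∂h_2}{∂W_i}$ remains. Because $h_1 = \tanh(W_ix_1 + W_hh_0)$ comes from the same recurrence and also depends on $W_i$, the chain rule has one term for $x_2$ and one term that goes through $h_1$ (treating $h_0$ as a constant initial state).
\begin{equation}
\dfrac{∂h_2}{∂W_i} = \dfrac{∂\tanh(W_ix_2 + W_hh_1)}{∂W_i} \\
= (1-\tanh^2(W_ix_2 + W_hh_1))\left(x_2 + W_h\dfrac{∂h_1}{∂W_i}\right) \\
= (1-\tanh^2(W_ix_2 + W_hh_1))\left(x_2 + W_h(1-h_1^2)x_1\right)
\end{equation}
Combining the three factors gives $\dfrac{∂L(y,\hat{y})}{∂W_i}=W_o(σ(W_oh_2) - y)(1-\tanh^2(W_ix_2 + W_hh_1))(x_2 + W_h(1-h_1^2)x_1)$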
#### Computing $\dfrac{∂L(y,\hat{y})}{∂W_h}$
\begin{equation}
\dfrac{∂L(y,\hat{y})}{∂W_h} = \dfrac{∂L(y,\hat{y})}{∂z}\dfrac{∂z}{∂h_2}\dfrac{∂h_2}{∂W_h}
\end{equation}
Substituting the result above, $\dfrac{∂L(y,\hat{y})}{∂z}=(σ(W_oh_2) - y)$ and $\dfrac{∂z}{∂h_2} = W_o$, so only $\dfrac{∂h_2}{∂W_h}$ remains.
\begin{equation}
\dfrac{∂h_2}{∂W_h} = \dfrac{∂\tanh(W_ix_2 + W_hh_1)}{∂W_h} \\
= (1-\tanh^2(W_ix_2 + W_hh_1))\dfrac{∂W_hh_1}{∂W_h}
\end{equation}
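The $h_1 = \tanh(W_ix_1 + W_hh_0)$ inside $\dfrac{∂W_hh_1}{∂W_h}$ also depends on $W_h$ in general, so the product rule is needed. Assuming the initial state is $h_0 = 0$, the second term vanishes and $h_1 = \tanh(W_ix_1)$.
\begin{equation}
\dfrac{∂W_hh_1}{∂W_h} = h_1 + W_h\dfrac{∂\tanh(W_ix_1 + W_hh_0)}{∂W_h} \\
= h_1 + W_h(1-h_1^2)h_0 \\
= \tanh(W_ix_1)
\end{equation}
Putting the factors together gives $\dfrac{∂L(y,\hat{y})}{∂W_h} = W_o(σ(W_oh_2) - y)(1-\tanh^2(W_ix_2 + W_hh_1))\tanh(W_ix_1)$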
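
A finite-difference check is a quick way to catch sign or chain-rule mistakes in derivations like these. The sketch below makes several assumptions that are not stated in the answer: scalar $W_i$, $W_h$, $x_1$, $x_2$, an initial state $h_0 = 0$, a vector $W_o$ so that $z_k = W_o[k] \cdot h_2$ feeds a softmax, a cross-entropy loss, and arbitrary hypothetical parameter values. The analytic $W_i$ gradient in the code follows the full chain rule, including the path through $h_1 = \tanh(W_ix_1)$.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(W_i, W_h, W_o, x1, x2, y):
    """Two-step scalar RNN with h_0 = 0 and a cross-entropy loss."""
    h1 = math.tanh(W_i * x1)
    h2 = math.tanh(W_i * x2 + W_h * h1)
    probs = softmax([w * h2 for w in W_o])  # z_k = W_o[k] * h_2
    loss = -sum(yk * math.log(pk) for yk, pk in zip(y, probs))
    return loss, h1, h2, probs


def analytic_grads(W_i, W_h, W_o, x1, x2, y):
    """Gradients obtained with the chain rule."""
    _, h1, h2, probs = forward(W_i, W_h, W_o, x1, x2, y)
    dz = [pk - yk for pk, yk in zip(probs, y)]  # softmax(z) - y
    dW_o = [d * h2 for d in dz]
    dh2 = sum(w * d for w, d in zip(W_o, dz))  # dL/dz times dz/dh_2
    da2 = dh2 * (1 - h2 ** 2)  # back through tanh at step 2
    dW_h = da2 * h1
    dW_i = da2 * (x2 + W_h * (1 - h1 ** 2) * x1)  # direct path plus path via h_1
    return dW_i, dW_h, dW_o


def numeric_grad(f, v, eps=1e-6):
    """Central finite difference of a scalar function f at v."""
    return (f(v + eps) - f(v - eps)) / (2 * eps)


# Hypothetical parameter values and a one-hot label, for illustration only.
W_i, W_h, W_o = 0.5, -0.3, [0.2, -0.4, 0.7]
x1, x2, y = 1.0, -2.0, [0.0, 1.0, 0.0]

dW_i, dW_h, dW_o = analytic_grads(W_i, W_h, W_o, x1, x2, y)
num_W_i = numeric_grad(lambda v: forward(v, W_h, W_o, x1, x2, y)[0], W_i)
num_W_h = numeric_grad(lambda v: forward(W_i, v, W_o, x1, x2, y)[0], W_h)
num_W_o0 = numeric_grad(
    lambda v: forward(W_i, W_h, [v] + W_o[1:], x1, x2, y)[0], W_o[0]
)

print(abs(dW_i - num_W_i), abs(dW_h - num_W_h), abs(dW_o[0] - num_W_o0))
```

All three printed differences should be close to zero if the analytic expressions are right.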
## 4.
### Question

### Answer
Express $g_T$ in terms of $g_{T-1}$, writing $E_i$ for the exponential term of sample $i$:
\begin{equation}
L = \sum_{i=1}^nE_i, \quad E_i = \exp\left(\dfrac{1}{K-1}\sum_{j\neq{y_i}}(g_{T-1}^j(x_i) + α_T^jf_T(x_i)) - (g_{T-1}^{y_i}(x_i) + α_T^{y_i}f_T(x_i))\right)
\end{equation}
To find $α_T^k$ for a class $k$, set $\dfrac{∂L}{∂α_T^k}$ to zero. The contribution of sample $i$ depends on whether its label $y_i$ equals $k$.
If $k\neq{y_i}$, $α_T^k$ appears in the sum over the other classes, and sample $i$ contributes
\begin{equation}
\dfrac{∂E_i}{∂α_T^k} = \dfrac{1}{K-1}f_T(x_i)E_i
\end{equation}
If $k=y_i$, $α_T^k$ appears only in the subtracted term, and sample $i$ contributes
\begin{equation}
\dfrac{∂E_i}{∂α_T^k} = -f_T(x_i)E_i
\end{equation}
Summing over all samples, the condition to solve is $\dfrac{∂L}{∂α_T^k} = \dfrac{1}{K-1}\sum_{i:y_i\neq{k}}f_T(x_i)E_i - \sum_{i:y_i=k}f_T(x_i)E_i = 0$.
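
The per-class derivative can be checked against a finite difference. This is a minimal sketch under assumptions: the weak learner output $f(x_i)$ is a scalar shared across classes (as the notation above suggests), and the scores, labels, and step sizes are made-up values. The function names `loss` and `grad` are hypothetical.

```python
import math


def loss(alpha, g_prev, f, labels):
    """Sum over samples of exp(mean score of the wrong classes - score of the true class)."""
    K = len(alpha)
    total = 0.0
    for g_i, f_i, y_i in zip(g_prev, f, labels):
        scores = [g_i[k] + alpha[k] * f_i for k in range(K)]
        others = sum(s for k, s in enumerate(scores) if k != y_i)
        total += math.exp(others / (K - 1) - scores[y_i])
    return total


def grad(alpha, g_prev, f, labels):
    """Analytic dL/d(alpha[k]): coefficient 1/(K-1) if k != y_i, else -1."""
    K = len(alpha)
    result = [0.0] * K
    for g_i, f_i, y_i in zip(g_prev, f, labels):
        scores = [g_i[k] + alpha[k] * f_i for k in range(K)]
        others = sum(s for k, s in enumerate(scores) if k != y_i)
        e_i = math.exp(others / (K - 1) - scores[y_i])
        for k in range(K):
            coeff = -1.0 if k == y_i else 1.0 / (K - 1)
            result[k] += coeff * f_i * e_i
    return result


# Hypothetical data: 4 samples, K = 3 classes.
g_prev = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [-0.3, 0.2, 0.4], [0.2, 0.2, -0.5]]
f = [1.0, -1.0, 0.5, -0.5]
labels = [0, 1, 2, 1]
alpha = [0.3, -0.1, 0.2]

analytic = grad(alpha, g_prev, f, labels)

eps = 1e-6
numeric = []
for k in range(len(alpha)):
    up = alpha[:k] + [alpha[k] + eps] + alpha[k + 1:]
    down = alpha[:k] + [alpha[k] - eps] + alpha[k + 1:]
    numeric.append((loss(up, g_prev, f, labels) - loss(down, g_prev, f, labels)) / (2 * eps))

print(analytic)
print(numeric)
```

The two printed lists should agree closely, which supports the case split on whether $k$ equals the label of each sample.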