NTU Machine Learning HW4
===

## 1.

### Question

![](https://i.imgur.com/7t7CfKG.png)

### Answer

The probability of generating the document is

\begin{equation}
p(w) = p(w_1)^{c(w_1)}p(w_2)^{c(w_2)} \cdots p(w_n)^{c(w_n)}
\end{equation}

Taking the log of both sides:

\begin{equation}
\log p(w) = c(w_1)\log p(w_1) + c(w_2)\log p(w_2) + \cdots + c(w_n)\log p(w_n)
\end{equation}

The goal is to maximize $p(w)$ subject to $\sum_{i=1}^{n}p(w_i) = 1$. Using a Lagrange multiplier, build the equivalent objective

\begin{equation}
L = \sum_{i=1}^{n}c(w_i)\log p(w_i) + λ(\sum_{i=1}^{n}p(w_i) - 1)
\end{equation}

Take the partial derivative of $L$ with respect to $p(w_j)$ and set it to zero:

\begin{equation}
\dfrac{∂L}{∂p(w_j)} = \dfrac{c(w_j)}{p(w_j)} + λ = 0 \\
p(w_j) = -\dfrac{c(w_j)}{λ} \\
\sum_{i=1}^np(w_i) = \sum_{i=1}^n-\dfrac{c(w_i)}{λ} = 1 \\
λ = -\sum_{i=1}^n{c(w_i)}
\end{equation}

Since $p(w_j)=-\dfrac{c(w_j)}{λ}$, it follows that $p(w_j)=\dfrac{c(w_j)}{\sum_{i=1}^nc(w_i)}$.

## 2.

### Question

![](https://i.imgur.com/L1wU7dB.png)

### Answer

At each time step, $z$ is the cell input and $z_i$, $z_f$, $z_o$ are the pre-activations of the input, forget, and output gates. $f$ is the sigmoid function, the memory cell starts at $c = 0$ and is updated as $c' = f(z_i)z + cf(z_f)$, and the output is $y = f(z_o)c'$.

#### t=1

\begin{equation}
z = 0 \cdot 0 + 0 \cdot 1 + 0 \cdot 0 + 1 \cdot 3 + 0 = 3 \\
z_i = 100 \cdot 0 + 100 \cdot 1 + 0 \cdot 0 + 0 \cdot 3 - 10 = 90 \\
z_f = -100 \cdot 0 - 100 \cdot 1 + 0 \cdot 0 + 0 \cdot 3 + 110 = 10 \\
z_o = 0 \cdot 0 + 0 \cdot 1 + 100 \cdot 0 + 0 \cdot 3 - 10 = -10 \\
f(z_i) = \dfrac{1}{1 + e^{-z_i}} ≈ 1 \\
f(z_f) = \dfrac{1}{1 + e^{-z_f}} ≈ 1 \\
f(z_o) = \dfrac{1}{1 + e^{-z_o}} ≈ 0 \\
c' = 1 \cdot 3 + 0 \cdot 1 = 3 \\
y = 0 \cdot 3 = 0
\end{equation}

#### t=2

\begin{equation}
z = 0 \cdot 1 + 0 \cdot 0 + 0 \cdot 1 + 1 \cdot (-2) + 0 = -2 \\
z_i = 100 \cdot 1 + 100 \cdot 0 + 0 \cdot 1 + 0 \cdot (-2) - 10 = 90 \\
z_f = -100 \cdot 1 - 100 \cdot 0 + 0 \cdot 1 + 0 \cdot (-2) + 110 = 10 \\
z_o = 0 \cdot 1 + 0 \cdot 0 + 100 \cdot 1 + 0 \cdot (-2) - 10 = 90 \\
f(z_i) = \dfrac{1}{1 + e^{-z_i}} ≈ 1 \\
f(z_f) = \dfrac{1}{1 + e^{-z_f}} ≈ 1 \\
f(z_o) = \dfrac{1}{1 + e^{-z_o}} ≈ 1 \\
c' = 1 \cdot (-2) + 3 \cdot 1 = 1 \\
y = 1 \cdot 1 = 1
\end{equation}

#### t=3

\begin{equation}
z = 0 \cdot 1 + 0 \cdot 1 + 0 \cdot 1 + 1 \cdot 4 + 0 = 4 \\
z_i = 100 \cdot 1 + 100 \cdot 1 + 0 \cdot 1 + 0 \cdot 4 - 10 = 190 \\
z_f = -100 \cdot 1 - 100 \cdot 1 + 0 \cdot 1 + 0 \cdot 4 + 110 = -90 \\
z_o = 0 \cdot 1 + 0 \cdot 1 + 100 \cdot 1 + 0 \cdot 4 - 10 = 90 \\
f(z_i) = \dfrac{1}{1 + e^{-z_i}} ≈ 1 \\
f(z_f) = \dfrac{1}{1 + e^{-z_f}} ≈ 0 \\
f(z_o) = \dfrac{1}{1 + e^{-z_o}} ≈ 1 \\
c' = 1 \cdot 4 + 1 \cdot 0 = 4 \\
y = 1 \cdot 4 = 4
\end{equation}

#### t=4

\begin{equation}
z = 0 \cdot 0 + 0 \cdot 1 + 0 \cdot 1 + 1 \cdot 0 + 0 = 0 \\
z_i = 100 \cdot 0 + 100 \cdot 1 + 0 \cdot 1 + 0 \cdot 0 - 10 = 90 \\
z_f = -100 \cdot 0 - 100 \cdot 1 + 0 \cdot 1 + 0 \cdot 0 + 110 = 10 \\
z_o = 0 \cdot 0 + 0 \cdot 1 + 100 \cdot 1 + 0 \cdot 0 - 10 = 90 \\
f(z_i) = \dfrac{1}{1 + e^{-z_i}} ≈ 1 \\
f(z_f) = \dfrac{1}{1 + e^{-z_f}} ≈ 1 \\
f(z_o) = \dfrac{1}{1 + e^{-z_o}} ≈ 1 \\
c' = 1 \cdot 0 + 4 \cdot 1 = 4 \\
y = 1 \cdot 4 = 4
\end{equation}

#### t=5

\begin{equation}
z = 0 \cdot 0 + 0 \cdot 1 + 0 \cdot 0 + 1 \cdot 2 + 0 = 2 \\
z_i = 100 \cdot 0 + 100 \cdot 1 + 0 \cdot 0 + 0 \cdot 2 - 10 = 90 \\
z_f = -100 \cdot 0 - 100 \cdot 1 + 0 \cdot 0 + 0 \cdot 2 + 110 = 10 \\
z_o = 0 \cdot 0 + 0 \cdot 1 + 100 \cdot 0 + 0 \cdot 2 - 10 = -10 \\
f(z_i) = \dfrac{1}{1 + e^{-z_i}} ≈ 1 \\
f(z_f) = \dfrac{1}{1 + e^{-z_f}} ≈ 1 \\
f(z_o) = \dfrac{1}{1 + e^{-z_o}} ≈ 0 \\
c' = 1 \cdot 2 + 4 \cdot 1 = 6 \\
y = 0 \cdot 6 = 0
\end{equation}

#### t=6

\begin{equation}
z = 0 \cdot 0 + 0 \cdot 0 + 0 \cdot 1 + 1 \cdot (-4) + 0 = -4 \\
z_i = 100 \cdot 0 + 100 \cdot 0 + 0 \cdot 1 + 0 \cdot (-4) - 10 = -10 \\
z_f = -100 \cdot 0 - 100 \cdot 0 + 0 \cdot 1 + 0 \cdot (-4) + 110 = 110 \\
z_o = 0 \cdot 0 + 0 \cdot 0 + 100 \cdot 1 + 0 \cdot (-4) - 10 = 90 \\
f(z_i) = \dfrac{1}{1 + e^{-z_i}} ≈ 0 \\
f(z_f) = \dfrac{1}{1 + e^{-z_f}} ≈ 1 \\
f(z_o) = \dfrac{1}{1 + e^{-z_o}} ≈ 1 \\
c' = 0 \cdot (-4) + 6 \cdot 1 = 6 \\
y = 1 \cdot 6 = 6
\end{equation}

#### t=7

\begin{equation}
z = 0 \cdot 1 + 0 \cdot 1 + 0 \cdot 1 + 1 \cdot 1 + 0 = 1 \\
z_i = 100 \cdot 1 + 100 \cdot 1 + 0 \cdot 1 + 0 \cdot 1 - 10 = 190 \\
z_f = -100 \cdot 1 - 100 \cdot 1 + 0 \cdot 1 + 0 \cdot 1 + 110 = -90 \\
z_o = 0 \cdot 1 + 0 \cdot 1 + 100 \cdot 1 + 0 \cdot 1 - 10 = 90 \\
f(z_i) = \dfrac{1}{1 + e^{-z_i}} ≈ 1 \\
f(z_f) = \dfrac{1}{1 + e^{-z_f}} ≈ 0 \\
f(z_o) = \dfrac{1}{1 + e^{-z_o}} ≈ 1 \\
c' = 1 \cdot 1 + 6 \cdot 0 = 1 \\
y = 1 \cdot 1 = 1
\end{equation}

#### t=8

\begin{equation}
z = 0 \cdot 1 + 0 \cdot 0 + 0 \cdot 1 + 1 \cdot 2 + 0 = 2 \\
z_i = 100 \cdot 1 + 100 \cdot 0 + 0 \cdot 1 + 0 \cdot 2 - 10 = 90 \\
z_f = -100 \cdot 1 - 100 \cdot 0 + 0 \cdot 1 + 0 \cdot 2 + 110 = 10 \\
z_o = 0 \cdot 1 + 0 \cdot 0 + 100 \cdot 1 + 0 \cdot 2 - 10 = 90 \\
f(z_i) = \dfrac{1}{1 + e^{-z_i}} ≈ 1 \\
f(z_f) = \dfrac{1}{1 + e^{-z_f}} ≈ 1 \\
f(z_o) = \dfrac{1}{1 +
e^{-z_o}} ≈ 1 \\
c' = 1 \cdot 2 + 1 \cdot 1 = 3 \\
y = 1 \cdot 3 = 3
\end{equation}

## 3.

### Question

![](https://i.imgur.com/hyRmpXx.png)

### Answer

Let $z = W_oh_2$.

#### Computing $\dfrac{∂L(y,\hat{y})}{∂W_o}$

\begin{equation}
\dfrac{∂L(y,\hat{y})}{∂W_o} = \dfrac{∂L(y,\hat{y})}{∂z_j}\dfrac{∂z_j}{∂W_o} \\
= -\dfrac{∂\sum_{i}y_i\log(σ(W_oh_2)_i)}{∂z_j} \cdot h_2 \\
= (σ(W_oh_2)_j\sum_{i}y_i - y_j) \cdot h_2 \\
= (σ(W_oh_2) - y) \cdot h_2
\end{equation}

The last step uses $\sum_{i}y_i = 1$.

#### Computing $\dfrac{∂L(y,\hat{y})}{∂W_i}$

\begin{equation}
\dfrac{∂L(y,\hat{y})}{∂W_i} = \dfrac{∂L(y,\hat{y})}{∂z}\dfrac{∂z}{∂h_2}\dfrac{∂h_2}{∂W_i}
\end{equation}

Substituting the results above, $\dfrac{∂L(y,\hat{y})}{∂z}=(σ(W_oh_2) - y)$ and $\dfrac{∂z}{∂h_2} = W_o$, so the only remaining term is $\dfrac{∂h_2}{∂W_i}$. $W_i$ enters $h_2$ both directly and through $h_1 = \tanh(W_ix_1)$ (taking $h_0 = 0$), so

\begin{equation}
\dfrac{∂h_2}{∂W_i} = \dfrac{∂\tanh(W_ix_2 + W_hh_1)}{∂W_i} \\
= (1-\tanh^2(W_ix_2 + W_hh_1))(x_2 + W_h\dfrac{∂h_1}{∂W_i}) \\
= (1-\tanh^2(W_ix_2 + W_hh_1))(x_2 + W_hx_1(1-\tanh^2(W_ix_1)))
\end{equation}

Combining the three factors gives $\dfrac{∂L(y,\hat{y})}{∂W_i}=W_o(σ(W_oh_2) - y)(1-\tanh^2(W_ix_2 + W_hh_1))(x_2 + W_hx_1(1-\tanh^2(W_ix_1)))$

#### Computing $\dfrac{∂L(y,\hat{y})}{∂W_h}$

\begin{equation}
\dfrac{∂L(y,\hat{y})}{∂W_h} = \dfrac{∂L(y,\hat{y})}{∂z}\dfrac{∂z}{∂h_2}\dfrac{∂h_2}{∂W_h}
\end{equation}

Substituting the results above, $\dfrac{∂L(y,\hat{y})}{∂z}=(σ(W_oh_2) - y)$ and $\dfrac{∂z}{∂h_2} = W_o$, so the only remaining term is $\dfrac{∂h_2}{∂W_h}$.

\begin{equation}
\dfrac{∂h_2}{∂W_h} = \dfrac{∂\tanh(W_ix_2 + W_hh_1)}{∂W_h} \\
= (1-\tanh^2(W_ix_2 + W_hh_1))\dfrac{∂W_hh_1}{∂W_h}
\end{equation}

In general $h_1 = \tanh(W_ix_1 + W_hh_0)$ also depends on $W_h$, so $\dfrac{∂W_hh_1}{∂W_h}$ has to be expanded further. Taking $h_0 = 0$, $h_1 = \tanh(W_ix_1)$ no longer involves $W_h$:

\begin{equation}
\dfrac{∂W_hh_1}{∂W_h} = \dfrac{∂(W_h\tanh(W_ix_1))}{∂W_h} \\
= \tanh(W_ix_1)
\end{equation}

Combining the factors gives $\dfrac{∂L(y,\hat{y})}{∂W_h} = W_o(σ(W_oh_2) - y)(1-\tanh^2(W_ix_2 + W_hh_1))\tanh(W_ix_1)$

## 4.
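### Question

![](https://i.imgur.com/DgJ4kK1.png)

### Answer

Express the original $g_T$ in terms of $g_{T-1}$:

\begin{equation}
L = \sum_{i=1}^n\exp(\dfrac{1}{K-1}\sum_{k'\neq{y_i}}(g_{T-1}^{k'}(x_i) + α_t^{k'}f_t(x_i)) - (g_{T-1}^{y_i}(x_i) + α_t^{y_i}f_t(x_i)))
\end{equation}

Next, take $\dfrac{∂L}{∂α_t^k}$. The contribution of sample $i$ depends on whether $y_i$ equals $k$.

For the samples with $k\neq{y_i}$, $α_t^k$ appears in the averaged term, which contributes

\begin{equation}
\sum_{i: y_i\neq k}\dfrac{1}{K-1}f_t(x_i)\exp(\dfrac{1}{K-1}\sum_{k'\neq{y_i}}(g_{T-1}^{k'}(x_i) + α_t^{k'}f_t(x_i)) - (g_{T-1}^{y_i}(x_i) + α_t^{y_i}f_t(x_i)))
\end{equation}

For the samples with $k=y_i$, $α_t^k$ appears in the subtracted term, which contributes

\begin{equation}
-\sum_{i: y_i = k}f_t(x_i)\exp(\dfrac{1}{K-1}\sum_{k'\neq{y_i}}(g_{T-1}^{k'}(x_i) + α_t^{k'}f_t(x_i)) - (g_{T-1}^{y_i}(x_i) + α_t^{y_i}f_t(x_i)))
\end{equation}

$\dfrac{∂L}{∂α_t^k}$ is the sum of these two expressions, and the optimal $α_t^k$ is found by setting that sum to 0.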
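
The two cases above can be sanity-checked numerically. The sketch below evaluates $L$ on made-up toy data and compares the analytic derivative with respect to each $α_t^k$ (coefficient $\dfrac{1}{K-1}$ when $k\neq{y_i}$, $-1$ when $k=y_i$) against a central finite difference. The function names, the toy values, and the reading of $f_t(x_i)$ as a single scalar shared by all classes are assumptions made for illustration; they are not part of the original problem.

```python
import math

def sample_terms(alpha, g_prev, f, labels):
    """Return exp(mean of other-class scores - true-class score) for each sample."""
    K = len(alpha)
    terms = []
    for g_i, f_i, y_i in zip(g_prev, f, labels):
        # score of class k for this sample: g_{T-1}^k(x_i) + alpha_t^k * f_t(x_i)
        scores = [g_i[k] + alpha[k] * f_i for k in range(K)]
        others = sum(s for k, s in enumerate(scores) if k != y_i)
        terms.append(math.exp(others / (K - 1) - scores[y_i]))
    return terms

def loss(alpha, g_prev, f, labels):
    """Exponential loss L summed over all samples."""
    return sum(sample_terms(alpha, g_prev, f, labels))

def grad(alpha, g_prev, f, labels):
    """Analytic dL/d(alpha_t^k), adding the k != y_i and k == y_i contributions."""
    K = len(alpha)
    out = [0.0] * K
    terms = sample_terms(alpha, g_prev, f, labels)
    for e_i, f_i, y_i in zip(terms, f, labels):
        for k in range(K):
            coeff = -1.0 if k == y_i else 1.0 / (K - 1)
            out[k] += coeff * f_i * e_i
    return out

def numeric_grad(alpha, g_prev, f, labels, eps=1e-6):
    """Central finite-difference approximation of the same gradient."""
    out = []
    for k in range(len(alpha)):
        up, down = list(alpha), list(alpha)
        up[k] += eps
        down[k] -= eps
        diff = loss(up, g_prev, f, labels) - loss(down, g_prev, f, labels)
        out.append(diff / (2 * eps))
    return out

# Hypothetical toy data: n = 4 samples, K = 3 classes.
g_prev = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2], [-0.5, 0.1, 0.2], [0.3, 0.3, -0.1]]
f = [1.0, -1.0, 1.0, -1.0]   # f_t(x_i), assumed scalar
labels = [0, 2, 1, 0]        # y_i as class indices
alpha = [0.1, -0.2, 0.3]     # alpha_t^k

analytic = grad(alpha, g_prev, f, labels)
numeric = numeric_grad(alpha, g_prev, f, labels)
for k, (a, n) in enumerate(zip(analytic, numeric)):
    print(f"k={k}: analytic={a:+.6f} numeric={n:+.6f}")
```

If the two columns agree for every $k$, the case split is consistent with the loss as written. Solving $\dfrac{∂L}{∂α_t^k} = 0$ for $α_t^k$ is a separate step that this check does not cover.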