---
title: 'ML2019FALL HW3'
disqus: hackmd
---
## HW3 - Handwritten Assignment
### Convolution (1%)
As we mentioned in class, the image size may change after convolution layers. Consider a batch of image data with shape $(B, W, H, input\_channels)$. How will the shape change after the following convolution layer?
<br>
$Conv2D\ ( input\_channels,\ output\_channels,\ kernel\_size=(k_1,\ k_2),\\ \qquad \qquad stride=(s_1,\ s_2),\ padding=(p_1,\ p_2))$
<br>
To simplify the answer: the padding tuple means that we pad $p_1$ pixels on both the left and right sides, and $p_2$ pixels on the top and bottom.
<br>
Sol:
$(B, W', H', output\_channels)$
<br>
$W' = \lfloor\frac{W\ +\ 2*p_{1}\ -\ k_{1}}{s_{1}}+1\rfloor$
<br>
$H' = \lfloor\frac{H\ +\ 2*p_{2}\ -\ k_{2}}{s_{2}}+1\rfloor$
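As a quick sanity check of these formulas, here is a minimal sketch assuming PyTorch (note that `nn.Conv2d` expects the channels-first layout $(B, C, H, W)$, so below the first spatial dimension plays the role of $W$ and the second the role of $H$; all sizes are made-up examples):

```python
import torch
import torch.nn as nn

B, W, H, in_ch, out_ch = 4, 32, 28, 3, 16
k1, k2, s1, s2, p1, p2 = 5, 3, 2, 1, 2, 1

conv = nn.Conv2d(in_ch, out_ch, kernel_size=(k1, k2),
                 stride=(s1, s2), padding=(p1, p2))
x = torch.randn(B, in_ch, W, H)        # channels-first; spatial dims ordered (W, H)
out = conv(x)

W_out = (W + 2 * p1 - k1) // s1 + 1    # floor((W + 2*p1 - k1)/s1) + 1
H_out = (H + 2 * p2 - k2) // s2 + 1    # floor((H + 2*p2 - k2)/s2) + 1
assert out.shape == (B, out_ch, W_out, H_out)
print(out.shape)                       # torch.Size([4, 16, 16, 28])
```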
<br>
### Batch Normalization (1%)
Besides ***Dropout***, we usually use ***Batch Normalization*** in training nowadays [\[ref\]](https://arxiv.org/pdf/1502.03167.pdf). The trick is popular within deep networks because it is convenient during training: it preserves the distribution within hidden layers and mitigates vanishing gradients.
The algorithm can be written as below:
$Input:\ values\ of\ x\ over\ a\ mini-batch:\ B=\{x_{1..m}\};$
$Output: {y_i = BN_{\gamma, \beta}(x_i)}$
$Parameters\ to\ be\ learned: \gamma ,\ \beta$
$\mu _{B} \leftarrow \ \frac{1}{m} \sum^{m}_{i=1}x_i \qquad \qquad \ \ \ // mini-batch\ mean$
$\sigma ^2_B \leftarrow \ \frac{1}{m} \sum^{m}_{i=1}(x_i-\mu _B)^2 \quad \ \ \ \ // mini-batch\ variance$
$\hat{x_i} \leftarrow \frac{x_i-\mu_B}{\sqrt{\sigma_{B}^{2}+\epsilon}} \qquad \qquad \qquad \ \ //normalize$
$y_i \leftarrow \gamma \hat{x_i}\ +\ \beta \equiv BN_{\gamma , \beta}(x_i) \quad //scale\ and\ shift$
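For reference, a minimal NumPy sketch of the forward pass above (the function name `batchnorm_forward` and the default `eps` are illustrative choices, not part of the assignment):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """x: (m, d) mini-batch; gamma, beta: (d,) learned parameters."""
    mu = x.mean(axis=0)                      # mini-batch mean
    var = x.var(axis=0)                      # mini-batch (biased) variance, divides by m
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalize
    y = gamma * x_hat + beta                 # scale and shift
    cache = (x, x_hat, mu, var, gamma, eps)
    return y, cache
```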
How are $\gamma$ and $\beta$ updated during the optimization of the loss $l$?
Derive $\frac{\partial l}{\partial \hat{x_i}}$, $\frac{\partial l}{\partial \sigma^2_B}$, $\frac{\partial l}{\partial \mu_B}$, $\frac{\partial l}{\partial x_i}$, $\frac{\partial l}{\partial \gamma}$, and $\frac{\partial l}{\partial \beta}$.
<br>
Sol:
$\frac{\partial l}{\partial \hat{x_{i}}} = \frac{\partial l}{\partial y_i} \gamma$
<br>
$\frac{\partial l}{\partial \sigma _{B}^{2}} = \sum_{i=1}^{m}\frac{\partial l}{\partial \hat{x_{i}}}*(x_i - \mu_B)*\frac{-1}{2} (\sigma _{B}^{2}+\epsilon )^{-3/2}$
<br>
$\frac{\partial l}{\partial \mu_B} = (\sum_{i=1}^{m}\frac{\partial l}{\partial \hat{x_{i}}}*\frac{-1}{\sqrt{\sigma _{B}^{2}+\epsilon }}) + \frac{\partial l}{\partial \sigma _{B}^{2}} \frac{\sum_{i=1}^{m}-2(x_i - \mu_B)}{m}$
<br>
$\frac{\partial l}{\partial x_{i}} = \frac{\partial l}{\partial \hat{x_{i}}}*\frac{1}{\sqrt{\sigma _{B}^{2}+\epsilon }} + \frac{\partial l}{\partial \sigma _{B}^{2}}*\frac{2(x_i - \mu_B)}{m} + \frac{\partial l}{\partial \mu_B}*\frac{1}{m}$
<br>
$\frac{\partial l}{\partial \gamma} = \sum^{m}_{i=1} \frac{\partial l}{\partial y_i} *\hat{x_i}$
<br>
$\frac{\partial l}{\partial \beta} = \sum^{m}_{i=1} \frac{\partial l}{\partial y_i}$
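The derived gradients can be sanity-checked numerically. The sketch below (reusing `batchnorm_forward` from the block above; `batchnorm_backward` is a hypothetical helper name) implements the formulas and compares $\frac{\partial l}{\partial x_i}$ and $\frac{\partial l}{\partial \gamma}$ against PyTorch autograd on the same computation:

```python
import numpy as np
import torch

def batchnorm_backward(dy, cache):
    x, x_hat, mu, var, gamma, eps = cache
    m = x.shape[0]
    inv_std = 1.0 / np.sqrt(var + eps)

    dgamma = (dy * x_hat).sum(axis=0)                                    # dl/dgamma
    dbeta = dy.sum(axis=0)                                               # dl/dbeta
    dx_hat = dy * gamma                                                  # dl/dx_hat
    dvar = (dx_hat * (x - mu) * -0.5 * (var + eps) ** -1.5).sum(axis=0)  # dl/dvar
    dmu = (dx_hat * -inv_std).sum(axis=0) + dvar * (-2.0 * (x - mu)).sum(axis=0) / m
    dx = dx_hat * inv_std + dvar * 2.0 * (x - mu) / m + dmu / m          # dl/dx_i
    return dx, dgamma, dbeta

m, d = 8, 5
x_np = np.random.randn(m, d)
gamma_np, beta_np = np.random.randn(d), np.random.randn(d)

y, cache = batchnorm_forward(x_np, gamma_np, beta_np)
dy = np.ones_like(y)                                   # take l = sum(y), so dl/dy = 1
dx, dgamma, dbeta = batchnorm_backward(dy, cache)

# same computation in PyTorch, gradients via autograd
xt = torch.tensor(x_np, requires_grad=True)
gt = torch.tensor(gamma_np, requires_grad=True)
bt = torch.tensor(beta_np, requires_grad=True)
yt = gt * (xt - xt.mean(0)) / torch.sqrt(xt.var(0, unbiased=False) + 1e-5) + bt
yt.sum().backward()

print(np.allclose(dx, xt.grad.numpy()), np.allclose(dgamma, gt.grad.numpy()))  # True True
```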
<br>
### Softmax and Cross Entropy (1%)
In classification problems, we use softmax as the activation function and cross entropy as the loss function.
$softmax(z_t) = \frac{e^{z_t}}{\sum_{i}e^{z_i}}$
$cross\_entropy = L(y, \hat{y}) = -\sum_{i}y_i\log\hat{y_i}$
$cross\_entropy = L_t(y_t, \hat{y_t}) = -y_t\log\hat{y_t}$
$\hat{y_t} = softmax(z_t)$
Derive that $\frac{\partial L_t}{\partial z_t} = \hat{y_t} - y_t$
<br>
Sol:
In the binary case ($y_t = 1$):
<br>
$\frac{\partial L_t}{\partial z_t} = -\frac{\partial\ y_t\log\hat{y_t}}{\partial z_t} = -y_t\frac{\partial \log\hat{y_t}}{\partial z_t} = -y_t \frac{1}{\hat{y_t}} \frac{\partial \hat{y_t}}{\partial z_t} = -y_t\frac{1}{\hat{y_t}} (\hat{y_t} - \hat{y_t}^2) = y_t \hat{y_t} - y_t = \hat{y_t} - y_t \quad (since\ y_t = 1)$
<br>
and similarly for $y_t = 0$.
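As a quick numerical check (a sketch assuming a one-hot target $y$, so that the full cross-entropy gradient $\hat{y} - y$ can be compared against a finite-difference estimate):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())              # shift for numerical stability
    return e / e.sum()

z = np.random.randn(5)
y = np.zeros(5); y[2] = 1.0              # one-hot target, true class t = 2

y_hat = softmax(z)
analytic = y_hat - y                     # claimed gradient dL/dz

# finite-difference gradient of L(z) = -sum_i y_i * log(softmax(z)_i)
eps = 1e-6
numeric = np.zeros_like(z)
for j in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[j] += eps; zm[j] -= eps
    Lp = -np.sum(y * np.log(softmax(zp)))
    Lm = -np.sum(y * np.log(softmax(zm)))
    numeric[j] = (Lp - Lm) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))   # True
```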