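---
tags: cs231
---

# Lecture 9: CNN Architectures

## LeNet[[1]](http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf)

Proposed in 1998 by **Yann LeCun**, one of the three giants of deep learning, LeNet is one of the earliest convolutional neural networks.

![](https://i.imgur.com/Qtn5DS5.png)

| Input layer | Conv layers | Subsampling layers | Fully connected layers | Output layer |
| :---: | :---: | :----: | :-----: | :---: |
| 1 | 3 | 2 | 1 | 1 |

### Layer details

- C1: 6@5x5 filters, stride=1
- S2: 6@2x2 sub-sampling, stride=2
- C3: 16@5x5 filters, stride=1
- S4: 16@2x2 sub-sampling, stride=2
- C5: 120@5x5 filters, stride=1
- F6: fully connects the 120 outputs to 84 outputs
- Output: Gaussian connections to 10 outputs (digits 0 to 9)

### Details

- Sub-sampling: the values of the units are summed, then multiplied by a coefficient
- Activation function: tanh
- Output: ERBF (Euclidean Radial Basis Function)

>> ERBF: the squared Euclidean distance between the input vector and the parameter vector
>> $y_i=\sum_j(x_j-w_{ij})^2$

## AlexNet[[2]](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf)

In the 2012 ImageNet competition, AlexNet stood out by beating the previous year's winner by more than five percentage points, and it finished far ahead of that year's runner-up.

![](https://i.imgur.com/vEl97UF.png)

| Input layer | Conv layers | Subsampling layers | Fully connected layers | Output layer |
| :---: | :---: | :----: | :-----: | :---: |
| 1 | 5 | 3 | 3 | 1 |

### Layer details

| | filter | stride | output | padding | Other |
| :------: | :----: | :----: | :-------: | :-----: | :----------------: |
| Input | | | 227x227x3 | | |
| Conv | 11x11 | 4 | 55x55x96 | VALID | 48 filters per GPU |
| ReLU | | | | | |
| Max Pool | 3x3 | 2 | 27x27x96 | VALID | |
| Norm | | | | | |
| Conv | 5x5 | 1 | 27x27x256 | SAME | 128 filters per GPU |
| ReLU | | | | | |
| Max Pool | 3x3 | 2 | 13x13x256 | VALID | |
| Norm | | | | | |
| Conv | 3x3 | 1 | 13x13x384 | SAME | |
| ReLU | | | | | |
| Conv | 3x3 | 1 | 13x13x384 | SAME | |
| ReLU | | | | | |
| Conv | 3x3 | 1 | 13x13x256 | SAME | |
| ReLU | | | | | |
| Max Pool | 3x3 | 2 | 6x6x256 | VALID | |
| FC | | | 1x1x4096 | | |
| ReLU | | | | | |
| Dropout | | | | | Drop rate is 0.5 |
| FC | | | 1x1x4096 | | |
| ReLU | | | | | |
| Dropout | | | | | Drop rate is 0.5 |
| FC | | | 1x1x1000 | | |

### Architectural differences from LeNet

1. ReLU Nonlinearity
2. Local Response Normalization

   $$
   b^{i}_{x,y}=a^{i}_{x,y} \Big/ \Big(k+\alpha\sum^{\min(N-1,\,i+\frac{n}{2})}_{j=\max(0,\,i-\frac{n}{2})}(a^{j}_{x,y})^2\Big)^{\beta}
   $$

   $k$, $\alpha$, and $\beta$ are tunable hyperparameters, $N$ is the number of channels, and $n$ is the number of channels adjacent to channel $i$ that are included in the sum.

   <center> ![](https://i.imgur.com/A4dWfIo.png)</center>

   >> Batch Normalization (for comparison)
   >> $\mu_c(x)=\frac{1}{NHW}\sum^N_{n=1}\sum^H_{h=1}\sum^W_{w=1}x_{nchw}$
   >> $\sigma_c(x)=\sqrt{\frac{1}{NHW}\sum^N_{n=1}\sum^H_{h=1}\sum^W_{w=1}(x_{nchw}-\mu_c(x))^2+\epsilon}$
   >> $x_{out}=\gamma(\frac{x-\mu_c(x)}{\sigma_c(x)})+\beta$
   >> Here $N$ is the number of samples, $C$ the number of channels, $H$ the height, and $W$ the width

3. Dropout
4. Data Augmentation
   - Image translations and flips
   - Random 224 x 224 crops of the image
   - Altering the channel intensities with PCA (Fancy PCA)
     - The quantity $[p_1,p_2,p_3][\alpha_1\lambda_1,\alpha_2\lambda_2,\alpha_3\lambda_3]^T$ is added to every RGB pixel, where $p_i$ and $\lambda_i$ are the eigenvectors and eigenvalues of the 3x3 covariance matrix of RGB pixel values and $\alpha_i$ is a random factor drawn from a zero-mean Gaussian with standard deviation 0.1
   - Demo Time
5. Multiple GPU

### Question

AlexNet uses many data augmentation techniques. Should data augmentation always be tried when building a model, or is it only used when there is not enough training data?

- It increases the amount of data
- It makes the model more robust
- It reduces the model's sensitivity to its parameters

## VGG[[3]](https://arxiv.org/pdf/1409.1556.pdf)

- From Oxford's Visual Geometry Group
- Runner-up of ILSVRC 2014.
- Deepened the model, verifying that a deeper network can improve performance.

![](https://i.imgur.com/mnZC6qN.png)

### Layer details

![](https://i.imgur.com/2mlonj6.png)

>> Activation function: ReLU

## Google Inception Net[[4]](https://arxiv.org/pdf/1409.4842.pdf)

- Winner of ILSVRC 2014.
- Focused on how to build deeper networks.

### Inception module

![](https://i.imgur.com/6Lh36BG.png)

### Layer details

![](https://i.imgur.com/zzTEUyM.png)

>> To mitigate vanishing gradients, the network adds 2 auxiliary softmax branches (auxiliary classifiers). During training their losses are added to the total loss with a small weight (0.3 in the paper), which has an effect loosely similar to model ensembling. The auxiliary classifiers are discarded at inference time.

### Later evolution

#### Inception V2

- Replaces one 5x5 filter with two 3x3 filters

<center class="half"> <img src="https://i.imgur.com/NL9jcHC.png" style="zoom:85%" /> ⟹ <img src="https://i.imgur.com/qinAoIp.png" style="zoom:85%" /> </center>

>> A large nxn filter is replaced by a stack of p small kxk filters. Each stride-1 kxk layer shrinks the covered region by k-1:
>> n $\rightarrow$ n-k+1 $\rightarrow$ n-2k+2 $\rightarrow$ ... $\rightarrow$ 1
>> $\Rightarrow$ n - pk + p = 1
>> $\Rightarrow$ n = 1 + p(k-1)

- Introduces Batch Normalization

#### Inception V3

- Factorizes a larger 2D convolution into two smaller 1D convolutions
  - An nxn convolution is split into a 1xn convolution followed by an nx1 convolution

<center> ![](https://i.imgur.com/6CAyAWQ.png)</center>

#### Inception V4

- Combines the Inception design with ResNet
- The same paper presents Inception-v4 (without residual connections) alongside the Inception-ResNet variants, which add ResNet-style residual connections to Inception modules

### Supplementary links

- [v2] [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](http://arxiv.org/abs/1502.03167)
- [v3] [Rethinking the Inception Architecture for Computer Vision](http://arxiv.org/abs/1512.00567)
- [v4] [Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, 3.08% test error](http://arxiv.org/abs/1602.07261)

### Question

Slide p.63 says "No FC layers", but the architecture clearly contains an FC layer?

- "No FC layers" should read "No FC layers besides FC 1000 to output classes"

## ResNet[[5]](https://arxiv.org/pdf/1512.03385.pdf)

A model with skip connections that addresses the degradation problem that appears when networks are made deeper.

![](https://i.imgur.com/KdqLdmQ.png)

- Identity mapping $H(x)=x$: each block learns the residual $F(x)=H(x)-x$, so representing the identity only requires pushing $F(x)$ toward 0
- As a result, adding more layers should not increase the training error

![](https://i.imgur.com/4s96aCB.png)

![](https://i.imgur.com/Deiyr1I.png)

### Later evolution

#### ResNet V2

![](https://i.imgur.com/mYzcJlN.png)

- pre-activation

### Inception-ResNet[[6]](https://arxiv.org/pdf/1602.07261.pdf)

![](https://i.imgur.com/VVXHMSb.png)

### Questions

Apart from AlexNet, which uses BN, it seems that VGG, GoogLeNet, ResNet... do not use BN? And none of them seem to mention which activation function is used?

- AlexNet does not use BN (it uses Local Response Normalization). BN was only proposed in 2015, and Inception V2 already uses it
- For convolutional models, unless stated otherwise, most of them use ReLU

Now that we understand these architectures, how do we apply them in practice? (Do we build these networks ourselves, or is there existing code we can refer to?)

- At present, for problems that need a large network architecture, it is still best to use an existing network

---

## Questions

| Group |<center> Question </center>|
|:-------------------:|----------------------|
| Group 1 |1. How is the number of filters in a Bottleneck Layer decided?|
| Group 2 |1. AlexNet uses many data augmentation techniques. Should data augmentation always be tried when building a model, or is it only used when there is not enough training data? 2. Apart from AlexNet, which uses BN, it seems that VGG, GoogLeNet, ResNet... do not use BN? And none of them seem to mention which activation function is used? 3. Slide p.63 says "No FC layers", but the architecture clearly contains an FC layer? |
| Group 3 |1. Now that we understand these architectures, how do we apply them in practice? (Do we build these networks ourselves, or is there existing code we can refer to?) 2. Could you explain again why VGGNet uses smaller filters, and what the passage about 3x3, 5x5, 7x7 means? (around the 17.5 minute mark)|
| Group 4 |<center>Presenting group</center> |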
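
The small-filter question from Group 3 and the output sizes in the AlexNet table can both be checked with a few lines of arithmetic. The sketch below is only an illustration, not code from the lecture: the helper names (`conv_output_size`, `stacked_receptive_field`, `conv_weight_count`) are hypothetical, SAME padding is assumed to mean 2 pixels for a 5x5 filter and 1 pixel for a 3x3 filter, and the parameter comparison assumes the same number of input and output channels (64 is an arbitrary choice).

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


def stacked_receptive_field(num_layers, kernel):
    """Receptive field of stacked stride-1 kernel x kernel convs: n = 1 + p(k - 1)."""
    return 1 + num_layers * (kernel - 1)


def conv_weight_count(kernel, channels, num_layers=1):
    """Weights (biases ignored) in num_layers convs with `channels` in and out."""
    return num_layers * kernel * kernel * channels * channels


# Walk the spatial size through the conv and pooling rows of the AlexNet table.
# Each entry is (name, kernel, stride, padding); padding values are assumptions.
alexnet_layers = [
    ("conv1", 11, 4, 0),  # VALID
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),   # SAME for a 5x5 filter
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),   # SAME for a 3x3 filter
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]
size = 227
alexnet_sizes = []
for name, kernel, stride, padding in alexnet_layers:
    size = conv_output_size(size, kernel, stride, padding)
    alexnet_sizes.append(size)
    print(f"{name}: {size}x{size}")

# Stacked 3x3 filters cover the same region as one larger filter, with fewer weights.
channels = 64  # arbitrary channel count chosen for the comparison
for num_layers, big_kernel in [(2, 5), (3, 7)]:
    field = stacked_receptive_field(num_layers, 3)
    small = conv_weight_count(3, channels, num_layers)
    big = conv_weight_count(big_kernel, channels)
    print(f"{num_layers} x 3x3 -> receptive field {field}x{field}, "
          f"{small} weights vs {big} for one {big_kernel}x{big_kernel}")
```

Under these assumptions, two 3x3 layers see a 5x5 region and three see a 7x7 region, while using 18C² and 27C² weights instead of 25C² and 49C² for C channels. The stack also applies a nonlinearity after every layer, which is the usual argument for the small filters in VGGNet.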