---
title: CNN Architecture # presentation title
tags: Meeting # presentation tags
slideOptions: # slide settings
  theme: midnight
  slideNumber: true
---

# CNN Architecture

---

## Illustrations

**Convolutional layer**

Normal:
![](https://i.imgur.com/4RwARL7.gif =280x)
With padding:
![](https://github.com/vdumoulin/conv_arithmetic/raw/master/gif/padding_strides.gif =280x)

---

**Max pooling**

![](https://i1.kknews.cc/SIG=38eb8nk/2s850000ns51o64s140o.jpg =800x)
(Source: cs231n)

---

**Fully connected layer**

![](https://ver217-1253339008.cos.ap-shanghai.myqcloud.com/blog-img/fc/mlp.png)
[playground](https://reurl.cc/5goWoM)

---

## LeNet Architecture (1994)

![https://hackernoon.com/visualizing-parts-of-convolutional-neural-networks-using-keras-and-cats-5cc01b214e59](https://hackernoon.com/hn-images/1*8Ut7fQHswfO2zZngh6BYfg.png =1000x)

---

## VGGNet
> [K Simonyan et al. 2014] [paper](https://arxiv.org/abs/1409.1556)

Main difference: a stack of three 3x3 convolutions costs $3\times (3^2\times C^2)$ parameters vs. $7^2\times C^2$ for a single 7x7 convolution with the same effective receptive field.
![](https://i.imgur.com/w8H4Dwj.png =x650)

---

## Inception
> [C Szegedy et al. 2014] [paper](https://arxiv.org/abs/1409.4842)

```graphviz
digraph {
  node [shape=box]
  prev   [label="Previous layer"]
  concat [label="Filter concatenation"]
  b1  [label="1x1 convolutions"]
  b2a [label="1x1 convolutions"]
  b2b [label="3x3 convolutions"]
  b3a [label="1x1 convolutions"]
  b3b [label="5x5 convolutions"]
  b4a [label="Max pooling"]
  b4b [label="1x1 convolutions"]
  prev -> b1 -> concat
  prev -> b2a -> b2b -> concat
  prev -> b3a -> b3b -> concat
  prev -> b4a -> b4b -> concat
}
```

[GoogLeNet architecture diagram](https://i.imgur.com/rBIXwcL.jpg)

----

#### What is a 1x1 convolution kernel good for?

* Reducing dimensionality (the channel depth), which cuts the parameter count
  * Ex: the previous layer's output is 100x100x128. After a 5x5 convolutional layer with 256 channels (stride=1, pad=2), the output is 100x100x256, and the convolutional layer has 128x5x5x256 = 819,200 parameters.
  * If that output instead first passes through a 1x1 convolutional layer with 32 channels and then a 5x5 convolutional layer with 256 outputs, the output is still 100x100x256, but the parameter count drops to 128x1x1x32 + 32x5x5x256 = 208,896, roughly a 4x reduction.

---

## ResNet
> [He et al. 2015] [paper](https://arxiv.org/pdf/1512.03385.pdf)

![](https://i.imgur.com/j0PcoPP.png)

[ResNet architecture diagram](https://i.imgur.com/AG6eRti.png)

---

## DenseNet
> [G Huang et al. 2016] [paper](https://arxiv.org/abs/1608.06993)

---

![](https://cloud.githubusercontent.com/assets/8370623/17981494/f838717a-6ad1-11e6-9391-f0906c80bc1d.jpg)

---

Suppose an image $x_0$ enters the neural network.
The network has $L$ layers, each of which is a non-linear transformation $H_l(\cdot)$, where the subscript $l$ indexes the layer.
$H$ is a composite of several operations (BN, Conv, ReLU, Pool). $x_l$ denotes the output of the $l^{th}$ layer.

---

* Traditional network: the $l^{th}$ layer's output is used as the input of the $(l+1)^{th}$ layer, $x_l=H_l(x_{l-1})$
* ResNets: $x_l=H_l(x_{l-1})+x_{l-1}$
* DenseNets: $x_l=H_l([x_0,x_1,\dots ,x_{l-1}])$, i.e. the feature maps of all preceding layers are concatenated and fed into $H_l$

---

## MobileNet
> [AG Howard et al. 2017] [paper](https://arxiv.org/abs/1704.04861)

* Depthwise Separable Convolution
![](https://i.imgur.com/L47Xomn.png =380x)
Computational cost: $D_k\times D_k\times M\times D_F\times D_F+ M \times N \times D_F \times D_F$
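To sanity-check the arithmetic behind the two reduction tricks above (the Inception 1x1 bottleneck and the depthwise separable factorization), here is a minimal plain-Python sketch; the helper `conv_params` and the sample values `Dk, M, N, Df` are illustrative names and numbers of my own, not from the papers.

```python
def conv_params(c_in, k, c_out):
    """Weight count of a k x k convolution (bias terms ignored)."""
    return c_in * k * k * c_out

# -- Inception: 1x1 bottleneck before a 5x5 convolution (numbers from the slide above) --
direct = conv_params(128, 5, 256)                                # 819,200
bottleneck = conv_params(128, 1, 32) + conv_params(32, 5, 256)   # 208,896
print(direct, bottleneck, direct / bottleneck)                   # ~3.9x fewer parameters

# -- MobileNet: standard vs. depthwise separable convolution cost --
Dk, M, N, Df = 3, 128, 256, 56   # kernel size, in/out channels, feature-map size (my choice)
standard = Dk * Dk * M * N * Df * Df             # D_k^2 * M * N * D_F^2
dwsc = Dk * Dk * M * Df * Df + M * N * Df * Df   # depthwise term + pointwise term
print(dwsc / standard, 1 / N + 1 / Dk**2)        # the two ratios agree
```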
---

* Width Multiplier $\alpha$: Thinner Models
  Controls the number of input and output channels; the cost becomes
  $D_k\times D_k\times\alpha M \times D_F \times D_F+\alpha M \times \alpha N \times D_F \times D_F$
* Resolution Multiplier $\rho$: Reduced Representation
  Controls the input resolution; the cost becomes
  $D_k\times D_k\times\alpha M \times \rho D_F \times \rho D_F+$
  $\alpha M \times \alpha N \times \rho D_F \times \rho D_F$
* ReLU6: [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg =200x)](https://colab.research.google.com/drive/10pYIgS-_XUFpeWf9MMSqEGYUrrodPfR2)

---

Standard convolution cost: $D_k\times D_k\times M \times N \times D_F \times D_F$

Cost ratio of DWSC to standard convolution: $\dfrac{1}{N}+\dfrac{1}{D_k^2}$

Cost ratio with $\alpha$: $\dfrac{\alpha}{N}+\dfrac{\alpha ^2}{D_k^2}$

Cost ratio with $\rho$: $\dfrac{\rho ^2}{N}+\dfrac{\rho ^2}{D_k^2}$

[MobileNet architecture diagram](https://i.imgur.com/MkMJnph.png)

----

![](https://img-blog.csdnimg.cn/20181220125832456.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3UwMTE5NzQ2Mzk=,size_16,color_FFFFFF,t_70)

----

![](https://img-blog.csdnimg.cn/20181220125858427.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3UwMTE5NzQ2Mzk=,size_16,color_FFFFFF,t_70)

---

| Model       | Size   | Top-1 Accuracy | Top-5 Accuracy | Parameters  |
| ----------- | ------ | -------------- | -------------- | ----------- |
| VGG16       | 528 MB | 0.713          | 0.901          | 138,357,544 |
| ResNet50    | 98 MB  | 0.749          | 0.921          | 25,636,712  |
| InceptionV3 | 92 MB  | 0.779          | 0.937          | 23,851,784  |
| MobileNet   | 16 MB  | 0.704          | 0.895          | 4,253,864   |

---

Top-1 Acc: the model's single highest-ranked prediction is the correct answer.
Top-5 Acc: the correct answer is among the model's five highest-ranked predictions.
Human-assigned labels are not always that accurate, so top-5 is also a meaningful metric.
For example: [pug vs. bulldog](https://i2.kknews.cc/SIG=2vmjkb0/ctp-vzntr/1539256797239r0r573qo21.jpg)

> Data from: [Keras documentation](https://keras.io/applications/)

----

# ACCURACY

**Error rate (Error)** $=\dfrac{a}{m}$, where $m$ is the number of samples and $a$ is the number of misclassified samples

**Accuracy** $=1-Error$

----

$Acc=\dfrac{TP+TN}{TP+TN+FP+FN}$
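To make the Top-1/Top-5 definitions concrete, here is a minimal NumPy sketch; the `top_k_accuracy` helper and the toy scores are my own illustration, not the evaluation code used for the table above.

```python
import numpy as np

def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    top_k = np.argsort(scores, axis=1)[:, -k:]   # indices of the k best classes per sample
    hits = [labels[i] in top_k[i] for i in range(len(labels))]
    return float(np.mean(hits))

# Toy scores for 3 samples over 6 classes; true labels are [0, 2, 3].
scores = np.array([
    [0.90, 0.05, 0.02, 0.01, 0.01, 0.01],  # class 0 ranked first: top-1 hit
    [0.40, 0.30, 0.20, 0.05, 0.03, 0.02],  # class 2 ranked third: top-5 hit only
    [0.50, 0.20, 0.10, 0.01, 0.09, 0.10],  # class 3 ranked last: miss for both
])
labels = [0, 2, 3]

print(top_k_accuracy(scores, labels, 1))  # 0.333... (1 of 3)
print(top_k_accuracy(scores, labels, 5))  # 0.666... (2 of 3)
```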
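And the confusion-matrix form of accuracy from the last slide, checked against $1-Error$; the counts are made up for illustration.

```python
# Confusion-matrix counts (hypothetical): 100 samples in total.
TP, TN, FP, FN = 40, 45, 8, 7

error = (FP + FN) / (TP + TN + FP + FN)   # misclassified / total, i.e. a/m above
acc = (TP + TN) / (TP + TN + FP + FN)     # Acc = (TP+TN)/(TP+TN+FP+FN)
assert abs(acc - (1 - error)) < 1e-12     # Accuracy = 1 - Error
print(acc)                                # 0.85
```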