MobileNet V1~V3
V1 paper link
V1 Code
V2 paper link
V2 Code
V3 paper link
V3 Code
What is MobileNet?
- A lightweight CNN image-classification model proposed by Google in 2017, used mainly on mobile phones and embedded devices
- It is not the only lightweight model; others include SqueezeNet, ShuffleNet, and so on
- The Pixel Neural Core inside my Pixel 4 runs MobileNetEdgeTPU, which is MobileNetV3 further optimized for the TPU with AutoML
- As a side note, the latest Pixel 6 ships MobileNetEdgeTPUV2 (image classification), SpaghettiNet-EdgeTPU (object detection), FaceSSD (face detection), and MobileBERT (NLP)
MobileNet V1: Depthwise Separable Convolution
- Depthwise Separable Conv == Depthwise Conv + Pointwise Conv
- Depthwise Conv convolves each channel separately to reduce computation, while Pointwise Conv is used to learn the relationships between the different channels of the same feature map (see the code sketch after this list)
- The difference between a Standard Conv and a Depthwise Conv lies in how the channels are handled
- Pointwise Conv is simply a 1x1 Conv: it convolves over all channels of each 1x1 spatial location
- As a side note, MobileNet was not the first to propose separable convolutions; the idea already appeared in Simplifying ConvNets for Fast Learning (2012) and was generalized to depthwise separable convolutions in Rigid-Motion Scattering For Image Classification (2013)
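The two steps above translate almost directly into code. Below is a minimal PyTorch sketch of the V1-style block; the class name and layer hyperparameters are my own choices, not the official reference implementation:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise Conv (groups = in_channels) followed by a Pointwise (1x1) Conv,
    each with BatchNorm + ReLU, as in the MobileNet V1 block."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_channels)
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        self.bn1 = nn.BatchNorm2d(in_channels)
        # Pointwise: 1x1 conv mixes information across channels
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.bn1(self.depthwise(x)))
        x = self.relu(self.bn2(self.pointwise(x)))
        return x

# Example: 32 -> 64 channels on a 112x112 feature map
block = DepthwiseSeparableConv(32, 64)
print(block(torch.randn(1, 32, 112, 112)).shape)  # torch.Size([1, 64, 112, 112])
```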
Why does convolving each channel separately reduce the parameter count and computation?
Notation
- $D_K$: width and height of the kernel
- $D_F$: width and height of the feature map
- $M$: number of input channels
- $N$: number of output channels (= number of kernels)
- Computational cost of a standard Conv: $D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$
- Computational cost of a depthwise Conv: $D_K \cdot D_K \cdot M \cdot D_F \cdot D_F$
- Computational cost of a depthwise separable Conv: $D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F$
- Breaking it down: the first term is the depthwise Conv, the second term is the pointwise (1x1) Conv
How much computation is saved?
- Dividing the two costs gives $\frac{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F} = \frac{1}{N} + \frac{1}{D_K^2}$, so with a 3x3 kernel this is roughly 8 to 9 times less computation (a quick numeric check follows below)
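The ratio above can be sanity-checked with a few lines of Python; the layer sizes here (3x3 kernel, 14x14 map, 512 -> 512 channels) are just an illustrative example, not taken from the paper:

```python
# Multiply-accumulate counts for standard vs. depthwise separable convolution
D_K, D_F, M, N = 3, 14, 512, 512  # hypothetical layer sizes

standard  = D_K * D_K * M * N * D_F * D_F
separable = D_K * D_K * M * D_F * D_F + M * N * D_F * D_F

print(f"standard : {standard:,}")                 # 462,422,016
print(f"separable: {separable:,}")                # 52,283,392
print(f"ratio    : {separable / standard:.4f}")   # 0.1131, i.e. 1/N + 1/D_K^2
```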
Architecture comparison
- Standard Conv vs Depthwise Separable Conv
- Full V1 architecture
V1 Result
- Shows that MobileNet performs on par with larger models while using far less computation and a much smaller model size
- The authors also break down how much of the computation each component accounts for; the 1x1 Conv takes the largest share
MobileNet V2: Inverted Residuals and Linear Bottlenecks
- The authors found that many of the conv kernels in V1's depthwise separable convolutions were empty, and traced the cause to the fact that applying ReLU in a low-dimensional space destroys a lot of information, while applying it in a high-dimensional space does not
- This is related to the dying ReLU problem
- So V2 builds on V1's depthwise separable convolution by adding Linear Bottlenecks: raise the dimensionality of the input before the activation, and replace that ReLU with a linear function
Linear Bottlenecks
- Replace the ReLU after the (projection) Point-wise Conv with a linear function
- V2 also applies a Point-wise Conv (1x1 conv) before the Depth-wise Conv to raise the dimensionality so that more features can be extracted; the paper calls this the expansion layer
Inverted Residuals
- The recently popular ConvNeXt (from A ConvNet for the 2020s) adopts this design to reduce computation
- A residual connection is added here to achieve better memory efficiency
- Note that a classical residual block connects layers where the channel count is large, whereas the inverted residual only connects the bottlenecks
- Layers drawn with diagonal hatching do not use a non-linearity
- Comparison of computational cost
- MobileNet V1 vs MobileNet V2
- ResNet vs MobileNet V2
- ResNet first reduces the dimensionality (to 0.25x) -> Conv -> then expands it back
- MobileNetV2 first expands the dimensionality (6x) -> Conv -> then reduces it
- The reason for this design is that they want features to be extracted in a high-dimensional space (a block sketch follows below)
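Putting Linear Bottlenecks and Inverted Residuals together, a V2-style block can be sketched in PyTorch as follows; the names and defaults are mine, but the expansion factor of 6 and the skip-connection rule match the description above:

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2-style block: 1x1 expand -> 3x3 depthwise -> 1x1 linear projection."""
    def __init__(self, in_ch, out_ch, stride=1, expansion=6):
        super().__init__()
        hidden = in_ch * expansion
        # Residual connection only when spatial size and channel count are unchanged
        self.use_residual = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            # Expansion layer: 1x1 conv raises the dimensionality
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # Depthwise 3x3 conv in the high-dimensional space
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # Linear bottleneck: 1x1 projection with NO non-linearity afterwards
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out

# A stride-1 block keeps the skip connection, a stride-2 block drops it
print(InvertedResidual(32, 32, stride=1)(torch.randn(1, 32, 56, 56)).shape)  # (1, 32, 56, 56)
print(InvertedResidual(32, 64, stride=2)(torch.randn(1, 32, 56, 56)).shape)  # (1, 64, 28, 28)
```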
Overall architecture
- Here we can see the expansion factor t = 6, i.e. each expanding Point-wise Conv outputs 6*k channels
- c = number of output channels, n = number of times the block is repeated, s = stride
Comparison with other similar networks
- V2 drops the residual connection whenever it hits a 3x3 Conv with stride = 2, because the input and output sizes would differ (the spatial size is halved)
- Compared with other networks, V2 already uses fewer parameters in its intermediate layers
V2 Result
- On ImageNet (Google Pixel 1)
- On COCO
- SSDLite here refers to a lightweight variant of SSD in which the Convs are replaced with separable convolutions (depthwise followed by a 1x1 projection)
MobileNet V3: Squeeze and Excitation with NAS
- In addition to keeping the features of the previous versions, V3 adds NAS and the Squeeze-and-Excitation structure from SENet, which uses Global Average Pooling (GAP) to compute a weight for each feature map, strengthening the influence of important feature maps and weakening that of unimportant ones
- NAS: Neural Architecture Search, an AutoML approach that Google has always been fond of (since few others can afford to run it)
- They also replace the swish activation with h-swish to avoid computing the sigmoid, and slightly tweak the V2 architecture to further reduce the computational cost
SENet
paper link
- The main goal is to learn the relationships between feature channels and to highlight how important each channel is, thereby improving model performance
- The learning is done via attention or gating, so there is no single way to implement it
- It can be used to strengthen the influence of important feature maps and weaken that of unimportant ones

- $x$ is the input, of size w * h * c1 (width * height * channels)
- A convolutional transform $F_{tr}$ produces an output of size w * h * c2 (width * height * channels), i.e. c2 feature maps of size w*h

- $v_c$ denotes the parameters of the c-th filter
SENet pipeline:
1. The squeeze operation $F_{sq}$ compresses the output to 1 * 1 * c2 (the Squeeze part)
- Here the authors use global average pooling as the squeeze: each feature map is averaged over its w and h dimensions into a single scalar, $z_c = F_{sq}(u_c) = \frac{1}{w \times h}\sum_{i=1}^{w}\sum_{j=1}^{h} u_c(i,j)$, which prepares the statistics for the learning step below
2. The excitation operation $F_{ex}$ learns a weight for each channel (the Excitation part)
- $F_{ex}$ consists of two fully connected layers and two non-linear activations (ReLU, Sigmoid) that form a gating mechanism: $s = F_{ex}(z, W) = \sigma(W_2\,\delta(W_1 z))$
3. Finally, $F_{scale}$ outputs the re-weighted w * h * c2 feature maps: $\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c$
- $s$ is the vector of feature-map weights; the paper notes that this operation is essentially learning a self-attention weight for each feature map, but it does not elaborate on how to swap in a full self-attention version

Implementation of SENet in timm, using gating
SENet can be swapped in for an inception block or a residual block (a minimal sketch follows below)
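Since the embedded snippet is not visible here, the following is a minimal PyTorch sketch of an SE module in the same gating spirit (not timm's exact code; the reduction ratio of 16 is just a common default):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: GAP -> FC -> ReLU -> FC -> Sigmoid -> channel-wise rescale."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: w*h*c -> 1*1*c
        self.fc = nn.Sequential(                     # excitation: two-FC gating mechanism
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        s = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * s                                 # scale: re-weight each feature map

# Example: re-weight a 64-channel feature map
print(SEBlock(64)(torch.randn(1, 64, 28, 28)).shape)  # torch.Size([1, 64, 28, 28])
```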

MobileNet V2

MobileNetV2 + Squeeze-and-Excite = MobileNetV3
- The overall design places the SE module after the depthwise conv, forming a new bottleneck
- The reason for this placement is that the SE computation takes some extra time, so in the blocks that contain SE the authors fix the SE bottleneck to 1/4 of the expansion layer's channels; they found this improves accuracy without adding latency
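A rough sketch of such a bottleneck (1x1 expansion -> depthwise -> SE on the expanded channels with a 1/4 reduction -> linear 1x1 projection); the channel counts, the plain Sigmoid gate, and the use of Hardswish throughout are illustrative simplifications, not the exact paper configuration:

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """SE branch: GAP -> FC -> ReLU -> FC -> gate -> rescale."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(ch // reduction, ch), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        s = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)
        return x * s

class V3Bottleneck(nn.Module):
    """Inverted residual with an SE module inserted right after the depthwise conv."""
    def __init__(self, in_ch, exp_ch, out_ch, stride=1):
        super().__init__()
        self.use_residual = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, exp_ch, 1, bias=False), nn.BatchNorm2d(exp_ch), nn.Hardswish(),
            nn.Conv2d(exp_ch, exp_ch, 3, stride=stride, padding=1, groups=exp_ch, bias=False),
            nn.BatchNorm2d(exp_ch), nn.Hardswish(),
            SqueezeExcite(exp_ch, reduction=4),      # SE bottleneck = 1/4 of the expansion channels
            nn.Conv2d(exp_ch, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),  # linear projection
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out

print(V3Bottleneck(40, 240, 40)(torch.randn(1, 40, 28, 28)).shape)  # torch.Size([1, 40, 28, 28])
```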

NAS
- I have not worked with NAS much, so this is only a brief summary
- It mainly uses platform-aware NAS + NetAdapt
- The former is used to search each block of the network under a constraint on the amount of computation, known as block-wise search

- The latter is then used to tune the number of kernels in the layers within each fixed block, known as layer-wise search

- The search has two main targets: 1) reduce the size of any expansion layer, 2) reduce the bottlenecks across all blocks
- During the NAS process they also discovered that a few of V2's layers have relatively high computational cost, which is why the architecture was modified further
Architecture fine-tuning
- Their experiments showed that the 1x1 Conv used in V2 to raise the dimensionality (the expansion layer) actually adds noticeable computation, so it was moved to after the avg pooling
- The new flow first uses avg pooling to shrink the feature map from 7x7 down to 1x1, and only then uses the 1x1 Conv to raise the dimensionality, cutting that conv's computation by a factor of 7x7 = 49 (see the sketch below)
- On top of that the authors also removed the preceding 3x3 Conv and 1x1 Conv, shaving off 15 ms of latency without losing accuracy
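A toy comparison of the two orderings; the 160/1280 channel counts are placeholders rather than the exact V3 numbers, and the "before" stage is heavily simplified. The point is that pooling first means the expanding 1x1 Conv only sees a 1x1 map instead of a 7x7 one:

```python
import torch
import torch.nn as nn

# Before: the expanding 1x1 conv runs on the full 7x7 map (49 spatial positions)
tail_before = nn.Sequential(
    nn.Conv2d(160, 1280, kernel_size=1), nn.Hardswish(),
    nn.AdaptiveAvgPool2d(1),
)

# After: average-pool to 1x1 first, then expand, so that conv does ~49x less work
tail_after = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Conv2d(160, 1280, kernel_size=1), nn.Hardswish(),
)

x = torch.randn(1, 160, 7, 7)
print(tail_before(x).shape, tail_after(x).shape)  # both torch.Size([1, 1280, 1, 1])
```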
- V2

- V3
- If you look closely you can also see that V3 adjusts the number of filters at the very start: V2 uses 32 3x3 conv kernels, and their experiments showed this can be reduced to 16 without hurting accuracy, saving another 2 ms

- Overall architecture after fine-tuning

Nonlinearities
- The original swish uses a sigmoid, which is very expensive to compute on mobile devices

- So they modified it: the deeper layers switch their activation to h-swish, defined as $\text{h-swish}(x) = x \cdot \frac{\text{ReLU6}(x + 3)}{6}$, while the remaining layers use ReLU in place of swish
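For reference, h-swish is simple enough to write in a couple of lines (PyTorch also ships it as nn.Hardswish):

```python
import torch
import torch.nn.functional as F

def h_swish(x):
    """h-swish(x) = x * ReLU6(x + 3) / 6 -- a piecewise-linear stand-in for x * sigmoid(x)."""
    return x * F.relu6(x + 3.0) / 6.0

x = torch.linspace(-4, 4, 9)
print(h_swish(x))
print(x * torch.sigmoid(x))  # the original swish, for comparison
```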

Using ReLU has two advantages:
- It can be computed on any platform
- It removes the potential accuracy loss caused by quirks of floating-point arithmetic

- Comparison of the latency (ms) saved after switching to h-swish
- @n, where n = the number of channels

- Their experiments found that h-swish should only be used in layers with channel count >= 80 to get the best effect

V3 Result
- The development path of MobileNet V3 and how much the model improves at each step

- V1 vs V2 vs V3 in COCO

- V1 vs V2 on Pixel 1

- Experiments on Pixel 1, 2, 3


References