# MobileNet V1~V3

###### tags: `paper notes` `deep learning`

[V1 paper link](https://arxiv.org/pdf/1704.04861.pdf) [V1 Code](https://github.com/shanglianlm0525/PyTorch-Networks/blob/master/Lightweight/MobileNetV1.py)
[V2 paper link](https://arxiv.org/pdf/1801.04381.pdf) [V2 Code](https://github.com/shanglianlm0525/PyTorch-Networks/blob/master/Lightweight/MobileNetV2.py)
[V3 paper link](https://arxiv.org/pdf/1905.02244.pdf) [V3 Code](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/mobilenetv3.py)

## What is MobileNet?

- A lightweight CNN image-classification model proposed by Google in 2017, aimed mainly at phones and embedded devices
![](https://i.imgur.com/FPIVZQb.jpg)
- It is not the only lightweight model; alternatives include SqueezeNet and ShuffleNet
- The Pixel Neural Core in my [Pixel 4](https://ai.googleblog.com/2019/11/introducing-next-generation-on-device.html) runs MobileNetEdgeTPU, which is MobileNetV3 optimized for the TPU via AutoML
- Incidentally, the latest [Pixel 6](https://ai.googleblog.com/2021/11/improved-on-device-ml-on-pixel-6-with.html) ships with MobileNetEdgeTPUV2 (image classification), SpaghettiNet-EdgeTPU (object detection), FaceSSD (face recognition), and MobileBERT (NLP)
![](https://i.imgur.com/E4sMnjn.jpg)

## MobileNet V1: Depthwise Separable Convolution

- Depthwise Separable Conv == Depthwise Conv + Pointwise Conv
- Depthwise Conv convolves each channel separately to reduce computation, while Pointwise Conv learns the relationships between the channels of the same feature map
- Standard Conv and Depthwise Conv differ in how they treat channels
![](https://i.imgur.com/ook6xdF.jpg)
- Pointwise Conv is simply a 1x1 Conv: it convolves across all channels of a 1x1 spatial region
![](https://i.imgur.com/2RkXLbO.jpg)
- Incidentally, MobileNet did not invent separable convolution; the idea appeared as early as [Simplifying ConvNets for Fast Learning, 2012](https://liris.cnrs.fr/Documents/Liris-5659.pdf) and was generalized to the depthwise separable form in [Rigid-Motion Scattering For Image Classification, 2013](https://www.di.ens.fr/data/publications/papers/phd_sifre.pdf)

### Why does convolving each channel separately reduce the parameter count?
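Before the symbolic derivation, a quick numeric sanity check. This is a sketch of my own (the shapes are arbitrary examples, not from the paper); $D_K$, $D_F$, $M$, $N$ follow the symbol definitions below:

```python
# Sketch (illustrative numbers): multiply-accumulate counts for one conv layer.
# D_K = kernel size, D_F = feature-map size, M = input chans, N = output chans.
D_K, D_F, M, N = 3, 112, 32, 64

standard = D_K * D_K * M * N * D_F * D_F   # standard conv
depthwise = D_K * D_K * M * D_F * D_F      # depthwise conv (one filter per channel)
pointwise = M * N * D_F * D_F              # 1x1 (pointwise) conv
separable = depthwise + pointwise          # depthwise separable conv

ratio = separable / standard
print(ratio)                    # ~0.127
print(1 / N + 1 / (D_K * D_K))  # same value: 1/N + 1/D_K^2
```

So for 3x3 kernels the separable version costs roughly 1/9 of the standard conv, matching the reduction formula derived below.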
Symbol definitions:
- $D_K$: kernel width and height
- $D_F$: feature map width and height
- $M$: number of input channels
- $N$: number of output channels (= number of kernels)

- Computational cost of a standard Conv:
![](https://i.imgur.com/4yYEuSs.jpg)
- Computational cost of a Depthwise Conv:
![](https://i.imgur.com/6XrG95F.jpg)
- Computational cost of a Depthwise Separable Conv:
![](https://i.imgur.com/bXIOG7Y.jpg)
- Breakdown:
![](https://i.imgur.com/6rm1wdl.jpg)

### How much computation is saved?
![](https://i.imgur.com/pBCBRsK.jpg)

### Architecture comparison
- Standard Conv vs Depthwise Separable Conv
![](https://i.imgur.com/S2iNLTK.jpg)
![](https://i.imgur.com/tZQp62o.jpg)
- Full V1 architecture
![](https://i.imgur.com/qkx2Zoo.png)

### V1 result
- Shows that MobileNet performs comparably to much larger models while using far less computation and a much smaller model size
![](https://i.imgur.com/N3IbMXh.png)
- The authors also break down computation by component; the 1x1 Conv accounts for the largest share
![](https://i.imgur.com/Ht6uEr0.png)

## MobileNet V2: Inverted Residuals and Linear Bottlenecks
- The authors found many empty Conv kernels in V1's depthwise separable convolutions, and traced the cause to applying ReLU in a low-dimensional space, which loses a lot of information; in a high-dimensional space it does not
  - This is the dead-ReLU problem
![](https://i.imgur.com/HChl8FX.png)
- V2 therefore adds Linear Bottlenecks on top of V1's depthwise separable convolution: raise the dimensionality of the input before the ReLU, and replace the ReLU itself
![](https://i.imgur.com/UeNtzOJ.jpg)

### Linear Bottlenecks
- Replace the ReLU in the pointwise Conv with a linear function
![](https://i.imgur.com/cYyPTMI.png)
- V2 also applies a pointwise Conv (1x1 conv) before the depthwise Conv to raise the dimensionality so more features can be extracted; the paper calls this the expansion layer
![](https://i.imgur.com/IFEcriu.jpg)

### Inverted Residuals
- The recently popular [ConvNeXt, 2020s](https://github.com/facebookresearch/ConvNeXt) adopts this design to reduce computation
- A residual connection is added here for better memory efficiency
- Note that a classical residual block connects layers with many channels, whereas the inverted residual connects only the bottlenecks
- Layers marked with diagonal hatching use no nonlinearity
![](https://i.imgur.com/7g0B0SS.png)

```python=
class InvertedResidual(nn.Module):
    def __init__(self, in_channels, out_channels, stride, expansion_factor=6):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        mid_channels = in_channels * expansion_factor

        # expansion (1x1) -> depthwise (3x3, groups=channels) -> linear projection (1x1)
        self.bottleneck = nn.Sequential(
            Conv1x1BNReLU(in_channels, mid_channels),
            Conv3x3BNReLU(mid_channels, mid_channels, stride, groups=mid_channels),
            Conv1x1BN(mid_channels, out_channels)
        )

        # residual connection only when the spatial size is unchanged
        if self.stride == 1:
            self.shortcut = Conv1x1BN(in_channels, out_channels)

    def forward(self, x):
        out = self.bottleneck(x)
        out = (out + self.shortcut(x)) if self.stride == 1 else out
        return out
```
(The `Conv1x1BNReLU`, `Conv3x3BNReLU`, and `Conv1x1BN` helper blocks are defined in the V2 code linked above.)

- Computation comparison
![](https://i.imgur.com/NKYsy4x.png)
- MobileNet V1 vs MobileNet V2
![](https://i.imgur.com/fSv01JY.jpg)
- ResNet vs MobileNet V2
  - ResNet reduces dimensionality first (0.25x) -> Conv -> then expands
  - MobileNetV2 expands first (6x) -> Conv -> then reduces
  - The rationale is to let features be extracted in a high-dimensional space
![](https://i.imgur.com/FRmvt12.jpg)

### Overall architecture
![](https://i.imgur.com/xtiwAmV.png)
- Here the expansion factor is $t$ = 6, i.e., each pointwise Conv outputs 6*k channels
- $c$ = channels, $n$ = number of repetitions, $s$ = stride

Comparison with similar networks:
- V2 drops the residual connection at 3x3 Convs with stride = 2, because the input and output sizes differ (the spatial size is halved)
![](https://i.imgur.com/MbLoXWL.png)
- Compared with other networks, V2 already has fewer parameters by the intermediate layers
![](https://i.imgur.com/HYaSaxj.png)

### V2 Result
- On ImageNet (Google Pixel 1)
![](https://i.imgur.com/EhS20ep.png)
- On COCO
  - "SSDLite" here is a lightweight variant of SSD in which the Convs are replaced with separable convolutions (depthwise followed by 1x1 projection)
![](https://i.imgur.com/1e97pa9.png)
![](https://i.imgur.com/3ACcjjQ.png)
- Its parameter count is far smaller than those of YoloV2 and SSD

## MobileNet V3: Squeeze and Excitation with NAS
- V3 keeps the features of its predecessors and adds NAS plus SENet's Squeeze-and-Excitation block, which uses Global Average Pooling (GAP) to compute a weight for each feature map, strengthening the influence of important feature maps and weakening that of unimportant ones
- NAS: Neural Architecture Search, an AutoML method Google has always been fond of ~~(because few others can afford to run it)~~
- V3 also replaces the swish activation with h-swish to avoid computing a sigmoid, and slightly tweaks the V2 architecture to further reduce computational cost

### SENet
[paper link](https://arxiv.org/pdf/1709.01507.pdf)
- The main goal is to learn the relationships between feature channels and highlight the importance of each channel, thereby improving model performance
- The learning is done via attention or gating, so there is no single canonical implementation
- It can strengthen the influence of important feature maps and weaken that of unimportant ones
![](https://i.imgur.com/faehOlm.png)
- The input x has shape w * h * c1 (width * height * channel)
- The convolutional transform $F_{tr}$ outputs w * h * c2 (width * height * channel), i.e., c2 feature maps $u_c$ of size w*h
![](https://i.imgur.com/ayiponP.png =400x150)
- $v_c$ denotes the parameters of the c-th filter

SENet pipeline:
1. The squeeze operation $F_{sq}$ outputs 1 * 1 * c2 (the Squeeze part)
   - The authors use global average pooling as the squeeze (averaging over the w and h dimensions to get a scalar per channel), in preparation for the learning step
![](https://i.imgur.com/JFul9qD.png)
2. The $F_{ex}(W)$ operation learns the weights (the Excitation part)
   - $F_{ex}(W)$ consists of two fully connected layers and two nonlinear activations (ReLU, Sigmoid), forming a gating mechanism for learning
![](https://i.imgur.com/vkZex16.png)
3. Finally, $F_{scale}$ outputs the re-weighted w * h * c2
   - $s_c$ are the feature-map weights; the paper notes this effectively learns a self-attention weight for each feature map, though it does not detail how to swap in an SA version
![](https://i.imgur.com/nYHYNtm.png)

Implementation of SENet in [timm](https://github.com/rwightman/pytorch-image-models/blob/07379c6d5dbb809b3f255966295a4b03f23af843/timm/models/efficientnet_blocks.py#L17), using gating:

```python=
class SqueezeExcite(nn.Module):
    """ Squeeze-and-Excitation w/ specific features for EfficientNet/MobileNet family

    Args:
        in_chs (int): input channels to layer
        rd_ratio (float): ratio of squeeze reduction
        act_layer (nn.Module): activation layer of containing block
        gate_layer (Callable): attention gate function
        force_act_layer (nn.Module): override block's activation fn if this is set/bound
        rd_round_fn (Callable): specify a fn to calculate rounding of reduced chs
    """

    def __init__(
            self, in_chs, rd_ratio=0.25, rd_channels=None, act_layer=nn.ReLU,
            gate_layer=nn.Sigmoid, force_act_layer=None, rd_round_fn=None):
        super(SqueezeExcite, self).__init__()
        if rd_channels is None:
            rd_round_fn = rd_round_fn or round
            rd_channels = rd_round_fn(in_chs * rd_ratio)
        act_layer = force_act_layer or act_layer
        self.conv_reduce = nn.Conv2d(in_chs, rd_channels, 1, bias=True)
        self.act1 = create_act_layer(act_layer, inplace=True)
        self.conv_expand = nn.Conv2d(rd_channels, in_chs, 1, bias=True)
        self.gate = create_act_layer(gate_layer)

    def forward(self, x):
        x_se = x.mean((2, 3), keepdim=True)  # squeeze: global average pool
        x_se = self.conv_reduce(x_se)
        x_se = self.act1(x_se)
        x_se = self.conv_expand(x_se)
        return x * self.gate(x_se)           # excitation: gate and re-weight
```

SENet can replace an inception block or a residual block:
![](https://i.imgur.com/0bwmZC7.png)

MobileNet V2:
![](https://i.imgur.com/vEMBGON.png)

MobileNetV2 + Squeeze-and-Excite = MobileNetV3
- The SE block is placed after the depthwise conv, forming a new bottleneck
- Because the SE computation takes time, in blocks containing SE the authors reduce the expansion layer's channels to 1/4 of the original; they found this improves accuracy without adding latency
![](https://i.imgur.com/RCFdf3Y.png)

### NAS
- I have little hands-on experience with NAS, so only a brief overview
- V3 mainly combines platform-aware NAS with NetAdapt
- The former searches for each block of the network under a computation budget, called block-wise search
![](https://i.imgur.com/RaParbs.png)
- The latter then learns the number of kernels in each layer within each fixed block, called layer-wise search
![](https://i.imgur.com/G2X4rxF.png)
- The search has two main goals: 1) reduce the size of any expansion layer, and 2) reduce the bottlenecks across all blocks
- During NAS the authors also discovered that a few of V2's layers were relatively expensive, which is why the architecture was modified further

### Architecture fine-tuning
- Experiments showed that the 1x1 Conv (expansion layer) V2 uses to raise dimensionality actually increased the model's computation, so it was moved after the avg pooling
- The new flow first uses avg pooling to shrink the feature map from 7x7 to 1x1, then uses the 1x1 Conv to raise dimensionality, cutting that computation by a factor of 7x7 = 49
- The authors also removed the preceding 3x3 Conv and 1x1 Conv, saving 15 ms without losing accuracy
- V2
![](https://i.imgur.com/tG8oUOc.png)
- V3
  - Sharp-eyed readers will also notice that V3 adjusts the initial filter count: V2 uses 32 3x3 conv kernels, and experiments showed this can be reduced to 16 without hurting accuracy, saving another 2 ms
![](https://i.imgur.com/swNdrwP.jpg)
- Overall architecture after the tweaks
![](https://i.imgur.com/qDIV2CU.png)

### Nonlinearities
- The original swish uses a sigmoid, which is very expensive to compute on mobile devices
![](https://i.imgur.com/ByUgXtT.png)
- So they modified it: the deeper layers switch to h-swish, and the rest use ReLU in place of swish
![](https://i.imgur.com/3413W1t.png)

Using ReLU has two benefits:
1. It can be computed on any platform
2. It removes the potential accuracy loss caused by floating-point approximation
![](https://i.imgur.com/f98J0rb.png)
- Latency (ms) reduction from using h-swish
  - @n: n = number of channels
![](https://i.imgur.com/5CSmTUV.png)
- The authors' experiments show h-swish works best in layers with channel >= 80
![](https://i.imgur.com/dl5uXPt.png)

### V3 Result
- The development of MobileNet V3 and the improvement at each step
![](https://i.imgur.com/HlcyspF.png)
- V1 vs V2 vs V3 on COCO
![](https://i.imgur.com/fuOQ2zJ.png)
- V1 vs V2 on Pixel 1
![](https://i.imgur.com/MuegIrN.png)
- Experiments on Pixel 1, 2, 3
![](https://i.imgur.com/GkkITtm.png)
![](https://i.imgur.com/tspno96.png)

## References
- [轻量级神经网络“巡礼”(二)—— MobileNet,从V1到V3](https://zhuanlan.zhihu.com/p/70703846)
- [卷積神經網路(Convolutional neural network, CNN): 1×1卷積計算在做什麼](https://chih-sheng-huang821.medium.com/%E5%8D%B7%E7%A9%8D%E7%A5%9E%E7%B6%93%E7%B6%B2%E8%B7%AF-convolutional-neural-network-cnn-1-1%E5%8D%B7%E7%A9%8D%E8%A8%88%E7%AE%97%E5%9C%A8%E5%81%9A%E4%BB%80%E9%BA%BC-7d7ebfe34b8)
- [深度學習-MobileNet (Depthwise separable convolution)](https://chih-sheng-huang821.medium.com/%E6%B7%B1%E5%BA%A6%E5%AD%B8%E7%BF%92-mobilenet-depthwise-separable-convolution-f1ed016b3467)
- [[論文筆記] MobileNet演變史-從MobileNetV1到MobileNetV3](https://chihangchen.medium.com/%E8%AB%96%E6%96%87%E7%AD%86%E8%A8%98-mobilenetv3%E6%BC%94%E8%AE%8A%E5%8F%B2-f5de728725bc)
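As a closing footnote to the Nonlinearities section: h-swish replaces the sigmoid inside swish with the piecewise-linear ReLU6, as defined in the V3 paper (h-swish(x) = x * ReLU6(x+3)/6). A minimal sketch in plain Python; the function names are mine:

```python
# Sketch of V3's hard activations; ReLU6 is cheap on mobile hardware
# because it needs no exponential.
def relu6(x):
    return min(max(x, 0.0), 6.0)

def h_sigmoid(x):
    # hard sigmoid, also used as the gate in V3's SE blocks
    return relu6(x + 3.0) / 6.0

def h_swish(x):
    return x * h_sigmoid(x)

print(h_swish(-4.0))  # 0.0 (x + 3 < 0, gate closed)
print(h_swish(4.0))   # 4.0 (x + 3 > 6, gate fully open)
```

Between -3 and 3 the gate ramps linearly, approximating sigmoid's S-curve at a fraction of the cost.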
