# Segmentation survey

These notes mainly record papers related to image segmentation.

## Fully Convolutional Networks for Semantic Segmentation (FCN)[^1]
14 Nov 2014

### Contributions
First to propose the fully convolutional network concept, using convolutional layers to preserve the spatial location of features.
Uses existing image classification models as the encoder, extracts features at different levels, and combines them to predict the segmentation map.
Tested AlexNet (39.8), VGG16 (56.0), and GoogLeNet (42.5) as encoders.

### Architecture
![](https://i.imgur.com/ERyDSLI.png)

### Results
* mIoU: 62.2 PASCAL VOC 2012
* mIoU: 34.0 NYUDv2
* mIoU: 39.5 SIFT Flow

### Drawbacks
Features from different levels are merely fused by addition (+) and then passed through a 1x1 convolution to predict the result, so the fusion step cannot learn anything further.

## U-Net: Convolutional Networks for Biomedical Image Segmentation[^2]
18 May 2015

### Contributions
Uses training data more efficiently, so it can be applied to tasks with little data, such as medical imaging.

### Architecture
![](https://i.imgur.com/fCTKKl2.png)

### Results
* mIoU: 92.0 PhC-U373
* mIoU: 77.5 DIC-HeLa

### Drawbacks

## SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation[^3]
2 Nov 2015

### Contributions
Stores the max-pooling indices to retain the location information that max pooling would otherwise lose, and reuses them for upsampling.

### Architecture
![](https://i.imgur.com/uPRnqGQ.png)
![](https://i.imgur.com/pLoMwSa.png)

### Results
* mIoU: 60.1 CamVid
* mIoU: 31.8 SUNRGB-D

### Drawbacks
Only the positions of the max values are retained and everything else is filled with 0, so most upsampled values are 0, and the upsampling step has no learnable mechanism.

## Multi-Scale Context Aggregation by Dilated Convolutions[^4]
23 Nov 2015

### Contributions
Proposes dilated convolution and combines dilated convolutions into a multi-scale context module.

### Architecture
![](https://i.imgur.com/6TIG6Pk.png)
![](https://i.imgur.com/11IUCeh.png)

### Results
* mIoU: 73.9 PASCAL VOC 2012

### Drawbacks
![](https://i.imgur.com/rtdWHgV.png)
Because the receptive field of the context module is not broad enough, the dilation rates have to be determined experimentally.

## DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs[^5]
2 Jun 2016

### Contributions
* Speed: 8 fps on an NVidia Titan X GPU
* Accuracy:
* Simplicity: chains two models, DCNNs + CRFs
* Proposes the "poly" learning rate policy

Proposes the Atrous Spatial Pyramid Pooling (ASPP) architecture and combines it with a Conditional Random Field (CRF) module.

### Architecture
![](https://i.imgur.com/uOoJt6v.png)
![](https://i.imgur.com/vuUPpex.png)
![](https://i.imgur.com/ERCzavI.png)
![](https://i.imgur.com/3arY8WP.png)

### Results
* mIoU: 79.7 PASCAL VOC 2012
* mIoU: 64.9 PASCAL-Person-Part
* mIoU: 70.4 Cityscapes

### Drawbacks
Requires a CRF for post-processing, so it cannot be trained end to end, and inference speed may also suffer.
The dilation rates have to be set manually.

## RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation[^6]
20 Nov 2016

### Contributions
* Multi-path refinement network (RefineNet)
* Cascaded RefineNets that are easy to train end to end
* Chained residual pooling

### Architecture
![](https://i.imgur.com/IJdImPy.png)
![](https://i.imgur.com/BuGw8qB.png)
![](https://i.imgur.com/KeSCjJq.png)

### Results
* mIoU: 83.4 PASCAL VOC 2012
* mIoU: 73.6 NYUDv2
* mIoU: 47.3 PASCAL-Context
* mIoU: 45.9 SUN-RGBD
* mIoU: 40.7 ADE20K

### Drawbacks
Requires input images at different scales (1.2x, 0.6x), so memory usage may become the bottleneck.

## Pyramid Scene Parsing Network[^7]
4 Dec 2016 PSPNet

### Contributions
* Pyramid scene parsing network
* Auxiliary loss for training

### Architecture
![](https://i.imgur.com/xQLmDGv.png)
![](https://i.imgur.com/nUmddZt.png)

### Results

### Drawbacks
Requires an additional loss during training.

## Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network[^8]
8 Mar 2017

### Contributions
* Proposes the Global Convolutional Network (GCN)
* Proposes Boundary Refinement (BR) to correct object boundaries

### Architecture
![](https://i.imgur.com/AY8H2tW.png)
![](https://i.imgur.com/V69KlpV.png)
![](https://i.imgur.com/WD0G41F.png)
![](https://i.imgur.com/BIFI0Dp.png)
![](https://i.imgur.com/xScrDhM.png)

### Results
* mIoU: 82.2 PASCAL VOC 2012
* mIoU: 76.9 Cityscapes

### Drawbacks
Where GCN fuses features, it uses only addition, with no learnable parameters.

## Rethinking Atrous Convolution for Semantic Image Segmentation[^9]
5 Dec 2017 DeepLabv3

### Contributions
The deeper the encoder, the better (ResNet50 < ResNet101).
Experiments with asynchronous training in TensorFlow.

### Architecture
![](https://i.imgur.com/of6lEPI.png)
![](https://i.imgur.com/i7yOWUn.png)
![](https://i.imgur.com/JWyuE2u.png)
![](https://i.imgur.com/mVPw7da.png)

### Results
* mIoU: 85.7 PASCAL VOC 2012
* mIoU: 81.3 Cityscapes

### Drawbacks

## Understanding Convolution for Semantic Segmentation[^10]
27 Feb 2017

### Contributions
Identifies three key components of image segmentation:
1. Fully Convolutional Network (FCN): the deeper the better
2. Conditional Random Fields (CRFs): capture both local and long-range features
3. Dilated convolution: enlarges the receptive field

* Proposes Dense Upsampling Convolution (DUC) to address the training problems that come with unpooling
* Proposes Hybrid Dilated Convolution (HDC) to address the gridding problem

### Architecture
![](https://i.imgur.com/Zx7p4mn.png)
![](https://i.imgur.com/fbjIbIR.png)

### Results
* mIoU: 83.1 PASCAL VOC 2012
* mIoU: 80.1 Cityscapes

### Drawbacks

## The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation[^11]
28 Nov 2016

### Contributions
Adapts DenseNets into an image segmentation architecture.

### Architecture
![](https://i.imgur.com/pltRnd2.png)
![](https://i.imgur.com/23tRPWD.png)
![](https://i.imgur.com/wGskUhW.png)
![](https://i.imgur.com/6dngwjl.png)

### Results
* mIoU: 66.9 CamVid

### Drawbacks
As with DenseNet, memory usage is very high unless a specialized implementation is used.

## Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation[^12]
22 Aug 2018

### Contributions
Converts the original DeepLab architecture into an encoder-decoder architecture and adopts depthwise separable convolution from MobileNet to reduce the number of parameters.

### Architecture
![](https://i.imgur.com/N8JKYKS.png)
![](https://i.imgur.com/Ya5LLlN.png)
![](https://i.imgur.com/f7nxUtm.png)
![](https://i.imgur.com/1xJn0ar.png)

### Results
* mIoU: 82.1 Cityscapes
* mIoU: 89.0 DeepLabv3+ (Xception-JFT)

### Drawbacks
![](https://i.imgur.com/u8O6je1.png)

[^1]: https://arxiv.org/abs/1411.4038
[^2]: https://arxiv.org/abs/1505.04597
[^3]: https://arxiv.org/abs/1511.00561
[^4]: https://arxiv.org/abs/1511.07122
[^5]: https://arxiv.org/abs/1606.00915
[^6]: https://arxiv.org/abs/1611.06612
[^7]: https://arxiv.org/abs/1612.01105
[^8]: https://arxiv.org/abs/1703.02719
[^9]: https://arxiv.org/abs/1706.05587
[^10]: https://arxiv.org/abs/1702.08502
[^11]: https://arxiv.org/abs/1611.09326
[^12]: https://arxiv.org/abs/1802.02611

###### tags: `Survey` `Segmentation`
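
Every result in these notes is reported as mIoU (mean intersection over union), which the notes never define. As a rough reference, here is a minimal sketch of how it is commonly computed. It assumes flattened per-pixel class labels and averages only over classes that appear in the prediction or the ground truth. The function name `mean_iou` and the toy labels are hypothetical, and individual benchmarks may differ in details such as ignored labels or how scores are accumulated over a dataset.

```python
from collections import Counter


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union for two equally long label sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    pred_count = Counter(pred)      # pixels predicted as each class
    target_count = Counter(target)  # pixels labelled as each class
    intersection = Counter()        # pixels where prediction and label agree
    for p, t in zip(pred, target):
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union == 0:
            continue  # class absent from both; skipped (an assumption of this sketch)
        ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: six "pixels", three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {100 * score:.1f}")  # per-class IoU: 1/3, 2/3, 1/2 -> prints "mIoU = 50.0"
```

The papers above report this value as a percentage, which is why the sketch multiplies by 100 when printing.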