2020 YOLOv4

YOLOv4 breaks the object detection problem down into four components: Input, Backbone (the CNN architecture), Neck (how features from different scales are fused), and Head (how bounding boxes are proposed, plus the loss function).


YOLOv4 makes the following architecture choices:

  • Backbone: CSPDarknet53
  • Neck: SPP, PAN (PANet)
  • Head: YOLOv3

The network architecture is as follows:


Backbone: CSPDarknet53

Applying the CSP architecture to Darknet53 gives CSPDarknet53.

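What is CSP? CSPNet (Cross Stage Partial Network) was proposed mainly to address three problems:

  1. Strengthen the learning ability of the CNN, so that it stays accurate while being made lightweight.
  2. Reduce computational bottlenecks
  3. Reduce memory cost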

Here the CSP architecture is applied to DenseNet:


With CSP added, the parameter count goes down and the accuracy even goes up.
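As a rough illustration of the cross-stage partial idea: the input channels are split into two parts, only one part goes through the stage's heavy block, and the two parts are concatenated again afterwards. The sketch below is a toy in plain Python under stated assumptions; `csp_stage`, `heavy_block`, and the 50/50 split are illustrative names and choices, not taken from the paper or from Darknet.

```python
def heavy_block(channels):
    """Hypothetical stand-in for a stack of conv layers: doubles every value."""
    return [[2.0 * v for v in ch] for ch in channels]


def csp_stage(channels, block=heavy_block, split_ratio=0.5):
    """Split channels, process only one part, then concatenate both parts.

    `channels` is a list of channels, each a flat list of floats.
    """
    k = int(len(channels) * split_ratio)
    bypass, processed = channels[:k], channels[k:]
    # Only `processed` pays the cost of the heavy block.
    processed = block(processed)
    # Cross-stage merge: concatenate along the channel axis.
    return bypass + processed


x = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]  # 4 channels
y = csp_stage(x)
print(len(y))      # channel count is preserved
print(y[0], y[3])  # first channel bypassed, last channel processed
```

Because the heavy block only sees part of the channels, its compute and parameter cost shrinks, which is in line with the reduction noted above.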


Neck: SPP

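Figure (a) is the original SPP design: the 2D feature maps are flattened into a 1D feature vector, so fully-connected layers have to follow.

Figure (b) instead concatenates the pooled feature maps, so convolutional layers can still follow; this is the approach used in YOLOv3. One way to picture the benefit is that the network effectively gains an extra layer, as shown in the second figure below.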
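The concatenation variant can be sketched as follows: each channel is max-pooled with several window sizes at stride 1, so the spatial size is unchanged, and the results are stacked along the channel axis. This is a minimal plain-Python sketch; the kernel sizes 5, 9, and 13 follow the values given in the YOLOv4 paper, while the function names `max_pool_same` and `spp_block` are made up for illustration.

```python
def max_pool_same(fmap, k):
    """Max-pool a 2D map with a k x k window, stride 1, 'same' padding."""
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            # Out-of-range positions are simply skipped (like -inf padding).
            window = [
                fmap[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def spp_block(channels, kernel_sizes=(5, 9, 13)):
    """Concatenate the input with its max-pooled versions along channels."""
    out = list(channels)  # identity branch
    for k in kernel_sizes:
        out.extend(max_pool_same(ch, k) for ch in channels)
    return out


fmap = [[0, 0, 0, 0], [0, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]]
y = spp_block([fmap])
print(len(y), len(y[0]), len(y[0][0]))  # 4 channels, still 4 x 4 spatially
```

Since the output stays a 2D feature map, more convolutional layers can be stacked on top of it.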


Neck: PAN (PANet)

FPN was introduced earlier; PANet is an enhanced version of FPN that adds a bottom-up path on top of FPN's top-down path.


YOLOv4 changes one thing here: the addition is replaced with concatenation.
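The difference between the two fusion styles is easy to see on toy data. In this sketch each feature map is a list of channels and each channel is flattened to a short list; `fuse_add`, `fuse_concat`, and the sample values are illustrative assumptions only.

```python
def fuse_add(a, b):
    """Fusion by element-wise addition: channel count stays the same."""
    assert len(a) == len(b), "addition needs matching channel counts"
    return [[x + y for x, y in zip(ca, cb)] for ca, cb in zip(a, b)]


def fuse_concat(a, b):
    """Fusion by concatenation: the channels of both inputs are stacked."""
    return a + b


# Two feature maps with 2 channels each, every channel flattened to 3 values.
feat_a = [[1, 1, 1], [2, 2, 2]]
feat_b = [[10, 10, 10], [20, 20, 20]]

print(len(fuse_add(feat_a, feat_b)))     # 2 channels
print(len(fuse_concat(feat_a, feat_b)))  # 4 channels
```

Concatenation keeps both sets of features intact and leaves it to the next convolution to decide how to combine them, at the cost of more channels.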


Head: YOLOv3

YOLOv3

Bag of Freebies & Bag of Specials

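The paper also groups its improvement techniques into two categories:

  • Bag of Freebies (BoF)
    Methods that improve model accuracy without increasing inference time
  • Bag of Specials (BoS)
    Methods that add a little inference time but improve model accuracy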

YOLOv4 uses the following techniques:


BoF for backbone

  • CutMix
    Mixes two images by pasting a patch from one onto the other, and adjusts the label accordingly
  • Mosaic data augmentation
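    Stitches four training images into one, so a single image carries the content of four; to some extent this has the effect of a larger mini-batch size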
  • DropBlock regularization
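    Because an image is spatially continuous (neighboring pixels are correlated), randomly dropping out a few isolated pixels does little; if anything is dropped, it should be a whole contiguous region.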
  • Class label smoothing
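    Because the last layer is trained with softmax cross-entropy, pushing a prediction close to 1 requires the output for that class to be very large. Suppose the last layer outputs [0.1, 0.1, 0.1, 0.1, 6]; after softmax this becomes roughly [0.003, 0.003, 0.003, 0.003, 0.989]. Note that 6 is 60 times 0.1, and such an extreme gap easily leads to overfitting. The fix is simple: smooth the ground truth with y' = (1 - ε)·y + ε/K, which for ε = 0.1 and K = 5 classes turns [0, 0, 0, 0, 1] into [0.02, 0.02, 0.02, 0.02, 0.92]. In other words, an overly perfect target is not a good thing.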
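The numbers in the class label smoothing example can be reproduced with a few lines of Python. This is a minimal sketch; the smoothing factor `eps = 0.1` is an assumption chosen because it yields the [0.02, 0.02, 0.02, 0.02, 0.92] target mentioned in the text, and the helper names are illustrative.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smooth_labels(one_hot, eps=0.1):
    """Move eps of the probability mass to a uniform distribution."""
    k = len(one_hot)
    return [(1.0 - eps) * y + eps / k for y in one_hot]


probs = softmax([0.1, 0.1, 0.1, 0.1, 6.0])
target = smooth_labels([0, 0, 0, 0, 1])
print([round(p, 3) for p in probs])   # [0.003, 0.003, 0.003, 0.003, 0.989]
print([round(t, 2) for t in target])  # [0.02, 0.02, 0.02, 0.02, 0.92]
```

With the smoothed target, the loss no longer rewards pushing one logit ever further away from the others.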
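The DropBlock idea from the list above, dropping a contiguous region rather than isolated pixels, can be sketched like this. It is a simplified illustration, assuming an odd `block_size` and leaving out the rescaling of the surviving activations that a full implementation would normally apply; `drop_block` and `sample_centers` are hypothetical helper names.

```python
import random


def drop_block(fmap, block_size, centers):
    """Zero out a block_size x block_size square around each center."""
    h, w = len(fmap), len(fmap[0])
    r = block_size // 2  # assumes an odd block_size
    out = [row[:] for row in fmap]
    for ci, cj in centers:
        for i in range(max(0, ci - r), min(h, ci + r + 1)):
            for j in range(max(0, cj - r), min(w, cj + r + 1)):
                out[i][j] = 0.0
    return out


def sample_centers(h, w, drop_prob, rng):
    """Pick block centers independently with probability drop_prob."""
    return [(i, j) for i in range(h) for j in range(w) if rng.random() < drop_prob]


fmap = [[1.0] * 5 for _ in range(5)]
dropped = drop_block(fmap, block_size=3, centers=[(2, 2)])
print(sum(map(sum, dropped)))  # 25 - 9 = 16.0

# During training the centers would be sampled at random instead.
rng = random.Random(0)
random_drop = drop_block(fmap, 3, sample_centers(5, 5, 0.05, rng))
```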

BoS for backbone

  • Cross-Stage Partial connections (CSP)
  • Mish activation
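    Besides the familiar ReLU, leaky ReLU (which keeps a small slope for inputs below 0) is also mainstream. YOLOv4 instead uses the Mish activation, f(x) = x · tanh(ln(1 + e^x)), which is a continuously differentiable function, as shown in the figure below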

The experimental results show that Mish performs better.
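For reference, Mish is defined as x · tanh(softplus(x)). The sketch below evaluates it next to leaky ReLU; the leaky slope of 0.1 is an assumed value for illustration, and the function names are not tied to any particular framework.

```python
import math


def softplus(x):
    # Numerically stable ln(1 + e^x).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    return x * math.tanh(softplus(x))


def leaky_relu(x, slope=0.1):
    return x if x > 0 else slope * x


for x in (-2.0, 0.0, 2.0):
    print(x, round(mish(x), 4), leaky_relu(x))
```

Unlike leaky ReLU, Mish has no kink at 0: it dips slightly below zero for negative inputs and approaches the identity for large positive ones.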


  • Multi-input Weighted Residual Connections (MiWRC)

BoF for detector

  • CIoU-loss

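    IoU-loss:
    IoU-loss has an obvious weakness: when the predicted bbox and the ground truth do not intersect, IoU is 0 and says nothing about how far apart the two boxes are. The gradient direction is lost, so the loss cannot be optimized. This led to GIoU-loss, DIoU-loss, and CIoU-loss, all of which improve on IoU-loss by adding a penalty term R(B, B^gt), i.e. L = 1 - IoU + R(B, B^gt).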

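    GIoU-loss:
    GIoU-loss adds a penalty term to IoU-loss that is computed from the smallest region C enclosing both boxes: R_GIoU = |C - (B ∪ B^gt)| / |C|. The farther apart the two bboxes are, the larger the penalty.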

    Here we can see that the farther apart the two boxes are, the larger the loss.

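    However, GIoU still has the following drawbacks, which slow down convergence:
    1. When one box completely encloses the other, GIoU-loss degenerates to IoU-loss and gives the same value wherever the inner box sits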

    2. When the two boxes are aligned horizontally or vertically, the GIoU-loss values also end up the same

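    DIoU-loss:
    DIoU fits the mechanics of bbox regression better than GIoU. Building on the GIoU approach, it also takes the distance between the two box centers into account, which makes convergence faster. The penalty term is R_DIoU = d^2 / c^2.
    Here c is the diagonal length of the smallest region enclosing both boxes.
    And d is the Euclidean distance between the center points of the two boxes.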

    The figure below shows that DIoU-loss changes as the center point moves, which overcomes the first drawback.

    The figure below shows that DIoU-loss changes as the center point moves, which overcomes the second drawback.

    Convergence speed is compared in the figure below:

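    CIoU-loss:
    Building on DIoU, CIoU additionally considers the aspect ratio. The penalty term is R_CIoU = d^2 / c^2 + αυ, where α = υ / ((1 - IoU) + υ) is a weighting function and υ = (4 / π^2) · (arctan(w_gt / h_gt) - arctan(w / h))^2 measures how similar the aspect ratios of the two boxes are.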

    The figure below shows that CIoU-loss also changes when the two boxes have different aspect ratios.
  • Cross mini-Batch Normalization (CmBN)
  • DropBlock regularization
  • Mosaic data augmentation
  • Self-Adversarial Training (SAT)
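    The author is still developing this technique; it uses adversarial perturbation to avoid overfitting.
    The procedure is:
    1. Feed the image into the model, which produces intermediate feature values
    2. Subtract these values, i.e. modify the image along the gradient (the model parameters are left untouched)
    3. Feed the modified image back into the model for training.
    e.g. If the model classifies an image as a dog by looking only at the dog's eyes, SAT removes the dog's eye features from the image in step 2. The model then has to learn the dog's other features in order to classify it.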
  • Cosine annealing scheduler
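    Since the model should be converging, the learning rate ought to keep getting smaller. This works better than the original schedule, which drops the learning rate once every fixed interval (the blue dashed line).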
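A cosine annealing schedule can be written in a couple of lines. The sketch below compares it with a step schedule; the base learning rate, the step interval, and the decay factor are arbitrary illustrative values, not the settings used by YOLOv4.

```python
import math


def cosine_lr(step, total_steps, lr_max=0.01, lr_min=0.0):
    """Learning rate that follows half a cosine wave from lr_max to lr_min."""
    cos_term = 1.0 + math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * cos_term


def step_lr(step, lr_max=0.01, drop_every=30, factor=0.1):
    """Learning rate that is multiplied by `factor` every `drop_every` steps."""
    return lr_max * factor ** (step // drop_every)


for s in (0, 25, 50, 75, 100):
    print(s, cosine_lr(s, 100), step_lr(s))
```

The cosine curve decreases smoothly at every step, while the step schedule stays flat and then drops abruptly.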
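Returning to the IoU-family losses discussed under CIoU-loss above, all four can be sketched in plain Python. Boxes are assumed to be `(x1, y1, x2, y2)` tuples with positive width and height; the function names are illustrative, and the formulas follow the standard definitions of GIoU, DIoU, and CIoU rather than any specific YOLOv4 code.

```python
import math


def _area(b):
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou_terms(b, g):
    """Return IoU, union area, and the smallest box enclosing b and g."""
    iw = max(0.0, min(b[2], g[2]) - max(b[0], g[0]))
    ih = max(0.0, min(b[3], g[3]) - max(b[1], g[1]))
    inter = iw * ih
    union = _area(b) + _area(g) - inter
    c_box = (min(b[0], g[0]), min(b[1], g[1]), max(b[2], g[2]), max(b[3], g[3]))
    return inter / union, union, c_box


def iou_loss(b, g):
    return 1.0 - iou_terms(b, g)[0]


def giou_loss(b, g):
    iou, union, c_box = iou_terms(b, g)
    c_area = _area(c_box)
    return 1.0 - iou + (c_area - union) / c_area


def diou_loss(b, g):
    iou, _, c_box = iou_terms(b, g)
    # d^2: squared center distance; c^2: squared diagonal of the enclosing box.
    dx = (b[0] + b[2] - g[0] - g[2]) / 2
    dy = (b[1] + b[3] - g[1] - g[3]) / 2
    c2 = (c_box[2] - c_box[0]) ** 2 + (c_box[3] - c_box[1]) ** 2
    return 1.0 - iou + (dx * dx + dy * dy) / c2


def ciou_loss(b, g):
    iou = iou_terms(b, g)[0]
    ratio_gap = math.atan((g[2] - g[0]) / (g[3] - g[1])) - math.atan(
        (b[2] - b[0]) / (b[3] - b[1])
    )
    v = (4 / math.pi ** 2) * ratio_gap ** 2
    alpha = v / ((1.0 - iou) + v) if v > 0 else 0.0
    return diou_loss(b, g) + alpha * v


gt = (0, 0, 2, 2)
near, far = (3, 0, 5, 2), (10, 0, 12, 2)
print(iou_loss(near, gt), iou_loss(far, gt))    # identical: no overlap
print(giou_loss(near, gt), giou_loss(far, gt))  # larger for the far box
```

In this example IoU-loss cannot tell the near box from the far one, while GIoU-loss can. For a small box fully inside a large one, GIoU-loss gives the same value wherever the small box sits, whereas DIoU-loss still changes with the center distance.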

BoS for detector

  • Mish activation
  • SPP-block
  • SAM-block
    Applies attention over spatial regions: it produces an H×W×1 attention map, so that the important regions get weighted up.
  • PAN path-aggregation block
  • DIoU-NMS
    In object detection the model proposes many bounding boxes for the same object. NMS removes these duplicates: any box that overlaps a higher-confidence box too much is discarded.

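    Of course, this can delete boxes by mistake (for example, a box whose confidence is not high and that sits very close to another one), so DIoU-NMS adds the distance between box centers as an extra criterion.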
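The difference between plain NMS and DIoU-NMS can be sketched as below. In the DIoU variant a box is suppressed only if IoU minus the center-distance penalty d^2 / c^2 reaches the threshold. Boxes are assumed to be `(x1, y1, x2, y2)` tuples; the function names, the sample boxes, and the threshold of 0.4 are illustrative assumptions.

```python
def iou_and_center_penalty(a, b):
    """Return IoU and the DIoU penalty d^2 / c^2 for two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    dx = (a[0] + a[2] - b[0] - b[2]) / 2
    dy = (a[1] + a[3] - b[1] - b[3]) / 2
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    return inter / union, (dx * dx + dy * dy) / (cw * cw + ch * ch)


def nms(boxes, scores, threshold=0.5, use_diou=False):
    """Greedy NMS; returns the indices of the boxes that are kept."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        survivors = []
        for i in order:
            iou, penalty = iou_and_center_penalty(boxes[best], boxes[i])
            overlap = iou - penalty if use_diou else iou
            if overlap < threshold:
                survivors.append(i)
        order = survivors
    return keep


boxes = [(0, 0, 10, 10), (4, 0, 14, 10), (0.5, 0, 10.5, 10)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, threshold=0.4))                 # [0]
print(nms(boxes, scores, threshold=0.4, use_diou=True))  # [0, 1]
```

The second box overlaps the first but its center is clearly offset, so the distance term lets it survive; the third box is a near duplicate of the first and is removed either way.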

tags: course notes