
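# Yolo-V4

###### tags: `lab` `ntu`

#### DIoU

\begin{equation}
\begin{aligned}
DIoU &= IoU - R_{DIoU}(M,B_i) \\
&= IoU - \frac{\rho^2(b,b^{gt})}{c^2} \\
R_{DIoU}(M,B_i) &= \frac{\rho^2(b,b^{gt})}{c^2}
\end{aligned}
\end{equation}

#### mish

\begin{equation}
\begin{aligned}
softplus(x) &= \log(1+e^x) \\
tanh(x) &= \frac{e^x-e^{-x}}{e^x+e^{-x}} \\
mish(x) &= x \cdot tanh(softplus(x))
\end{aligned}
\end{equation}

#### CIoU

\begin{equation}
\begin{aligned}
L_{CIoU} = 1 - IoU + \frac{\rho^2(b,b^{gt})}{c^2} + \alpha \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}}-\arctan\frac{w}{h}\right)^2
\end{aligned}
\end{equation}

---

# Questions

---

1. Are the backbone and the detector fine-tuned together? Is the backbone kept fixed? Initial guess: yes.
2. ResNeXt is better at classification, yet Darknet is better at detection.
3. What is the backbone trained on, ImageNet? Yes, ImageNet.
4. AP vs. mAP? AP: compute the IoU, then sort, ...
5. What do AP50 and AP75 mean?
6. Why modify SAM and PAN? The paper does not seem to say. They tried it and it worked better.
7. So what does the YOLOv4 architecture actually look like?
8. Why can it be faster (FPS)?
9. Self-adversarial training? Self-adversarial training is also a new data augmentation method, and it can resist adversarial attacks to some extent. It has two stages, each with one forward pass and one backward pass. In the first stage, the CNN uses backpropagation to alter the image rather than the network weights. In this way the CNN performs an adversarial attack on itself, altering the original image to create the illusion that there is no object in it. In the second stage, normal object detection is performed on the modified image.
10. Which data augmentations are used, only Mosaic? Mosaic and SAT (Self-Adversarial Training).
11. Why did they pick ResNeXt for the comparison? Because of the receptive field.
12. (CSPResNeXt) What is CSP? CSPNet: A New Backbone that can Enhance Learning Capability of CNN
13. DropBlock, DropConnect?
14. PAN (path aggregation): what are the advantages?
15. Label smoothing: the label is not set to exactly 1. OK.
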
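
As a rough check on the DIoU, mish and CIoU formulas at the top of these notes, here is a minimal plain-Python sketch. It assumes boxes are given as `(cx, cy, w, h)` tuples, which is a convention chosen here and not something the notes specify. The function names are made up for illustration. The notes leave $\alpha$ undefined; the sketch uses $\alpha = v / ((1 - IoU) + v)$ with $v$ the aspect-ratio term, as defined in the CIoU paper.

```python
import math


def softplus(x):
    # log(1 + e^x), rearranged so large |x| does not overflow
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    # mish(x) = x * tanh(softplus(x))
    return x * math.tanh(softplus(x))


def box_terms(box, gt):
    """Return (IoU, rho^2, c^2) for two boxes given as (cx, cy, w, h)."""
    cx, cy, w, h = box
    gx, gy, gw, gh = gt
    # Corner coordinates of both boxes
    x1, y1, x2, y2 = cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2
    gx1, gy1, gx2, gy2 = gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2
    # Intersection and union areas
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    union = w * h + gw * gh - inter
    iou = inter / union
    # rho^2: squared distance between the two box centres
    rho2 = (cx - gx) ** 2 + (cy - gy) ** 2
    # c^2: squared diagonal of the smallest box enclosing both
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    c2 = cw ** 2 + ch ** 2
    return iou, rho2, c2


def diou(box, gt):
    iou, rho2, c2 = box_terms(box, gt)
    return iou - rho2 / c2


def ciou_loss(box, gt):
    iou, rho2, c2 = box_terms(box, gt)
    # Aspect-ratio consistency term v
    v = (4 / math.pi ** 2) * (math.atan(gt[2] / gt[3]) - math.atan(box[2] / box[3])) ** 2
    # alpha as defined in the CIoU paper (assumption: not given in these notes)
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return 1 - iou + rho2 / c2 + alpha * v
```

For two identical boxes the penalty terms vanish, so `diou` returns 1 and `ciou_loss` returns 0. For two non-overlapping boxes, DIoU goes negative, which is what lets it still give a useful signal when IoU alone is stuck at 0.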
---

# Hey

---

## Related articles

https://towardsdatascience.com/yolo-v4-optimal-speed-accuracy-for-object-detection-79896ed47b50

## features used

### Weighted-Residual-Connections (WRC)

### Cross-Stage-Partial-connection (CSP)

### Cross mini-Batch Normalization (CmBN)

### Self-adversarial-training (SAT)

### Mish-activation

### Mosaic data augmentation

### DropBlock regularization

### CIoU loss

### Modified modules

* CBN
* PAN
* SAM

---

# PPT

---

## Modern Object-detector

![](https://i.imgur.com/b647N93.png)

* stages
    * two-stage
        * R-CNN
        * Fast R-CNN
        * Faster R-CNN
    * one-stage
        * YOLO
        * SSD
* backbone
    * GPU
        * VGG16
        * ResNet-50
        * ResNeXt-101
        * DenseNet
    * CPU
        * MobileNet
        * ShuffleNet
* neck
    * FPN (feature pyramid network)
      ![](https://i.imgur.com/vWrfHrF.png)
      ![](https://i.imgur.com/PX7AEWO.png)
    * PANet (path aggregation network)
    * Bi-FPN
* head
  The head is the network that actually performs the detection: classification and regression of bounding boxes. Depending on the implementation, a single output may consist of 4 coordinates describing the predicted box (x, y, h, w) plus the probabilities of k + 1 classes (one extra for background). Anchor-based object detectors, like YOLO, apply the head network to each anchor box. Other popular anchor-based one-stage detectors are Single Shot Detector[6] and RetinaNet[4].

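
To make the head description concrete, here is a minimal sketch of how one per-anchor output of size 4 + (k + 1) could be split into a box and class probabilities. The flat layout, the softmax over class scores, and the function names are all assumptions made for illustration; real detectors differ in how they arrange and activate these values.

```python
import math


def head_output_size(num_anchors, num_classes):
    # Each anchor predicts 4 box coordinates plus k + 1 class scores
    # (the extra one is the background class).
    return num_anchors * (4 + num_classes + 1)


def split_head_output(raw, num_classes):
    """Split one per-anchor vector into ((x, y, h, w), class probabilities).

    Assumed layout (hypothetical): [x, y, h, w, score_1, ..., score_k, score_bg]
    """
    expected = 4 + num_classes + 1
    if len(raw) != expected:
        raise ValueError(f"expected {expected} values, got {len(raw)}")
    box = tuple(raw[:4])
    scores = raw[4:]
    # Softmax over the k + 1 class scores, shifted by the max for stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    return box, probs
```

With 3 anchors and 80 classes, `head_output_size(3, 80)` gives 3 x 85 = 255 values per grid cell under this layout.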
## Bag of freebies

* focal loss
    * deals with the imbalance between various classes
* IoU loss
    * GIoU
    * DIoU
    * CIoU: converges faster with better accuracy

## Bag of specials

Adds a little inference time, but improves performance noticeably.

* Enhance receptive field
    * SPP
    * ASPP
    * RFB: dilated conv, costs 7% extra inference time and increases the AP50 of SSD on MS COCO by 5.7%
* attention module
    * Squeeze-and-Excitation (SE) (channel-wise)
    * Spatial Attention Module (SAM) (point-wise)
* post-processing
    * NMS

---

# Methodology

---

## Selection of architecture

A reference model which is optimal for classification is not always optimal for a detector.

* Objective 1: find the balance among
    * resolution
    * number of conv layers
    * number of parameters (f_size^2 x filters x channels/groups)
* Objective 2
    * increase the receptive field and pick a way to aggregate parameters from different backbone levels: FPN, PAN, ASFF, BiFPN
    * what is the receptive field?
      ![](https://i.imgur.com/oWPFK10.png)
* In contrast to a classifier, a detector needs:
    1. Higher input network size (resolution) - for detecting multiple small-sized objects
    2. More layers - for a higher receptive field to cover the increased size of the input network

    ```
    2. Images for an image classifier are already nicely cropped, but that is not the case for the detection task: objects come at different scales, so the receptive field has to be increased
    ```

---

# Others

---

* YOLO evolution: https://mropengate.blogspot.com/2018/06/yolo-yolov3.html
* YOLOv4 introduction: https://towardsdatascience.com/yolo-v4-optimal-speed-accuracy-for-object-detection-79896ed47b50
* YOLOv4 as seen on Facebook: https://bangqu.com/rrhsI5.html?fbclid=IwAR2fchCn5dGhQAfMmmDPUh-kKNFICnEnSndnTpKGHRJIsaSicJem7Lfyv8w
* IoU: https://blog.csdn.net/donkey_1993/article/details/104006474
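
Going back to the "Selection of architecture" notes: the parameter-count expression (f_size^2 x filters x channels/groups) and the idea that more layers give a larger receptive field can both be checked with a few lines of Python. This is a sketch under simplifying assumptions: biases are ignored, dilation is ignored, and layers are described only by hypothetical `(kernel_size, stride)` pairs.

```python
def conv_params(f_size, filters, channels, groups=1):
    # f_size^2 x filters x channels / groups (weights only, no bias)
    return f_size * f_size * filters * channels // groups


def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of conv layers.

    layers: list of (kernel_size, stride) pairs, first layer first.
    """
    rf = 1    # one output pixel sees one input pixel before any layer
    jump = 1  # distance in input pixels between neighbouring outputs
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf
```

Three stacked 3x3 stride-1 convolutions see a 7x7 input window, and so do just two 3x3 stride-2 convolutions. This is why both depth and downsampling matter when the input resolution grows.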