
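# Fast R-CNN

###### tags: `Paper reading`

## Paper Link

#### [Click here](https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Girshick_Fast_R-CNN_ICCV_2015_paper.pdf)

---

## Abstract & Introduction

### Compared to R-CNN

* Improved training and testing speed
  * trains 9× faster (VGG16)
  * test-time 213× faster
* Increased detection accuracy
  * mAP on PASCAL VOC 2012 is 66%

### Compared to [SPP net](https://arxiv.org/pdf/1406.4729.pdf)

> SPPnet is 24-102× faster than R-CNN at test time

* Improved training and testing speed
  * trains 3× faster (VGG16)
  * test-time 10× faster

![](https://i.imgur.com/AUGCwV9.png)
R-CNN (top) & SPPnet (bottom)

### Contributions

* Higher detection quality (mAP)
* Training is single-stage, using a multi-task loss
  > * The Fast R-CNN loss function comprises:
  >   * Softmax loss
  >   * BBox regression loss
* Training can update all network layers
  > Improves on SPPnet, which fine-tunes only the fc layers
* No disk storage is required for feature caching
  > R-CNN: every region proposal is fed through the CNN as a separate input
  > Fast R-CNN: the whole image is fed in once, and the features of each region proposal are then taken at the fifth convolutional layer

---

## Method

### Architecture

![Architecture](https://i.imgur.com/zvUJ7VC.png)
Architecture From Paper

1. Input the image
2. Run the convolutional layers, then obtain each RoI on the feature map in turn
3. RoI Pooling layer
    * Resizes feature maps of different sizes to a common size
    > R-CNN vs. Fast R-CNN
    > * R-CNN extracts 2000 region proposals and warps each one before running the CNN
    > * Fast R-CNN runs the CNN first, then uses RoI Pooling to bring the regions to a uniform size
4. FC
5. softmax
6. bbox regression

***

### RoI Projection

![](https://i.imgur.com/5gkTAQQ.jpg)

Given a bounding box's position and size, the region of the feature map that corresponds to the region proposal can be located. Each RoI is specified by the coordinates of its top-left corner (x, y) and the height and width of its window (h, w).

***

### RoI Pooling

* SPP net

  ![](https://i.imgur.com/HDHPbVm.png)

  Feature maps of different sizes are processed into feature vectors of the same length.

  1. The blue part divides the feature map into 4×4 cells. Each cell has width w/4 and height h/4 (rounded when not evenly divisible) and c channels. Max pooling is applied in each of the 16 cells, giving 16c values.
  2. The green part divides the feature map into 2×2 cells and, in the same way, gives 4c values.
  3. The gray part divides the feature map into 1×1 cells, giving c values.
  4. Concatenating these, the resulting feature representation has a fixed length of 16c + 4c + c = 21c, independent of the input h and w.

* RoI Pooling layer

  Uses the SPP net method with a single pyramid level: the RoI region on the feature map is divided into 7×7 cells, giving a 7×7×c output (7×7×256 when the last convolutional layer has 256 channels).

---

## Loss Function

### Multi-task loss

loss = Softmax loss + bbox reg loss

![](https://i.imgur.com/u8FXvBf.png)

> * p = (p0, ..., pK)
>   p is a probability distribution, since the softmax output is a distribution over classes.
>   p is computed by a softmax over the K + 1 outputs (K object classes plus background) of a fully connected layer.
> * u = the ground-truth class
>   Because of the [Iverson bracket](https://zh.wikipedia.org/wiki/%E8%89%BE%E4%BD%9B%E6%A3%AE%E6%8B%AC%E5%8F%B7), ![](https://i.imgur.com/vk7qsSq.png): the bbox reg loss only counts when u >= 1; u = 0 denotes background.
> * ![](https://i.imgur.com/P8KuB0X.png)
>   The bbox reg offsets for each of the K object classes, i.e. the output of the bbox regressor.
> * v = ground-truth bounding-box regression target ![](https://i.imgur.com/AmlW88s.png)
> * λ controls the balance between the two task losses
>   All experiments use λ = 1

#### Softmax loss

![](https://i.imgur.com/jYplYSO.png)

#### bbox reg loss

![](https://i.imgur.com/eVjA5rw.png)

---

## Q&A

* Training is single-stage
  Region proposal + classification was originally two-stage, but because the CNN is pretrained, training is described as single-stage.
  > Definition of single-stage: object localization and classification are handled together.
* With SVM classification, an object whose class is not among the known classes is still assigned to some class, but it can be rejected by discarding detections whose confidence score falls below a chosen threshold.
* Which parameters does bbox reg adjust?
  Bbox reg adjusts the parameters of the fc layer that precedes it.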
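
As a rough illustration of the RoI Pooling layer from the Method section, the sketch below max-pools one RoI of a single-channel feature map into a fixed grid. The function name `roi_max_pool`, the `(x, y, w, h)` tuple layout, and the floor/ceil rule for cell boundaries are assumptions made for this example, not details taken from the paper; real implementations work on multi-channel tensors.

```python
def roi_max_pool(feature_map, roi, out_h=7, out_w=7):
    """Max-pool one RoI of a single-channel feature map to out_h x out_w.

    feature_map: 2D list of numbers (rows of the feature map).
    roi: (x, y, w, h) in feature-map cells, (x, y) = top-left corner.
    Assumes the RoI lies inside the feature map and w, h >= 1.
    """
    x, y, w, h = roi
    pooled = []
    for i in range(out_h):
        # Row span of cell i: floor for the start, ceil for the end
        # (an assumed rounding rule), so no cell is ever empty.
        r0 = y + (i * h) // out_h
        r1 = y + ((i + 1) * h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            c0 = x + (j * w) // out_w
            c1 = x + ((j + 1) * w + out_w - 1) // out_w
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Toy 4x6 feature map whose value encodes its position (10*row + col).
fm = [[10 * r + c for c in range(6)] for r in range(4)]

# Two RoIs of different sizes both come out as 2x2 grids.
print(roi_max_pool(fm, (1, 0, 4, 4), out_h=2, out_w=2))  # [[12, 14], [32, 34]]
print(roi_max_pool(fm, (0, 0, 5, 3), out_h=2, out_w=2))  # [[12, 14], [22, 24]]
```

The output shape depends only on `out_h` and `out_w`, not on the RoI's own h and w, which is what lets the following fc layers take a fixed-length input.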
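
The multi-task loss from the Loss Function section appears in these notes only as images, so here is a minimal sketch of it for a single RoI, following the definitions in the Fast R-CNN paper: a log loss on the true class plus a smooth L1 loss on the box offsets, with the latter switched off for background by the Iverson bracket. The function names and the plain-list inputs are assumptions made for this example.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def multi_task_loss(p, u, t_u, v, lam=1.0):
    """Loss for one RoI.

    p:   softmax probabilities over K + 1 classes (index 0 = background).
    u:   ground-truth class index.
    t_u: predicted bbox offsets for class u (4 numbers).
    v:   ground-truth bbox regression target (4 numbers).
    lam: balance between the two task losses (the notes use lam = 1).
    """
    l_cls = -math.log(p[u])  # softmax (log) loss on the true class
    # Iverson bracket [u >= 1]: background RoIs have no box to regress.
    if u >= 1:
        l_loc = sum(smooth_l1(t - g) for t, g in zip(t_u, v))
    else:
        l_loc = 0.0
    return l_cls + lam * l_loc


p = [0.1, 0.7, 0.2]  # K = 2 object classes plus background
zero_target = (0.0, 0.0, 0.0, 0.0)
print(multi_task_loss(p, 1, (0.5, 0.0, 2.0, 0.0), zero_target))  # object RoI
print(multi_task_loss(p, 0, (0.5, 0.0, 2.0, 0.0), zero_target))  # background RoI
```

For the background RoI the result is just the classification term, regardless of the predicted offsets.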