Object Detection Development History-2

==Two-Stage Methods==
stage 1 = region proposal
stage 2 = feature extraction + classification + bounding-box regression

R-CNN
Fast R-CNN
Faster R-CNN
R-FCN

R-CNN

Selective Search

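  1. Selective Search
    Traditional computer vision first splits the image into many small regions.
    It then compares neighboring regions by texture, color, and similar cues, and merges highly similar ones into a new region.
    Merging repeats until no new region can be formed anywhere in the image.
    This is a hierarchical approach.
  2. After Selective Search, the traditional approach classifies the regions and keeps the few with the highest scores. The drawback is that the proposals may overlap.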
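
The merge loop in step 1 can be sketched as follows. This is a toy version, not the full Selective Search algorithm: it assumes the initial regions, their normalized color histograms, and their adjacency are already given, and it uses only histogram intersection as the similarity (the real method also combines texture, size, and fill cues). The names `hist_similarity` and `hierarchical_merge` are hypothetical.

```python
import numpy as np

def hist_similarity(h1, h2):
    # Histogram intersection: 1.0 for identical normalized histograms.
    return float(np.minimum(h1, h2).sum())

def hierarchical_merge(hists, sizes, neighbors):
    """Greedily merge the most similar pair of adjacent regions.

    hists:     {region_id: normalized color histogram (1-D array)}
    sizes:     {region_id: pixel count}
    neighbors: set of frozenset({a, b}) pairs of adjacent region ids
    Returns the merges as (a, b, new_id). Every region that ever existed,
    initial or merged, is a candidate proposal.
    """
    hists, sizes, neighbors = dict(hists), dict(sizes), set(neighbors)
    next_id = max(hists) + 1
    merges = []

    def pair_similarity(pair):
        a, b = tuple(pair)
        return hist_similarity(hists[a], hists[b])

    while neighbors:
        # Pick the most similar pair of adjacent regions.
        pair = max(neighbors, key=pair_similarity)
        a, b = sorted(pair)
        # The merged histogram is the size-weighted average of the two.
        total = sizes[a] + sizes[b]
        hists[next_id] = (sizes[a] * hists[a] + sizes[b] * hists[b]) / total
        sizes[next_id] = total
        # Neighbors of a or b become neighbors of the merged region.
        rewired = set()
        for p in neighbors:
            if p == pair:
                continue
            q = frozenset(next_id if r in (a, b) else r for r in p)
            if len(q) == 2:
                rewired.add(q)
        neighbors = rewired
        merges.append((a, b, next_id))
        next_id += 1
    return merges
```

The loop stops when no adjacent pair is left, which matches "merge until no new region can be formed".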

What did R-CNN do?

Selective Search -> Rescale Region Proposal -> CNN

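  1. Rescale each region proposal to 227x227, the input size of the CNN (AlexNet pretrained on ImageNet)
  2. If the SVM classifies the region as background -> do nothing
  3. If the SVM classifies the region as an object -> use regression to estimate the offset between the predicted box and the ground-truth box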
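
The regression in step 3 can be illustrated with the box parameterization used in the R-CNN paper: center offsets scaled by the proposal size, and log ratios for width and height. The sketch below assumes boxes are stored as (center x, center y, width, height); the function names are hypothetical.

```python
import numpy as np

def regression_targets(proposal, gt):
    """Offsets that map a proposal onto its ground-truth box.

    Both boxes are (cx, cy, w, h). Returns (tx, ty, tw, th).
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return np.array([
        (gx - px) / pw,   # horizontal shift, in units of proposal width
        (gy - py) / ph,   # vertical shift, in units of proposal height
        np.log(gw / pw),  # log width ratio
        np.log(gh / ph),  # log height ratio
    ])

def apply_deltas(proposal, deltas):
    """Inverse of regression_targets: refine a proposal with predicted offsets."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return np.array([px + tx * pw, py + ty * ph, pw * np.exp(tw), ph * np.exp(th)])
```

At training time the regressor learns to predict `regression_targets(proposal, gt)`; at test time `apply_deltas` turns its output back into a box.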

IOU

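Intersection over Union
= (area of the intersection of the predicted bounding box and the ground truth) divided by the area of the union of the two boxes

The larger the IOU the better (the closer to 1): the intersection grows as the union shrinks
A threshold decides whether an object counts as detected, e.g. IOU > 0.5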
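
A minimal sketch of the computation, assuming boxes are given as corner coordinates (x1, y1, x2, y2); the function name `iou` is just for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width and height are clamped at 0 so disjoint boxes give no intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2x2 boxes offset by one unit in each direction share an area of 1 out of a union of 7, so they would not pass a 0.5 threshold.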

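CNN: the IOU requirement is loose; about 0.5 is enough to count a proposal as an object, but the CNN is also prone to overfitting
SVM: positive samples are a minority in an image, and an SVM suits this kind of small-sample training, so a stricter IOU of about 0.7 fits better (the R-CNN paper itself goes further: only ground-truth boxes are positives, and proposals with IOU below 0.3 are negatives)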

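Non-Maximum Suppression

The final set of bounding boxes can be very large and cover the whole image.
Many of these boxes overlap or have low confidence,
so this technique (non-maximum suppression) is used to filter them.

Pick the box with the highest confidence and delete every other box that overlaps it by more than a set ratio, then repeat with the boxes that remain.
The overlap ratio is configurable, e.g. 0.6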
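
The procedure can be sketched in NumPy as below. It assumes boxes are (x1, y1, x2, y2) with positive area and uses IOU as the overlap ratio; the function name `nms` and the default threshold of 0.6 (taken from the example above) are illustrative choices.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.6):
    """Return the indices of the boxes kept by greedy non-maximum suppression."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = scores.argsort()[::-1]  # indices sorted by descending confidence
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # IOU between the current best box and every remaining box.
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        overlap = inter / (areas[best] + areas[rest] - inter)
        # Drop boxes that overlap the best box too much; keep the others.
        order = rest[overlap <= iou_threshold]
    return keep
```

In practice this is usually run once per class, so that a high-scoring box of one class does not suppress an overlapping box of another.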

Fast R-CNN

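Everything after region proposal runs in a single network to speed up computation.
The design is also more sound: in R-CNN, the SVM and bounding-box losses cannot be backpropagated to the CNN that extracts the features.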

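ROI Pooling
The fully connected layers that follow the CNN need a fixed-size input, so ROI Pooling brings every ROI to the same size.
The example below outputs a 2x2 feature map.
ROI area = 7x5
So the ROI is divided into 4 bins (2x2).
Each bin is max-pooled (average pooling also works),
and the 4 maxima form the 2x2 output feature map.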
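
A minimal sketch of the 2x2 example, assuming the ROI is 7 wide and 5 high on a single-channel feature map and that bin edges are rounded down, so the bins have unequal sizes. The function name `roi_max_pool` is hypothetical.

```python
import numpy as np

def roi_max_pool(roi, out_h=2, out_w=2):
    """Max-pool a 2-D ROI of any size (at least out_h x out_w) to out_h x out_w."""
    h, w = roi.shape
    # Integer bin edges; a 5x7 ROI gives row edges [0, 2, 5] and column edges [0, 3, 7].
    ys = [(i * h) // out_h for i in range(out_h + 1)]
    xs = [(j * w) // out_w for j in range(out_w + 1)]
    out = np.empty((out_h, out_w), dtype=roi.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = roi[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max()
    return out

# A 5-row by 7-column ROI filled with 0..34 for illustration.
roi = np.arange(35).reshape(5, 7)
pooled = roi_max_pool(roi)
```

Whatever the ROI size, the output is always 2x2, which is what lets ROIs of different shapes feed the same later layers.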

Fast R-CNN training stage

The classification loss and the box regression loss are added together; experiments show this performs better.
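
A sketch of such a combined loss for one ROI, following the form in the Fast R-CNN paper: cross-entropy for the class plus a smooth L1 loss on the box offsets, with the box term skipped for background. It assumes class 0 is background; the weight `lam` and the function names are illustrative.

```python
import numpy as np

def smooth_l1(x):
    """Quadratic near zero, linear for |x| >= 1, so outliers are penalized less."""
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)

def multitask_loss(class_probs, true_class, pred_deltas, target_deltas, lam=1.0):
    """Classification loss plus (for non-background ROIs) box regression loss."""
    cls_loss = -np.log(class_probs[true_class])
    if true_class == 0:
        # Background ROIs have no ground-truth box, so only classify them.
        return float(cls_loss)
    diff = np.asarray(pred_deltas, dtype=float) - np.asarray(target_deltas, dtype=float)
    loc_loss = smooth_l1(diff).sum()
    return float(cls_loss + lam * loc_loss)
```

Because both terms come from the same network, the gradient of the sum reaches the shared convolutional features, which is what R-CNN's separate SVM and regressor could not do.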

Faster R-CNN

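It is time to replace Selective Search with a CNN, because it is still not fast enough.
The features should also be computed only once for the whole image and then shared by region proposal and bounding-box prediction.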

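Region proposal with a CNN

It uses 9 predefined anchor boxes.
A sliding window moves over the feature map to find candidate anchor points (the centers of the anchor boxes).
At each center it records scores: k boxes -> 2k scores (object or not) + 4k numbers (x, y, w, h)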
x, y are the coordinates of the box center
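
A sketch of how the 9 anchors and the output sizes fit together. The 3 scales (128, 256, 512) and 3 aspect ratios (1:2, 1:1, 2:1) are the defaults reported in the Faster R-CNN paper; storing boxes as (center x, center y, w, h) is an assumption of this sketch, and the function names are hypothetical.

```python
import numpy as np

def make_anchor_sizes(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return a (len(scales) * len(ratios), 2) array of anchor (w, h) pairs.

    Each anchor has area scale**2 and aspect ratio h / w = ratio.
    """
    sizes = []
    for s in scales:
        for r in ratios:
            sizes.append((s / np.sqrt(r), s * np.sqrt(r)))
    return np.array(sizes)

def anchors_at(cx, cy, anchor_sizes):
    """All anchors centered on one sliding-window position, as (cx, cy, w, h)."""
    centers = np.tile([cx, cy], (len(anchor_sizes), 1))
    return np.hstack([centers, anchor_sizes])

def rpn_output_sizes(k):
    """Per position: 2k objectness scores and 4k box coordinates."""
    return 2 * k, 4 * k

sizes = make_anchor_sizes()
k = len(sizes)  # 9 anchors per position
```

With k = 9, each sliding-window position yields 18 scores and 36 box numbers.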

cons: performs poorly on small objects


R-FCN
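Faster R-CNN is still not fast enough,
because some parts still do not share the CNN computation.
Why can they not be shared?

classification = translation invariance = position does not matter, only which object it is
Wherever the object appears in the photo, the result should be the same

object detection = translation variance = position matters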

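R-FCN makes all of the CNN computation in the second stage shared.
It encodes position inside each region proposal by dividing it into k x k bins.
A convolutional layer after the feature map then produces score maps with k^2 x (C+1) channels.
C = number of object classes
C + 1 = one extra class for the background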

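A person is divided into 9 bins; the top-middle bin is the person's head, and brighter means a higher score.
The idea: when an object is split into 9 parts, each part should contain some recognizable features (the bright areas).
If only one or two of a proposal's 9 bins are bright, that means
many bins found no features, so it is not a good proposal.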

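What about the voting mechanism?

Assume 20 object classes + 1 background class
k = 3
So each proposal produces a feature map of depth 3x3x21
For each class, average the values of its 9 bins
Then pick the class with the highest score (compare the 21 averages) -> the proposal is assigned to that class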
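
The pooling and voting can be sketched as below. It assumes the score maps are laid out as (k*k, C+1, H, W), that the ROI is given in integer feature-map cells, and that each bin is average-pooled; the names `ps_roi_pool` and `vote` are hypothetical.

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k=3):
    """Position-sensitive pooling of one ROI.

    score_maps: array of shape (k*k, C+1, H, W)
    roi:        (x1, y1, x2, y2) in feature-map cells, end-exclusive,
                at least k cells wide and k cells high
    Returns an array of shape (k, k, C+1). Bin (i, j) reads only from
    its own group of maps, which is what encodes position.
    """
    x1, y1, x2, y2 = roi
    ys = [y1 + (i * (y2 - y1)) // k for i in range(k + 1)]
    xs = [x1 + (j * (x2 - x1)) // k for j in range(k + 1)]
    pooled = np.empty((k, k, score_maps.shape[1]))
    for i in range(k):
        for j in range(k):
            patch = score_maps[i * k + j, :, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            pooled[i, j] = patch.mean(axis=(1, 2))
    return pooled

def vote(pooled):
    """Average the k*k bins per class and pick the best class."""
    class_scores = pooled.mean(axis=(0, 1))  # one score per class, C+1 in total
    return int(class_scores.argmax()), class_scores
```

A proposal where only one or two bins respond gets a low average for that class, which is how the vote penalizes poorly aligned proposals.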

tags: SAR Deep learning Object Detection review