Object Detection Development History - 2
==Two-Stage Method==
stage 1 = region proposal
stage 2 = feature extraction + classification + bounding box regression
R-CNN
Fast R-CNN
Faster R-CNN
R-FCN
R-CNN
Selective Search
- Selective Search
Traditional computer vision first splits the photo into many small regions.
Neighbouring regions are then tested for similarity using texture, colour, etc.; highly similar pairs --> merged into a new region.
Merging is repeated until no new region can be merged anywhere in the image.
This is a hierarchical approach (a minimal sketch follows after this list).
- After Selective Search, the traditional approach is to keep the few regions that score highest after classification; the drawback is that proposals may overlap.
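A minimal sketch of that hierarchical merging loop, assuming each region is a dict with a pixel set and a colour histogram; `similarity` and `merge` are deliberately simplified stand-ins (real Selective Search starts from a graph-based over-segmentation, merges only adjacent regions, and combines colour, texture, size, and fill similarities):

```python
def similarity(a, b):
    # Simplified similarity: histogram intersection of the colour histograms.
    return sum(min(x, y) for x, y in zip(a["hist"], b["hist"]))

def merge(a, b):
    # Merge two regions: union of pixels, combined histogram.
    return {
        "pixels": a["pixels"] | b["pixels"],
        "hist": [x + y for x, y in zip(a["hist"], b["hist"])],
    }

def selective_search(initial_regions):
    regions = list(initial_regions)
    proposals = [r["pixels"] for r in regions]   # every intermediate region is a proposal
    while len(regions) > 1:
        # Pick the most similar pair and merge it (a real implementation
        # would only consider neighbouring regions).
        i, j = max(
            ((i, j) for i in range(len(regions)) for j in range(i + 1, len(regions))),
            key=lambda p: similarity(regions[p[0]], regions[p[1]]),
        )
        merged = merge(regions[i], regions[j])
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
        proposals.append(merged["pixels"])
    return proposals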
What did R-CNN do?
Selective Search --> Rescale Region Proposal --> CNN
- Rescale each region proposal to 227x227; the CNN is AlexNet pre-trained on ImageNet.
- If the SVM classifies the region as background --> nothing more needs to be done.
- If the SVM classifies it as an object --> use regression to estimate the gap between the predicted box and the ground-truth box.
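To make the last bullet concrete, here is a sketch of the offsets such a regressor is typically trained to predict (R-CNN-style centre/size parameterisation); boxes are assumed to be (x1, y1, x2, y2) tuples and the helper name is illustrative, not from the notes:

```python
import math

def regression_targets(proposal, ground_truth):
    # Centre and size of the proposal box.
    px, py = (proposal[0] + proposal[2]) / 2, (proposal[1] + proposal[3]) / 2
    pw, ph = proposal[2] - proposal[0], proposal[3] - proposal[1]
    # Centre and size of the ground-truth box.
    gx, gy = (ground_truth[0] + ground_truth[2]) / 2, (ground_truth[1] + ground_truth[3]) / 2
    gw, gh = ground_truth[2] - ground_truth[0], ground_truth[3] - ground_truth[1]
    # Normalised centre shifts and log scale changes the regressor should predict.
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))
```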
IOU
Intersection over Union
= (intersection area of the predicted bounding box and the ground truth) divided by the union area of the two boxes
A larger IOU is better (closer to 1) --> the intersection is large relative to the union.
A threshold is set to decide whether an object was detected, e.g. IOU > 0.5.
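A small sketch of this computation for axis-aligned boxes given as (x1, y1, x2, y2) tuples; the helper name and box format are my own choices for illustration:

```python
def iou(box_a, box_b):
    # Intersection rectangle (zero if the boxes do not overlap).
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    # Union = area(A) + area(B) - intersection.
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# e.g. iou((0, 0, 10, 10), (5, 5, 15, 15)) == 25 / 175 ≈ 0.14, below a 0.5 threshold
```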
CNN: not demanding on IOU; about 0.5 is enough to find objects, but it is also prone to overfitting.
SVM: positive samples are a minority in an image; SVM is better suited to training with few samples, so an IOU of about 0.7 is more appropriate.
Non-Maximum Suppression
When the final bounding boxes are produced there may be a great many of them, covering the whole image.
Many of them overlap or have low confidence.
So this technique (non-maximum suppression) can be used:
pick the box with the highest confidence and delete the other boxes whose overlap with it exceeds a certain ratio.
The overlap ratio can be chosen freely, e.g. 0.6.
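A minimal greedy NMS sketch following this recipe; it reuses the `iou()` helper from the IOU sketch above, and the 0.6 overlap threshold is just the example value from the notes:

```python
def nms(boxes, scores, overlap_threshold=0.6):
    # Sort box indices by confidence, highest first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)                 # box with the highest confidence
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= overlap_threshold]
    return keep                             # indices of the surviving boxes
```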
Fast R-CNN
All the steps after region proposal are handled in a single network to speed up computation.
The design is also more sensible (in R-CNN, the SVM & bounding-box losses cannot be back-propagated to the feature-extracting CNN in front).
ROI Pooling
Because the inputs to the following network layers must be the same size, ROI Pooling is used to produce a uniform size.
The example below outputs a 2x2 feature map:
ROI area = 7x5
So the ROI is split into 4 blocks (2x2).
Max pooling is applied to each block (average pooling also works).
The max of each of the 4 blocks becomes the 2x2 feature map output.
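A plain-Python sketch of this ROI max pooling for the 7x5 --> 2x2 example; `roi` is assumed to be a 2-D list of feature values (real libraries such as torchvision's roi_pool operate on tensors and also handle coordinate quantisation):

```python
def roi_pool(roi, out_h=2, out_w=2):
    h, w = len(roi), len(roi[0])             # e.g. a 7x5 ROI on the feature map
    pooled = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Boundaries of this sub-window (uneven splits allowed, as in 7x5 -> 2x2).
            r0, r1 = i * h // out_h, (i + 1) * h // out_h
            c0, c1 = j * w // out_w, (j + 1) * w // out_w
            # Max pooling inside the sub-window (average pooling also works).
            pooled[i][j] = max(roi[r][c] for r in range(r0, r1) for c in range(c0, c1))
    return pooled

# e.g. roi = [[float(r * 5 + c) for c in range(5)] for r in range(7)]
#      roi_pool(roi) -> [[11.0, 14.0], [31.0, 34.0]]
```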
Fast R-CNN training stage
The classification loss and the box-regression loss are added together; experiments show this performs better.
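A sketch of that combined (multi-task) loss in PyTorch-style code; the tensor names, the class-agnostic (N, 4) box shape, and the `lambda_reg` weight are assumptions for illustration, not details from the notes:

```python
import torch.nn.functional as F

def fast_rcnn_loss(cls_scores, labels, box_preds, box_targets, lambda_reg=1.0):
    # Classification loss over all ROIs (background ROIs have label 0).
    cls_loss = F.cross_entropy(cls_scores, labels)
    # Box-regression loss only for foreground ROIs (label > 0).
    fg = labels > 0
    reg_loss = F.smooth_l1_loss(box_preds[fg], box_targets[fg]) if fg.any() else 0.0
    # Summing the two losses lets both heads train the shared CNN together.
    return cls_loss + lambda_reg * reg_loss
```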
Faster R-CNN
It is time to replace Selective Search with a CNN, because the pipeline is still not fast enough.
The features should also be computed only once for the whole image and then shared by the region-proposal and bounding-box steps.
CNN-based region proposal (the RPN)
Uses 9 pre-defined anchor boxes.
Slides a window over the feature map to find candidate anchor points (the centres of the anchor boxes).
At each centre point, scores are recorded; with k boxes --> 2k scores (object or not) + 4k numbers (x, y, w, h).
x, y are the top-left coordinates of the box.
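A sketch of how the 9 anchor boxes at one sliding-window position could be generated; 3 scales x 3 aspect ratios is the common Faster R-CNN setting, but the exact numbers below are assumptions:

```python
def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    boxes = []
    for s in scales:
        for r in ratios:
            # Keep the anchor area near s*s while changing its aspect ratio.
            w = s * (r ** 0.5)
            h = s / (r ** 0.5)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes  # k = 9 anchors centred on this anchor point

# len(anchors_at(300, 200)) == 9; for k = 9 the RPN head then predicts
# 2k = 18 objectness scores and 4k = 36 box numbers at this position.
```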
example
Cons: not friendly to detecting small objects.
R-FCN
Faster R-CNN is still not fast enough,
because some parts still do not share the CNN computation.
Why can't it all be shared?
Classification = translation invariance = position does not matter; knowing which object it is is enough.
Wherever the object appears in the photo, it looks the same to the classifier.
Object detection = translation variance = position matters.
R-FCN makes the entire second stage share the CNN computation.
Position is encoded inside the region proposal by splitting it into k x k sub-regions.
On top of the feature map, score maps with depth k^2 x (C+1) are then produced.
C = number of object classes
C + 1 = one extra class is added --> background
A person is split into 9 regions; the top-middle region is the head, and brighter means a higher score.
The idea: when an object is divided into 9 parts, each part should contain some recognisable features (the bright areas).
If only one or two of a proposal's 9 regions are bright, that means
many regions have no detected features --> it is not a good proposal.
What about the voting mechanism?
Suppose there are 20 object classes + 1 background class.
k = 3
So each proposal position yields a pooled 3x3 feature map with 21 channels.
Average the values of the 9 cells in each channel.
Then pick the channel with the highest score (comparing all 21 averages) --> that class is the proposal's prediction.
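A sketch of this voting step for one proposal, assuming position-sensitive pooling has already produced a (C+1) x k x k score block (21 x 3 x 3 here), each cell pooled from its own slice of the k^2 x (C+1) score maps:

```python
def rfcn_vote(score_block):
    # score_block[c][i][j] = score of class c in grid cell (i, j);
    # c = 0 is background, so len(score_block) == C + 1 (21 here).
    k = len(score_block[0])
    votes = []
    for per_class in score_block:
        # Average the k x k position-sensitive cells for this class.
        votes.append(sum(sum(row) for row in per_class) / (k * k))
    best = max(range(len(votes)), key=lambda c: votes[c])
    return best, votes[best]  # predicted class index and its averaged score
```

A proposal that is bright in most of its k x k cells for some class wins that class's average, which matches the intuition described above.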