Object Detection Development History-2

==Two Stage Method ==
stage 1 = region proposal
stage 2 = feature extraction + bounding box

R-CNN
Fast R-CNN
Faster R-CNN
R-FCN

R-CNN

Selective Search

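Traditional computer vision first splits the image into many small regions.
It then compares neighboring regions using cues such as texture and color; regions with high similarity --> merged into a new region.
Merging repeats until no new region can be merged anywhere in the image.
This is a hierarchical approach.
After Selective Search, the traditional approach classifies the regions and keeps the few with the highest scores; the drawback is that the proposals may overlap.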
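The merging loop above can be sketched in a few lines. This is only a toy illustration, not the real Selective Search: each region is reduced to a single mean intensity (a stand-in for the texture and color cues), and the names `hierarchical_merge`, `means`, `sizes` and `edges` are hypothetical.

```python
def hierarchical_merge(means, sizes, edges):
    """Greedily merge the most similar neighboring regions.

    means: mean intensity of each initial region (toy similarity cue)
    sizes: pixel count of each initial region
    edges: pairs (i, j) of initial regions that touch each other
    Returns the merge order as a list of (region_a, region_b, new_region).
    """
    regions = {i: (means[i], sizes[i]) for i in range(len(means))}
    neighbors = {i: set() for i in regions}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    merges = []
    next_id = len(means)
    while True:
        # Every pair of regions that is still adjacent.
        pairs = [(a, b) for a in neighbors for b in neighbors[a] if a < b]
        if not pairs:
            break  # nothing left to merge
        # Most similar pair = smallest difference in mean intensity.
        a, b = min(pairs, key=lambda p: abs(regions[p[0]][0] - regions[p[1]][0]))
        (mean_a, size_a), (mean_b, size_b) = regions[a], regions[b]
        size = size_a + size_b
        regions[next_id] = ((mean_a * size_a + mean_b * size_b) / size, size)
        # The merged region inherits the neighbors of both parts.
        merged_neighbors = (neighbors[a] | neighbors[b]) - {a, b}
        for old in (a, b):
            for n in neighbors.pop(old):
                neighbors[n].discard(old)
        neighbors[next_id] = merged_neighbors
        for n in merged_neighbors:
            neighbors[n].add(next_id)
        merges.append((a, b, next_id))
        next_id += 1
    return merges
```

Every region created along the way (not only the final one) can serve as a proposal, which is why the result is a hierarchy and why proposals can overlap.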

What did R-CNN do?

Selective Search --> Rescale Region Proposal --> CNN

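Rescale each region proposal to 227x227: the CNN is AlexNet pretrained on ImageNet
If the SVM classifies the region as background --> nothing more is done
If the SVM classifies the region as an object --> a regressor estimates the offset between the predicted box and the ground-truth box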
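As a rough sketch of what the regressor learns, the snippet below computes the offset between a proposal and its ground-truth box. It assumes the parameterization from the R-CNN paper (center shift scaled by box size, log-scale for width and height) and boxes stored as (center x, center y, width, height); the function names are hypothetical.

```python
import math

def regression_targets(proposal, ground_truth):
    """Offsets that move `proposal` onto `ground_truth`.

    Both boxes are (center_x, center_y, width, height).
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    tx = (gx - px) / pw          # horizontal shift, relative to proposal width
    ty = (gy - py) / ph          # vertical shift, relative to proposal height
    tw = math.log(gw / pw)       # log of the width ratio
    th = math.log(gh / ph)       # log of the height ratio
    return tx, ty, tw, th

def apply_offsets(proposal, offsets):
    """Inverse step: turn predicted offsets back into a box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = offsets
    return px + tx * pw, py + ty * ph, pw * math.exp(tw), ph * math.exp(th)
```

Applying `apply_offsets` to the output of `regression_targets` recovers the ground-truth box, which is a quick way to check the two functions against each other.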

IOU

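Intersection over Union
= (area of the intersection of the predicted bounding box and the ground truth) divided by the area of the union of the two boxes

The larger the IOU (the closer to 1), the better --> the larger the intersection, the smaller the union
A threshold is set to decide whether an object has been detected, e.g. IOU > 0.5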
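The definition translates directly into code. A minimal sketch, assuming boxes are given as corner coordinates (x1, y1, x2, y2); the function name `iou` is illustrative.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2x2 boxes that overlap in a 1x1 square have an intersection of 1 and a union of 7, so the IOU is 1/7 and would not pass a 0.5 threshold.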

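CNN: the IOU requirement is loose; about 0.5 is enough to find the object, but it also overfits easily
SVM: positive samples are a minority in an image, and an SVM suits this kind of training with few samples; an IOU of about 0.7 is more appropriate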

Non Maximum Suppression

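When the final bounding boxes are produced there can be a great many of them, covering the whole image
Many of them overlap or have low confidence
So this technique (non-maximum suppression) is used

Pick the box with the highest confidence and delete the other boxes that overlap it by more than a certain ratio
The overlap ratio can be chosen freely, e.g. 0.6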
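The procedure can be sketched as a greedy loop that is repeated on the remaining boxes. This is a plain-Python illustration, assuming boxes as (x1, y1, x2, y2) and overlap measured by IOU; `nms` and `_iou` are hypothetical helper names.

```python
def _iou(a, b):
    """IOU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, overlap_threshold=0.6):
    """Return the indices of the boxes that survive suppression."""
    # Process candidates from highest to lowest confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [i for i in order
                 if _iou(boxes[best], boxes[i]) <= overlap_threshold]
    return keep
```

A lower threshold removes more boxes; a higher one keeps near-duplicates, so the value is a trade-off chosen per dataset.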

Fast R-CNN

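Every step after region proposal runs inside a single network to speed up computation
The design is also more reasonable (in R-CNN, the losses of the SVM and the bounding-box regressor cannot be backpropagated to the CNN that extracts the features)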

ROI Pooling
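Because the following layers of the network require inputs of the same size, ROI Pooling is used to bring every ROI to one size
The example below outputs a 2x2 feature map
ROI area = 7x5
So the ROI is split into 4 blocks (2x2)
Max pooling is applied to each block (average pooling also works)
The max of each of the 4 blocks forms the 2x2 output feature map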
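The 7x5 example can be reproduced with a short sketch. It assumes the ROI is 7 wide and 5 high and that uneven blocks are split with integer division (3 + 4 columns, 2 + 3 rows); real implementations may place the block boundaries differently. `roi_max_pool` is a hypothetical name.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one ROI of a 2-D feature map to a fixed out_h x out_w grid.

    feature_map: list of rows (list of lists of numbers)
    roi: (x0, y0, width, height) in feature-map cells,
         with width >= out_w and height >= out_h
    """
    x0, y0, w, h = roi
    pooled = []
    for i in range(out_h):
        # Row range of block i (integer division handles uneven splits).
        r_start = y0 + i * h // out_h
        r_end = y0 + (i + 1) * h // out_h
        row = []
        for j in range(out_w):
            c_start = x0 + j * w // out_w
            c_end = x0 + (j + 1) * w // out_w
            row.append(max(feature_map[r][c]
                           for r in range(r_start, r_end)
                           for c in range(c_start, c_end)))
        pooled.append(row)
    return pooled
```

Whatever the ROI size, the output is always out_h x out_w, which is what lets proposals of different shapes share the same later layers.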

Fast R-CNN training stage

The classification loss and the box-regression loss are added together; experiments show this performs better
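A minimal sketch of such a combined loss for one ROI, assuming the form used in the Fast R-CNN paper: log loss for the class plus a smooth L1 loss on the four box offsets, with the box term skipped for background. Class 0 as background, the weight `lam`, and the function names are assumptions of this sketch.

```python
import math

def smooth_l1(x):
    """Quadratic near zero, linear elsewhere (less sensitive to outliers)."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def multi_task_loss(class_probs, true_class, pred_offsets, target_offsets, lam=1.0):
    """Classification loss + lam * box-regression loss for one ROI.

    class_probs: softmax output over classes, index 0 = background (assumed)
    pred_offsets / target_offsets: 4 numbers each for the true class
    """
    cls_loss = -math.log(class_probs[true_class])
    if true_class == 0:
        # Background ROIs have no ground-truth box, so no regression term.
        return cls_loss
    loc_loss = sum(smooth_l1(p - t) for p, t in zip(pred_offsets, target_offsets))
    return cls_loss + lam * loc_loss
```

Because both terms are summed into one number, a single backward pass updates the shared feature extractor for both tasks.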

Faster R-CNN

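It is time to replace Selective Search with a CNN, because the pipeline is still not fast enough
The features should also be computed only once for the whole image and then shared by region proposal and bounding-box prediction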

Region proposal with a CNN

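Uses 9 predefined anchor boxes
A sliding window looks for possible anchor points (the center of the anchor boxes)
Scores are recorded at each center point: with k boxes --> 2k scores (object or not) + 4k numbers (x, y, w, h)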
x, y are the center coordinates of the box; w, h are its width and height
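To make the counts concrete, here is a sketch that builds the 9 anchors at one sliding-window position. It assumes the 3 scales (128, 256, 512) and 3 aspect ratios (1:2, 1:1, 2:1) from the Faster R-CNN paper and stores each anchor as (center x, center y, width, height); the function names are hypothetical.

```python
import math

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Anchors centered at (cx, cy): one per (scale, height/width ratio) pair."""
    anchors = []
    for s in scales:
        for r in ratios:
            # Keep the area close to s*s while changing the shape.
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx, cy, w, h))
    return anchors

def rpn_output_sizes(k):
    """Numbers predicted per position: 2k objectness scores, 4k box values."""
    return 2 * k, 4 * k
```

With k = 9 this gives 18 scores and 36 box numbers at every sliding-window position.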

example

cons: performs poorly on small objects

R-FCN
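Faster R-CNN is still not fast enough
because some parts still do not share the CNN
Why can't they be shared?

classification = translation invariance = position does not matter; it only needs to know which object it is
Wherever the object is in the image, it makes no difference to the classifier

object detection = translation variance = position matters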

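R-FCN decides to share all of the CNN in the second part
Position is encoded within each region proposal by dividing it into k x k regions
After the feature map, a convolutional layer produces an output of depth k^2 x (C+1)
C = number of object classes
C + 1 = one extra class --> background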

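A person is divided into 9 regions; the top-middle region is the person's head, and the brighter a region, the higher its score
The idea: when an object is split into 9 parts, each part should contain some recognizable features (the bright areas)
If, for a given proposal, only one or two of the 9 regions are bright, it means:
many regions did not detect their features --> this is not a good proposal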

And the voting mechanism?

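Suppose there are 20 object classes + 1 background class
k = 3
So each proposal yields a feature map of depth 3x3x21
For each of the 21 class maps, average the values of its 9 cells
Then pick the highest score (comparing all 21 averages) --> the proposal belongs to that class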
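The vote is just an average followed by an argmax. A small sketch, assuming the 9 cell scores of each class have already been pooled for one proposal; `vote` and `bin_scores` are hypothetical names.

```python
def vote(bin_scores):
    """Pick the class of one proposal from its pooled cell scores.

    bin_scores[c] is the list of k*k cell scores for class c
    (21 lists of 9 values in the example above).
    Returns (winning class index, list of per-class averages).
    """
    averages = [sum(cells) / len(cells) for cells in bin_scores]
    best = max(range(len(averages)), key=lambda c: averages[c])
    return best, averages
```

A class with only one or two bright cells gets a low average, so a proposal that covers the object badly loses the vote even if a few cells respond strongly.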

tags: SAR Deep learning Object Detection review