Lecture 11: Detection and Segmentation

computer vision tasks

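Semantic Segmentation: assign a class label to every pixel, so it is a per-pixel classification problem. The task does not distinguish instances: different instances of the same class are all marked with the same color.
Classification + Localization (single-object detection): recognize exactly one object in the image and mark its bounding box; the task can be understood as classification plus localization.
Object Detection: recognize all objects in the image and mark a bounding box for each.
Instance Segmentation: recognize all objects in the image and mark the precise boundary of each one.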

Semantic Segmentation

Sliding Window

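For each pixel, crop a sub-image (patch) from the original image centered on that pixel, and run image classification on the patch to label the center pixel.
Problem: the computational cost is too high and much of it is redundant, because overlapping patches share the same features and those features are recomputed for every patch.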
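
The per-pixel procedure can be sketched roughly as follows. This is a minimal illustration, not the lecture's code: the image is a plain nested list, `toy_classifier` is a hypothetical stand-in for a real CNN classifier, borders are zero-padded by assumption, and `size` is assumed to be odd.

```python
def extract_patch(image, row, col, size):
    """Return the size x size patch centered at (row, col), zero-padded at borders."""
    half = size // 2
    height, width = len(image), len(image[0])
    patch = []
    for r in range(row - half, row + half + 1):
        patch_row = []
        for c in range(col - half, col + half + 1):
            inside = 0 <= r < height and 0 <= c < width
            patch_row.append(image[r][c] if inside else 0)
        patch.append(patch_row)
    return patch


def segment_by_sliding_window(image, classify_patch, size=3):
    """Label every pixel by classifying the patch centered on it."""
    return [[classify_patch(extract_patch(image, r, c, size))
             for c in range(len(image[0]))]
            for r in range(len(image))]


def toy_classifier(patch):
    """Hypothetical classifier: class 1 if the patch mean exceeds 0.5, else class 0."""
    values = [v for row in patch for v in row]
    return 1 if sum(values) / len(values) > 0.5 else 0
```

Note that the classifier runs once per pixel, and neighboring patches overlap almost entirely, which is where the redundant computation comes from.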

CNN

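Stack several convolutional layers and output C feature maps of size [H x W] (C is the number of classes); the i-th feature map gives, for each pixel, the probability that it belongs to class i.
Problems:

A large amount of training data must be labeled by hand (a class for every pixel of every image), which is expensive.
Every layer produces feature maps at the same size as the original image, so when the network is deep and the input resolution is high, both computation and memory usage are very large.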
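
To make the C x H x W output concrete, here is a small sketch of how per-pixel class scores might be turned into probabilities and labels. It assumes the network emits raw scores indexed as `scores[c][y][x]` and that a softmax over the class axis is applied; both the layout and the function names are illustrative assumptions.

```python
import math


def pixelwise_softmax(scores):
    """scores[c][y][x] -> probs[c][y][x], normalized over the class axis c."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    probs = [[[0.0] * width for _ in range(height)] for _ in range(num_classes)]
    for y in range(height):
        for x in range(width):
            column = [scores[c][y][x] for c in range(num_classes)]
            peak = max(column)  # subtract the max for numerical stability
            exps = [math.exp(s - peak) for s in column]
            total = sum(exps)
            for c in range(num_classes):
                probs[c][y][x] = exps[c] / total
    return probs


def predict_labels(probs):
    """Pick the most probable class at every pixel."""
    num_classes = len(probs)
    return [[max(range(num_classes), key=lambda c: probs[c][y][x])
             for x in range(len(probs[0][0]))]
            for y in range(len(probs[0]))]
```

During training, a cross-entropy loss would typically be applied at every pixel against the hand-labeled class map.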

downsampling and upsampling

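Keeping the feature maps at full resolution is very costly, so we first shrink the feature maps step by step (downsampling) and then enlarge them step by step (upsampling). This keeps the computation in the middle of the network small even when the network is deep.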

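Unpooling: an approximate inverse of pooling. It enlarges the feature map by copying values (nearest neighbor) or by filling in constant zeros (bed of nails). Max unpooling keeps each value at the position that the corresponding max-pooling layer selected and fills all other positions with zeros.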
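
A minimal sketch of two of these upsampling variants, using plain nested lists. It assumes non-overlapping pooling windows and feature-map sides divisible by the window size; the function names are illustrative, not from any library.

```python
def nearest_neighbor_unpool(fmap, factor=2):
    """Enlarge the map by repeating every value factor x factor times."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def max_pool_with_indices(fmap, size=2):
    """Non-overlapping max pooling that also records where each max came from."""
    pooled, indices = [], []
    for r in range(0, len(fmap), size):
        pooled_row, index_row = [], []
        for c in range(0, len(fmap[0]), size):
            window = [(fmap[r + dr][c + dc], (r + dr, c + dc))
                      for dr in range(size) for dc in range(size)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(pooled, indices, out_height, out_width):
    """Put each pooled value back at its remembered position; zeros elsewhere."""
    out = [[0] * out_width for _ in range(out_height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out
```

In an encoder/decoder network, each max-unpooling layer would reuse the indices saved by its matching max-pooling layer, so spatial detail lost during downsampling is placed back where it originated.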
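Transpose Convolution: multiply the kernel by each value of the input feature map and write that scaled copy of the kernel into the output, at an offset set by the stride; where copies overlap, their values are summed. In other words, each value of the input feature map acts as a weight on the kernel.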
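
The operation can be sketched for a single channel as below. This is an illustrative implementation with no padding; the default stride of 2 and the function name are assumptions, and real layers also learn the kernel and handle multiple channels.

```python
def transpose_conv2d(fmap, kernel, stride=2):
    """Single-channel transposed convolution without padding."""
    in_h, in_w = len(fmap), len(fmap[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (in_h - 1) * stride + k_h
    out_w = (in_w - 1) * stride + k_w
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(in_h):
        for j in range(in_w):
            # Each input value scales one copy of the kernel; copies that
            # overlap in the output are summed.
            for a in range(k_h):
                for b in range(k_w):
                    out[i * stride + a][j * stride + b] += fmap[i][j] * kernel[a][b]
    return out
```

For example, a 1 x 2 input `[[1, 2]]` with a 1 x 3 kernel of ones and stride 2 gives a 1 x 5 output whose middle element is the sum of the two overlapping copies.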

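Why do we do downsampling / max unpooling?
To avoid the excessive computation that results when every feature map stays the same size as the original image. Downsampling shrinks the feature maps, and upsampling (for example max unpooling) restores the output to the input resolution.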

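Which common model architectures use Transpose Convolution?

GAN (generative adversarial network): to go from the input to a generated image, the extracted feature maps have to be restored to the size of the original image, so a transpose convolution (deconvolution) operation is needed.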
YOLO
encoder/decoder (style transfer)

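Classification + Localization (single-object detection)

We can treat the localization part as a regression task. For example, feed the image into AlexNet and attach two fully connected layers in parallel at the end, one for classification and one for localization. The classification head outputs a score for each class and is trained with a softmax loss; the localization head outputs four numbers, the x and y coordinates of the bounding box's top-left corner plus its width and height, and is trained with an L2 loss.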
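
A rough sketch of how the two losses might be computed and combined. The `box_weight` factor that balances the two terms is an assumed hyperparameter, and the box is taken to be `(x, y, w, h)` as described above; the function names are illustrative.

```python
import math


def softmax_cross_entropy(scores, true_class):
    """Softmax loss for one image: -log of the softmax probability of the true class."""
    peak = max(scores)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_sum - scores[true_class]


def l2_box_loss(pred_box, true_box):
    """Sum of squared differences over the four box numbers (x, y, w, h)."""
    return sum((p - t) ** 2 for p, t in zip(pred_box, true_box))


def localization_loss(scores, true_class, pred_box, true_box, box_weight=1.0):
    """Multi-task loss: classification term plus weighted box-regression term."""
    return (softmax_cross_entropy(scores, true_class)
            + box_weight * l2_box_loss(pred_box, true_box))
```

The L2 term is just the squared distance between the predicted and ground-truth box vectors, so a box that is off by 2 pixels in x and y and 4 pixels in height contributes 4 + 4 + 16 = 24.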

How is the L2 loss computed for a bounding box?
Single Bounding Box Regression

Object Detection

Sliding Window

The idea is to apply a CNN to every possible crop of the image and classify each crop as an object or as background.

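The problem is that there are far too many possible positions and sizes; finding every object in an image this way requires an enormous amount of computation.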
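
To get a feel for the scale, the sketch below counts every axis-aligned crop of an image, under the assumption that windows of all integer sizes and positions are considered. The function names are illustrative.

```python
def count_windows_brute_force(height, width):
    """Count all crops by summing the valid positions for every window size."""
    total = 0
    for h in range(1, height + 1):
        for w in range(1, width + 1):
            total += (height - h + 1) * (width - w + 1)
    return total


def count_windows(height, width):
    """Closed form of the same count: H(H+1)/2 * W(W+1)/2."""
    return (height * (height + 1) // 2) * (width * (width + 1) // 2)


print(count_windows(224, 224))  # 635040000
```

Even a 224 x 224 image has hundreds of millions of candidate crops, each of which would need its own CNN forward pass.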

R-CNN

Region Proposals

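How are the Region Proposals found?

Selective Search: it starts from the output of graph-based image segmentation (Felzenszwalb and Huttenlocher, 2004) and uses hierarchical clustering to merge regions step by step, merging the most similar regions first according to color, texture, size, and shape similarity, and finally produces the candidate regions.
For details on the graph-based segmentation step, see: http://cs.brown.edu/people/pfelzens/papers/seg-ijcv.pdf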

CNN

Warp each candidate found by Region Proposals to the same input image size and feed it into the ConvNet. The model is then trained with SVMs (hinge loss) for classification and bounding-box regression (L2 loss) for box correction.
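
The warping step can be illustrated with a simple crop-and-resize. This sketch uses nearest-neighbor sampling as a simplifying assumption (it is not claimed to be the interpolation R-CNN uses), takes the box as `(x, y, w, h)` in pixel coordinates, and uses an illustrative function name.

```python
def warp_region(image, box, out_size):
    """Crop box = (x, y, w, h) and resize it to out_size x out_size
    by nearest-neighbor sampling, ignoring the original aspect ratio."""
    x, y, w, h = box
    return [[image[y + (r * h) // out_size][x + (c * w) // out_size]
             for c in range(out_size)]
            for r in range(out_size)]
```

Because every proposal ends up at the same size, the same ConvNet can process all of them, at the cost of one forward pass per proposal.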

Fast R-CNN

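Instead of taking the ROIs from the raw input image, the whole image first goes through the ConvNet, and the ROIs are taken from the resulting feature maps. The convolutional computation is shared across ROIs, which cuts the forward-pass time and makes testing much faster.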

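ROI pooling: divide the input h x w feature map (the part covered by the ROI) into an H x W grid of sub-windows, each of size roughly h/H x w/W, where H and W are hyperparameters (for example 7 x 7). Max-pool within each sub-window to get a feature map of fixed output size, which is then passed to the fully connected layers.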
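
A minimal single-channel sketch of this pooling step. How sub-window boundaries are rounded when h is not divisible by H is an assumption here (floor for the start, ceiling for the end, so neighboring windows may share a row or column); the function name is illustrative.

```python
def roi_max_pool(fmap, out_h, out_w):
    """Max-pool an h x w ROI feature map down to a fixed out_h x out_w grid."""
    in_h, in_w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(out_h):
        r0 = (i * in_h) // out_h                      # floor of the window start
        r1 = ((i + 1) * in_h + out_h - 1) // out_h    # ceiling of the window end
        row = []
        for j in range(out_w):
            c0 = (j * in_w) // out_w
            c1 = ((j + 1) * in_w + out_w - 1) // out_w
            row.append(max(fmap[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled
```

Whatever the size of the ROI, the output always has `out_h * out_w` values, which is what lets ROIs of different shapes share the same fully connected layers.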

In the end the bottleneck becomes computing the Region Proposals, which takes about 2 seconds, while classification and box regression take only 0.32 seconds.

Faster R-CNN

The key is to replace selective search with a proposal step that is part of the CNN itself, the Region Proposal Network (RPN). Everything can then be trained and computed by the CNN, so testing becomes fast: about 0.2 seconds in total.

YOLO / SSD

Instance Segmentation

Mask R-CNN

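Questions

Group	Questions
Group 1	1. Which common model architectures use Transpose Convolution? 2. How are the Region Proposals found? 3. How is the L2 loss computed for a bounding box? (35:06)
Group 2	Presenting group
Group 3	1. Why do we do downsampling? (16:55) 2. Still not clear why semantic segmentation uses max unpooling (21:09)
Group 4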
