
Lecture 11: Detection and Segmentation

Computer Vision Tasks

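  • Semantic Segmentation: label every pixel with a category, so it is a per-pixel classification problem. It does not distinguish instances: different instances of the same category all get the same color.
  • Single-object localization (Classification + Localization): recognize the single object in the image and mark its bounding box. The task can be seen as classification plus localization.
  • Object Detection: recognize every object in the image and mark a bounding box for each one.
  • Instance Segmentation: recognize every object in the image and mark its exact boundary.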

Semantic Segmentation

Sliding Window

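  • For each pixel, crop a sub-image (patch) centered on that pixel from the original image. Classify the patch, and use the patch's class as the pixel's label.
  • Problem: this is very expensive and full of redundant computation. Neighboring patches overlap heavily, yet the feature maps for the overlapping parts are recomputed for every patch.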
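The per-pixel procedure above can be sketched in pure Python as follows. This sketch rests on three assumptions. Images are nested lists of gray values. Pixels outside the image are zero-padded. `classify_patch` is a hypothetical stand-in for the patch-classification CNN. The sketch makes one classifier call per pixel, and that is where the redundant work comes from.

```python
from typing import Callable, List

Image = List[List[float]]


def extract_patch(img: Image, cy: int, cx: int, half: int) -> Image:
    """Crop a (2*half+1) x (2*half+1) patch centered on (cy, cx); pixels outside the image are 0."""
    h, w = len(img), len(img[0])
    return [
        [img[y][x] if 0 <= y < h and 0 <= x < w else 0.0
         for x in range(cx - half, cx + half + 1)]
        for y in range(cy - half, cy + half + 1)
    ]


def sliding_window_segment(img: Image, half: int,
                           classify_patch: Callable[[Image], int]) -> List[List[int]]:
    """Label each pixel with the class predicted for the patch centered on it.

    `classify_patch` is a hypothetical stand-in for a CNN. It is called H*W times,
    and neighboring patches overlap, so the same image content is processed again and again.
    """
    return [[classify_patch(extract_patch(img, y, x, half))
             for x in range(len(img[0]))]
            for y in range(len(img))]


# Toy usage with half=1: p[1][1] is the patch center; label 1 if it exceeds 0.5.
labels = sliding_window_segment([[0.0, 1.0], [1.0, 0.0]], half=1,
                                classify_patch=lambda p: int(p[1][1] > 0.5))
print(labels)  # [[0, 1], [1, 0]]
```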

Fully Convolutional Network

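  • Stack several convolutional layers and output C feature maps of size H x W as the result (C is the number of classes). The i-th map holds each pixel's score for class i, which a per-pixel softmax turns into a probability.
  • Problems:
  1. Training needs a large amount of manually labeled data (a class label for every pixel of every image), which is very costly.
  2. Every layer produces feature maps as large as the original image. When the network is deep and the input resolution is high, both computation and memory use become very large.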
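The C maps are raw scores; the usual training objective applies a softmax at every pixel and averages the cross-entropy over all pixels. Here is a minimal pure-Python sketch. It assumes the scores are nested lists indexed `[c][y][x]`, a layout chosen only for illustration.

```python
import math
from typing import List

ScoreMaps = List[List[List[float]]]  # scores[c][y][x]


def pixelwise_cross_entropy(scores: ScoreMaps, labels: List[List[int]]) -> float:
    """Softmax cross-entropy averaged over all H*W pixels.

    scores[c][y][x] is the raw score of class c at pixel (y, x);
    labels[y][x] is the ground-truth class of that pixel.
    """
    C, H, W = len(scores), len(scores[0]), len(scores[0][0])
    total = 0.0
    for y in range(H):
        for x in range(W):
            s = [scores[c][y][x] for c in range(C)]
            m = max(s)  # subtract the max before exp() for numerical stability
            log_sum_exp = m + math.log(sum(math.exp(v - m) for v in s))
            total += log_sum_exp - s[labels[y][x]]  # -log softmax(s)[label]
    return total / (H * W)


def predict_labels(scores: ScoreMaps) -> List[List[int]]:
    """Per-pixel argmax over the C score maps."""
    C, H, W = len(scores), len(scores[0]), len(scores[0][0])
    return [[max(range(C), key=lambda c: scores[c][y][x]) for x in range(W)]
            for y in range(H)]


# Two classes on a 1 x 2 image.
scores = [[[2.0, 0.0]],   # class 0 score map
          [[1.0, 3.0]]]   # class 1 score map
print(predict_labels(scores))  # [[0, 1]]
print(pixelwise_cross_entropy(scores, [[0, 1]]))
```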

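Downsampling and Upsampling

Keeping every feature map at full resolution is very expensive. Instead, we first shrink the feature maps step by step (downsampling), then enlarge them step by step back to the input size (upsampling). The middle of the network then works on small feature maps, so its computation stays small even when the network is deep.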
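A back-of-the-envelope check shows why shrinking the middle feature maps helps. It assumes a plain k x k convolution whose cost grows with the number of output pixels. The layer sizes below are made up for illustration. Halving height and width cuts the work, and the activation memory, of every layer at that resolution by 4x.

```python
def conv_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """Multiply-accumulate count of one k x k convolution (stride 1, 'same' padding) on an h x w map."""
    return h * w * c_in * c_out * k * k


# Hypothetical layer: 64 -> 64 channels, 3 x 3 kernel.
full_res = conv_macs(224, 224, 64, 64, 3)  # feature map kept at input resolution
half_res = conv_macs(112, 112, 64, 64, 3)  # after one 2x downsampling
print(full_res // half_res)  # 4
```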

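  • Unpooling: the inverse of pooling. It enlarges a feature map by copying values (nearest neighbor) or by filling in zeros ("bed of nails"). Max unpooling remembers where each maximum came from in the corresponding max-pooling layer, puts each value back at that position, and fills every other position with zero.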
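  • Transpose Convolution: multiply each value of the input feature map by every element of the kernel, and write the resulting scaled copy of the kernel into the output. Each one-step move in the input moves the copy by the stride in the output, and overlapping copies are summed. In other words, each input value acts as a weight on the kernel.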
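The three unpooling variants in the first bullet above can be sketched in pure Python. The sketch assumes a single-channel feature map stored as nested lists, a 2 x 2 window with stride 2, and sides divisible by 2. In a real encoder/decoder, the max positions ("switches") are recorded by a pooling layer in the downsampling half. The matching unpooling layer in the upsampling half then reuses them.

```python
from typing import List, Tuple

Grid = List[List[float]]
Switches = List[List[Tuple[int, int]]]


def nearest_unpool(x: Grid, s: int = 2) -> Grid:
    """Nearest neighbor: copy each value into an s x s block."""
    return [[x[i // s][j // s] for j in range(len(x[0]) * s)]
            for i in range(len(x) * s)]


def bed_of_nails_unpool(x: Grid, s: int = 2) -> Grid:
    """Bed of nails: keep each value in the top-left corner of its s x s block, zeros elsewhere."""
    return [[x[i // s][j // s] if i % s == 0 and j % s == 0 else 0.0
             for j in range(len(x[0]) * s)]
            for i in range(len(x) * s)]


def max_pool_with_switches(x: Grid, s: int = 2) -> Tuple[Grid, Switches]:
    """s x s max pooling with stride s that also records where each max came from."""
    pooled, switches = [], []
    for i in range(0, len(x), s):
        vals, locs = [], []
        for j in range(0, len(x[0]), s):
            window = [(x[a][b], (a, b)) for a in range(i, i + s) for b in range(j, j + s)]
            v, loc = max(window, key=lambda t: t[0])
            vals.append(v)
            locs.append(loc)
        pooled.append(vals)
        switches.append(locs)
    return pooled, switches


def max_unpool(y: Grid, switches: Switches, h: int, w: int) -> Grid:
    """Put each value back at its recorded max position in an h x w map; zeros elsewhere."""
    out = [[0.0] * w for _ in range(h)]
    for r, row in enumerate(y):
        for c, v in enumerate(row):
            a, b = switches[r][c]
            out[a][b] = v
    return out


x = [[1.0, 2.0, 6.0, 3.0],
     [3.0, 5.0, 2.0, 1.0],
     [1.0, 2.0, 2.0, 1.0],
     [7.0, 3.0, 4.0, 8.0]]
pooled, switches = max_pool_with_switches(x)  # [[5.0, 6.0], [7.0, 8.0]]
restored = max_unpool([[1.0, 2.0], [3.0, 4.0]], switches, 4, 4)
# restored: 1.0 at (1, 1), 2.0 at (0, 2), 3.0 at (3, 0), 4.0 at (3, 3), zeros elsewhere
```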

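Transpose convolution can be sketched as a single-channel, pure-Python function. The sketch assumes no padding and nested-list feature maps. Deep-learning libraries provide this as a built-in layer with more options, such as padding and multiple channels. When the stride is smaller than the kernel, neighboring scaled copies of the kernel overlap, and the overlapping values are summed.

```python
from typing import List

Grid = List[List[float]]


def conv_transpose2d(x: Grid, kernel: Grid, stride: int = 2) -> Grid:
    """Single-channel 2-D transpose convolution without padding.

    Every input value scales a copy of the kernel; the copy for input (i, j) is
    written at output offset (i*stride, j*stride), and overlapping copies are summed.
    Output size per axis: (in - 1) * stride + kernel_size.
    """
    h, w = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * ((w - 1) * stride + kw) for _ in range((h - 1) * stride + kh)]
    for i in range(h):
        for j in range(w):
            for a in range(kh):
                for b in range(kw):
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out


# 1-D case as a 1 x n grid: input [a, b], kernel [x, y, z], stride 2
# gives [a*x, a*y, a*z + b*x, b*y, b*z]; the middle entry is where the two copies overlap.
print(conv_transpose2d([[1.0, 2.0]], [[1.0, 10.0, 100.0]]))  # [[1.0, 10.0, 102.0, 20.0, 200.0]]
```

With a 2 x 2 all-ones kernel and stride 2, the copies no longer overlap, and the result equals nearest-neighbor unpooling. A learned kernel generalizes that fixed rule.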
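Why do downsampling / max unpooling?
Keeping feature maps as large as the original image makes the computation too expensive. Downsampling shrinks the maps. Unpooling then restores the full resolution needed to output a label for every pixel.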

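Are there common model architectures that use Transpose Convolution?

  1. In GANs (generative adversarial networks), the network has to go from its input to a generated image. The extracted feature maps must therefore be restored to the size of the original image, which also requires transpose convolution ("deconvolution").
  2. YOLO
  3. Encoder/decoder networks (e.g. style transfer)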

Single-Object Localization (Classification + Localization)

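We can treat localization as a regression task. For example, feed the image into AlexNet and attach two fully connected layers in parallel at the end: one for classification, one for localization. The classification head outputs a score for each class and is trained with a softmax loss. The localization head outputs four numbers, the x and y coordinates of the bounding box's top-left corner plus its width and height, and is trained with an L2 loss. The two losses are added, usually with a weight, into one multi-task loss.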

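How is the L2 loss computed for a bounding box?
Single Bounding Box Regression: the box is a 4-vector (x, y, w, h). The L2 loss is the squared Euclidean distance between the predicted vector and the ground-truth vector.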
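Here is a minimal sketch of the two losses described above. It assumes "L2 loss" means the sum of squared differences over the four box numbers; some implementations add a factor of 1/2 or average instead. `box_weight` is a hypothetical name for the hyperparameter that balances the two terms.

```python
import math
from typing import Sequence

Box = Sequence[float]  # (x, y, w, h): top-left corner, width, height


def softmax_loss(scores: Sequence[float], label: int) -> float:
    """Cross-entropy of softmax(scores) against the true class."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores)) - scores[label]


def box_l2_loss(pred: Box, target: Box) -> float:
    """Squared L2 (Euclidean) distance between the predicted and ground-truth box vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def localization_loss(scores: Sequence[float], label: int,
                      pred_box: Box, gt_box: Box, box_weight: float = 1.0) -> float:
    """Multi-task loss: classification loss plus a weighted box-regression loss."""
    return softmax_loss(scores, label) + box_weight * box_l2_loss(pred_box, gt_box)


print(box_l2_loss((10, 20, 30, 40), (12, 20, 30, 37)))  # 13  (2**2 + 3**2)
```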

Object Detection

Sliding Window

  • Basically, apply a CNN to many different crops of the image, and classify each crop as an object or as background.

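The problem: there are far too many possible positions and sizes. Finding every object in an image this way takes an enormous amount of computation!! GG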
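This puts "too many positions and sizes" into numbers. The count includes only boxes whose edges fall on pixel boundaries, with no sub-pixel positions and no limit on aspect ratio. Even so, a 600 x 800 image (an arbitrary example size) has tens of billions of possible crops, and each one would need its own CNN forward pass.

```python
import math


def count_boxes(h: int, w: int) -> int:
    """Number of axis-aligned boxes with pixel-aligned edges in an h x w image.

    A box is fixed by choosing 2 of the h + 1 horizontal edge lines and
    2 of the w + 1 vertical edge lines, so there are C(h+1, 2) * C(w+1, 2) of them.
    """
    return math.comb(h + 1, 2) * math.comb(w + 1, 2)


print(count_boxes(2, 2))      # 9
print(count_boxes(600, 800))  # 57768120000, about 5.8e10 crops
```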

R-CNN

Region Proposals

How are Region Proposals found?

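  • Selective Search: builds on Graph-Based Image Segmentation, proposed by Felzenszwalb in 2004. Starting from the graph-based segmentation result, it applies a hierarchical grouping algorithm that merges regions step by step. The most similar neighboring regions, judged by color, texture, size and shape, are merged first. The regions produced along the way become the final candidate regions.
  • For details, see: http://cs.brown.edu/people/pfelzens/papers/seg-ijcv.pdf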
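The hierarchical grouping step can be sketched as below. This is a heavily simplified illustration, not the published algorithm. The initial regions are assumed to be given; the graph-based segmentation itself is not implemented. Each region is summarized by its bounding box, its pixel count and a single mean gray level. The similarity is a made-up mix of a color term and a size term. The real method combines color, texture, size and shape/fill measures.

```python
from typing import Dict, List, Set, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def union_box(a: Box, b: Box) -> Box:
    """Smallest box containing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def hierarchical_grouping(regions: Dict[int, dict],
                          neighbors: Set[Tuple[int, int]],
                          image_size: int) -> List[Box]:
    """Greedy hierarchical grouping in the spirit of Selective Search (simplified).

    regions:   id -> {"box": Box, "size": pixel count, "color": mean gray level in [0, 1]},
               assumed to come from a graph-based over-segmentation (not implemented here).
    neighbors: pairs (i, j) with i < j of adjacent region ids.
    Returns the box of every region seen while merging; these are the proposals.
    """
    regions = dict(regions)  # do not mutate the caller's dict

    def similarity(i: int, j: int) -> float:
        # Hypothetical simplified score: color similarity + size similarity
        # (the size term favors merging small regions first).
        ri, rj = regions[i], regions[j]
        color_sim = 1.0 - abs(ri["color"] - rj["color"])
        size_sim = 1.0 - (ri["size"] + rj["size"]) / image_size
        return color_sim + size_sim

    proposals = [r["box"] for r in regions.values()]
    pairs = set(neighbors)
    next_id = max(regions) + 1
    while pairs:
        i, j = max(pairs, key=lambda p: similarity(*p))  # most similar adjacent pair
        ri, rj = regions[i], regions[j]
        size = ri["size"] + rj["size"]
        regions[next_id] = {
            "box": union_box(ri["box"], rj["box"]),
            "size": size,
            "color": (ri["color"] * ri["size"] + rj["color"] * rj["size"]) / size,
        }
        proposals.append(regions[next_id]["box"])
        # Pairs that touched i or j now touch the merged region instead.
        updated = set()
        for a, b in pairs:
            a = next_id if a in (i, j) else a
            b = next_id if b in (i, j) else b
            if a != b:  # drops the merged pair (i, j) itself
                updated.add((min(a, b), max(a, b)))
        pairs = updated
        next_id += 1
    return proposals


regions = {
    0: {"box": (0, 0, 1, 1), "size": 1, "color": 0.10},
    1: {"box": (1, 0, 2, 1), "size": 1, "color": 0.15},
    2: {"box": (2, 0, 3, 1), "size": 1, "color": 0.90},
}
print(hierarchical_grouping(regions, {(0, 1), (1, 2)}, image_size=3))
# [(0, 0, 1, 1), (1, 0, 2, 1), (2, 0, 3, 1), (0, 0, 2, 1), (0, 0, 3, 1)]
```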
CNN
  • Warp each candidate found by Region Proposals to the same input image size, then feed it into the ConvNet. Finally, train the model with SVMs (hinge loss) for classification and bounding-box regressors (L2 loss) for localization.
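In R-CNN, the box regressor does not predict raw coordinates. It predicts the offsets of the ground-truth box relative to the proposal, and the L2 loss is taken on those offsets. The minimal sketch below assumes boxes given as (center x, center y, width, height). It also assumes the offset parameterization from the R-CNN paper: center shifts scaled by the proposal size, and width and height scaled in log space.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (center x, center y, width, height)


def bbox_targets(p: Box, g: Box) -> Box:
    """Offsets that turn proposal p into ground-truth box g."""
    px, py, pw, ph = p
    gx, gy, gw, gh = g
    return ((gx - px) / pw,     # center shift, relative to proposal width
            (gy - py) / ph,     # center shift, relative to proposal height
            math.log(gw / pw),  # log-space width scaling
            math.log(gh / ph))  # log-space height scaling


def apply_offsets(p: Box, t: Box) -> Box:
    """Inverse of bbox_targets: refine proposal p with (predicted) offsets t."""
    px, py, pw, ph = p
    tx, ty, tw, th = t
    return (px + pw * tx, py + ph * ty, pw * math.exp(tw), ph * math.exp(th))


def l2_loss(pred: Box, target: Box) -> float:
    """Squared L2 distance between predicted and target offsets."""
    return sum((a - b) ** 2 for a, b in zip(pred, target))


proposal = (50.0, 50.0, 20.0, 40.0)
gt = (60.0, 40.0, 40.0, 20.0)
t = bbox_targets(proposal, gt)      # (0.5, -0.25, log 2, log 0.5)
print(apply_offsets(proposal, t))   # approximately (60.0, 40.0, 40.0, 20.0)
```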

Fast R-CNN

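  • Instead of cropping ROIs from the original input image, run the image through the ConvNet first and crop the ROIs from the resulting feature maps. The convolutional computation is shared by all proposals, which cuts forward-pass time and speeds up testing considerably.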

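  • ROI pooling: the h x w region of the feature map covered by an ROI is split into an H x W grid of sub-windows. Each sub-window is roughly h/H x w/W in size, and H and W are hyperparameters, e.g. 7 x 7. Max pooling is applied inside each sub-window, giving a fixed-size output feature map whatever the ROI's size. That output then goes through the following fully connected layers.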
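The sketch below shows single-channel ROI pooling. It also shows the step Fast R-CNN needs first: projecting a proposal from image coordinates onto the feature map. The sketch makes these assumptions:
  • the ConvNet's total stride is known (16 in the example is a hypothetical value);
  • ROIs are end-exclusive integer boxes;
  • rounding uses floor for starts and ceil for ends, which is only one simple option, and real implementations round differently.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]
Roi = Tuple[int, int, int, int]  # (y1, x1, y2, x2), end-exclusive


def project_roi(box: Roi, stride: int) -> Roi:
    """Map an image-space box onto the feature map by dividing by the network's total stride.
    Floor for the start and ceil for the end is one simple rounding choice (an assumption)."""
    y1, x1, y2, x2 = box
    return (y1 // stride, x1 // stride, math.ceil(y2 / stride), math.ceil(x2 / stride))


def roi_pool(fmap: Grid, roi: Roi, out_h: int = 7, out_w: int = 7) -> Grid:
    """Max-pool the ROI (feature-map coords) of a single-channel map into a fixed out_h x out_w grid.

    The h x w ROI is split into out_h x out_w sub-windows of about h/out_h x w/out_w cells;
    window i spans rows [floor(i*h/out_h), ceil((i+1)*h/out_h)), so no window is empty.
    """
    y1, x1, y2, x2 = roi
    h, w = y2 - y1, x2 - x1
    out = []
    for i in range(out_h):
        ys, ye = y1 + (i * h) // out_h, y1 + math.ceil((i + 1) * h / out_h)
        row = []
        for j in range(out_w):
            xs, xe = x1 + (j * w) // out_w, x1 + math.ceil((j + 1) * w / out_w)
            row.append(max(fmap[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        out.append(row)
    return out


fmap = [[float(4 * y + x) for x in range(4)] for y in range(4)]  # 4 x 4 map with values 0..15
print(roi_pool(fmap, (0, 0, 4, 4), 2, 2))  # [[5.0, 7.0], [13.0, 15.0]]
print(roi_pool(fmap, (1, 1, 4, 4), 2, 2))  # [[10.0, 11.0], [14.0, 15.0]]
print(project_roi((32, 48, 100, 130), stride=16))  # (2, 3, 7, 9)
```

Different-sized ROIs all come out as the same out_h x out_w grid. That is what lets one set of fully connected layers handle every proposal.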

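In the end, the bottleneck becomes computing the Region Proposals, at about 2 seconds per image. Computing the classes and box positions takes only 0.32 seconds.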

Faster R-CNN

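  • The key idea is to replace selective search with a region-proposal step that is part of the CNN model itself: the Region Proposal Network (RPN). Everything can be trained and computed inside the CNN, so testing becomes very fast, only about 0.2 seconds in total.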
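An RPN "proposes regions" by sliding over the backbone's feature map and scoring a fixed set of reference boxes, called anchors, at every position. Here is a sketch of anchor generation. It assumes settings commonly associated with the original Faster R-CNN paper: 3 scales x 3 aspect ratios = 9 anchors per position, and stride 16. These numbers are assumptions; this note does not specify them.

```python
import math
from typing import List, Tuple

Anchor = Tuple[float, float, float, float]  # (center x, center y, width, height) in image pixels


def make_anchors(feat_h: int, feat_w: int, stride: int = 16,
                 scales: Tuple[float, ...] = (128.0, 256.0, 512.0),
                 ratios: Tuple[float, ...] = (0.5, 1.0, 2.0)) -> List[Anchor]:
    """Place len(scales) * len(ratios) anchor boxes at every feature-map cell.

    Each anchor has area scale**2 and height/width ratio r. For every anchor the RPN
    predicts an objectness score and four box offsets; the highest-scoring refined
    anchors (after non-maximum suppression) are passed on as region proposals.
    """
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride  # cell center in the image
            for s in scales:
                for r in ratios:
                    anchors.append((cx, cy, s / math.sqrt(r), s * math.sqrt(r)))
    return anchors


anchors = make_anchors(feat_h=2, feat_w=3)
print(len(anchors))  # 54 = 2 * 3 cells * 9 anchors per cell
```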

YOLO / SSD

Instance Segmentation

Mask R-CNN

Questions

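Group | Questions
Group 1 | 1. Are there common model architectures that use Transpose Convolution? 2. How are Region Proposals found? 3. How is the L2 loss computed for a bounding box? (35:06)
Group 2 | Presenting group
Group 3 | 1. Why do downsampling? (16:55) 2. I still don't quite understand why semantic segmentation uses max unpooling. (21:09)
Group 4 |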