# FCOS: Fully Convolutional One-Stage Object Detection

###### tags: `paper notes` `deep learning`

[Paper Link](https://arxiv.org/abs/1904.01355)

---

## Problems of anchor-based object detection

Anchor-based object detection has the following problems:

1. Detection performance is highly sensitive to the sizes, aspect ratios, and number of anchors.
2. Because anchor sizes and aspect ratios are fixed, the detector has trouble with objects that show large shape variation, especially small objects.
3. To reach a high recall rate, anchor-based methods place a huge number of anchors over every region of the image.
    - How many? More than 180K anchor boxes in feature pyramid networks (FPN) for an image with its shorter side being 800.
    - A high recall rate here means that almost every ground-truth object gets covered by at least one anchor box classified as positive.
4. Computing IoU is expensive and complicated.
    - For example, CIoU in YOLOv4 computes not only the overlap area but also the center-point distance and the aspect ratio.

## How positive and negative samples are defined

The main difference between anchor-based and anchor-free methods lies in how positive and negative samples are defined. Object detection usually involves three kinds of samples:

- Positive sample: an object of some class is present here; it trains the object classifier and, at the same time, the bounding-box offset regression.
- Negative sample: nothing is present here (it belongs to the background); it trains the classifier, commonly via an explicit background class or by driving all object-class outputs to zero.
- Ignore sample: a sample that does not participate in training.

FCOS defines them as follows:

- A location (x, y) is classified as a positive sample as long as it falls inside any ground-truth box, and its class label c* is the ground-truth label of that box.
- Otherwise, (x, y) is a negative sample with c* = 0 (the background class).
- If (x, y) falls inside multiple bounding boxes, it is regarded as an ambiguous sample.
    - The ambiguous-sample problem is resolved later with multi-level prediction.

In practice, FCOS regresses the 4D vector (l, t, r, b) shown below, the four distances extending from the location (x, y) to the sides of the box.

- real 4D vector = (l*, t*, r*, b*)

![](https://i.imgur.com/9uIz9dq.png)

For example, if (x, y) falls inside bbox $B_i$, the training regression targets for the location (x, y) are

$$l^* = x - x_0^{(i)}, \quad t^* = y - y_0^{(i)}, \quad r^* = x_1^{(i)} - x, \quad b^* = y_1^{(i)} - y$$

![](https://i.imgur.com/BSDIlak.png)

- The subtractions take this form because the ground truth is given as the left-top and right-bottom corners.
- The authors argue that anchor-free detection can exploit as many foreground samples as possible to train the regressor, and that this is one of the reasons anchor-free methods perform well.
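To make the target definition concrete, here is a minimal PyTorch sketch (my own illustration, not the paper's code; the helper name `regression_targets` and the single-box interface are assumptions):

```python=
import torch

def regression_targets(locations, box):
    """Compute FCOS regression targets (l*, t*, r*, b*) for every location.

    locations: [N, 2] tensor of (x, y) points on the input image
    box:       ground-truth corners (x0, y0, x1, y1)
    Returns [N, 4] targets and a boolean mask of the positive locations.
    """
    x, y = locations[:, 0], locations[:, 1]
    x0, y0, x1, y1 = box
    l = x - x0  # distance to the left edge
    t = y - y0  # distance to the top edge
    r = x1 - x  # distance to the right edge
    b = y1 - y  # distance to the bottom edge
    targets = torch.stack([l, t, r, b], dim=1)
    # A location is a positive sample iff it falls inside the box,
    # i.e. all four distances are positive.
    inside = targets.min(dim=1).values > 0
    return targets, inside

locs = torch.tensor([[100., 80.], [4., 4.]])
targets, pos = regression_targets(locs, (50., 40., 200., 160.))
# targets[0] == (50, 40, 100, 80); pos == (True, False)
```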
## Model

### Architecture

FCOS builds on an FCN, adding FPN and focal loss.

- backbone CNN + FPN + C binary classifiers in place of a single multi-class classifier
- every location outputs a 4D (l, t, r, b) regression vector and an 80-D vector $p$ of classification labels (MS COCO has 80 classes)
- the head is a shared head, i.e. the same head weights produce the three per-pixel predictions at every feature level

```python=
# Schematic only: ctr_head / cls_head / reg_head stand for the separate
# convolutional towers, shared across all FPN levels.

# Center-ness head
P3_ctrness = sigmoid(ctr_head(P3))  # [B, H/8,   W/8,   1]
P4_ctrness = sigmoid(ctr_head(P4))  # [B, H/16,  W/16,  1]
P5_ctrness = sigmoid(ctr_head(P5))  # [B, H/32,  W/32,  1]
P6_ctrness = sigmoid(ctr_head(P6))  # [B, H/64,  W/64,  1]
P7_ctrness = sigmoid(ctr_head(P7))  # [B, H/128, W/128, 1]

# Classification head (scores are down-weighted by center-ness)
P3_class_prob = sigmoid(cls_head(P3)) * P3_ctrness  # [B, H/8,   W/8,   C]
P4_class_prob = sigmoid(cls_head(P4)) * P4_ctrness  # [B, H/16,  W/16,  C]
P5_class_prob = sigmoid(cls_head(P5)) * P5_ctrness  # [B, H/32,  W/32,  C]
P6_class_prob = sigmoid(cls_head(P6)) * P6_ctrness  # [B, H/64,  W/64,  C]
P7_class_prob = sigmoid(cls_head(P7)) * P7_ctrness  # [B, H/128, W/128, C]

# Regression head
P3_reg = reg_head(P3)  # [B, H/8,   W/8,   4]
P4_reg = reg_head(P4)  # [B, H/16,  W/16,  4]
P5_reg = reg_head(P5)  # [B, H/32,  W/32,  4]
P6_reg = reg_head(P6)  # [B, H/64,  W/64,  4]
P7_reg = reg_head(P7)  # [B, H/128, W/128, 4]
```

![](https://i.imgur.com/qIXMh5C.png)

Comparison with RetinaNet:

- The backbone and FPN of RetinaNet and FCOS are almost identical.

![](https://i.imgur.com/nFuHKTB.png)

FPN

- A simple top-down pathway with skip connections solves the problem that earlier detection networks could not efficiently recognize objects across scales.
- (a) Scale the image to several sizes and build a feature map from each; this gives good results, but the compute and memory costs are large.
- (b) Plain CNN + pooling, predicting from the last feature map only; most CNN models belong to this category. It uses little memory but only looks at the features of the last layer. E.g. Fast R-CNN & Faster R-CNN.
- (c) Predict on feature maps of different scales and fuse the predictions; speed and accuracy are decent, but features are not reused, so small-object detection suffers. E.g. SSD.
- (d) FPN: fuse deep and shallow features for prediction; shallow features help localize objects, deep features help recognize them.

![](https://i.imgur.com/238fb7x.png)

![](https://i.imgur.com/uCsOMGp.png)

FPN levels are merged by addition; the 1x1 convs are there to make the channel counts match.

Comparison with other anchor-free methods:

1. CornerNet has to pair top-left and bottom-right corners, which requires an extra distance metric and results in complicated post-processing.
2. DenseBox has trouble handling overlapping bboxes, and its recall is relatively low (FCOS solves both with its multi-level FCN).
3. FCOS is proposal-free, requires no anchor hyper-parameter design, and can reuse past FCN designs.

### Anchor-based vs Anchor-free

- Anchor-based detectors treat locations on the input image as anchor-box centers and regress the bounding box with those anchors as references.
- FCOS instead regresses the box directly at each location, with the location itself as the target.

Notation

- feature map of the $i$-th layer: $F_i$
- ground-truth bounding boxes: $B_i = (x_0^{(i)}, y_0^{(i)}, x_1^{(i)}, y_1^{(i)}, c^{(i)})$
    - $x_0$ and $y_0$ give the left-top corner of the box; $x_1$ and $y_1$ give the right-bottom corner
    - $c$ is the class label

Every (x, y) on a backbone feature map $F$ can be mapped back onto the input image as $\lfloor\frac{s}{2}\rfloor + xs$ and $\lfloor\frac{s}{2}\rfloor + ys$, and the mapped coordinate lands near the center of the receptive field of (x, y).

- $s$ = total stride up to that layer; FCOS uses five FPN levels with strides [8, 16, 32, 64, 128]

```python=
import torch

# Compute the (x, y) locations of one feature level on the input image:
# location = stride * index + stride // 2 (near the receptive-field center)
def compute_locations(h, w, stride, device):
    shifts_x = torch.arange(
        0, w * stride, step=stride,
        dtype=torch.float32, device=device
    )
    shifts_y = torch.arange(
        0, h * stride, step=stride,
        dtype=torch.float32, device=device
    )
    shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
    shift_x = shift_x.reshape(-1)
    shift_y = shift_y.reshape(-1)
    locations = torch.stack((shift_x, shift_y), dim=1) + stride // 2
    return locations
```

![](https://i.imgur.com/HBGjZm7.png)

![](https://i.imgur.com/XxqVvRo.png)

### Loss function

![](https://i.imgur.com/zr5e5H1.png)

Focal loss: down-weight easy examples so that training concentrates on hard examples.

![](https://i.imgur.com/HddS9vm.png =300x100)

![](https://i.imgur.com/cp9Jlj2.png)

- $\alpha$ is the balancing factor, a value that directly lowers the weight of negative samples.
    - The authors added it only after experiments showed it performs a bit better with it.
- $\gamma \geq 0$ is the focusing parameter, which controls the relative weighting of hard and easy examples.
- With $\alpha = 1$ and $\gamma = 0$, focal loss reduces to cross entropy.

UnitBox IoU loss: a cross entropy with the IoU as input, computed only for pixels inside the ground-truth box (a code sketch of both loss terms follows the inference notes below).

- Compared with an l2 loss, it couples the four bounding-box parameters and evaluates them jointly through the IoU.
- The negative sign turns the IoU, where larger is better, into a loss, where smaller is better; ln is just a simple mapping and could be replaced by another monotone function.

![](https://i.imgur.com/64LDiAw.png)

Inference

- Take the locations with p > 0.5 as positive samples, and invert $l^*, t^*, r^*, b^*$ above to recover the box coordinates.
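The loss terms above are only described verbally, so here is a minimal PyTorch sketch of both (my own simplification, not the official implementation; the tensor shapes and the plain sum reduction are assumptions, and the paper normalizes by the number of positive samples):

```python=
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Sigmoid focal loss; targets are 0./1. per class.
    With alpha = 1 and gamma = 0 this is plain cross entropy."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)  # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).sum()

def iou_loss(pred, target, eps=1e-7):
    """UnitBox IoU loss, -ln(IoU), over positive locations only.
    pred, target: [N, 4] distances (l, t, r, b) from the same locations."""
    pl, pt, pr, pb = pred.unbind(dim=1)
    tl, tt, tr, tb = target.unbind(dim=1)
    pred_area = (pl + pr) * (pt + pb)
    target_area = (tl + tr) * (tt + tb)
    # Both boxes are expressed relative to the same location, so the
    # intersection can be computed axis-wise from the four distances.
    w_inter = torch.min(pl, tl) + torch.min(pr, tr)
    h_inter = torch.min(pt, tt) + torch.min(pb, tb)
    inter = w_inter.clamp(min=0) * h_inter.clamp(min=0)
    union = pred_area + target_area - inter
    iou = inter / union.clamp(min=eps)
    return -torch.log(iou.clamp(min=eps)).sum()
```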
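The inference-time decoding is just the inverse of the target computation; a small sketch (the function name `decode_boxes` is mine, with `locations` as produced by `compute_locations` above):

```python=
import torch

def decode_boxes(locations, reg):
    """Invert predicted (l, t, r, b) distances into (x0, y0, x1, y1) boxes.
    locations: [N, 2] (x, y) points; reg: [N, 4] predicted distances."""
    x, y = locations[:, 0], locations[:, 1]
    l, t, r, b = reg.unbind(dim=1)
    return torch.stack([x - l, y - t, x + r, y + b], dim=1)
```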
## Tricks for improvement

### Multi-level Prediction with FPN for FCOS

Motivation

- FCOS could suffer from 1) a very low best possible recall (BPR) caused by the large stride of the final feature map in a CNN, and 2) ambiguity during training about which ground truth a location belongs to, caused by overlapping ground-truth bounding boxes.

How it differs from anchor-based detectors in prediction

- Anchor-based detectors assign anchor boxes of different sizes to different feature levels.
- FCOS instead directly limits the bounding-box regression range of each feature level.

Procedure

- Compute l*, t*, r*, b* for every location on every feature level.
- If a location satisfies max(l*, t*, r*, b*) > $m_i$ or max(l*, t*, r*, b*) < $m_{i-1}$, it is set as a negative sample and no longer needs bounding-box regression.
    - $m_i$ = the maximum distance that feature level $i$ needs to regress
    - In FCOS, m2, m3, m4, m5, m6 and m7 are set to 0, 64, 128, 256, 512 and $\infty$.
- If, after the computation above, a location is still covered by multiple ground-truth bounding boxes, the bbox with the smallest area is chosen as its ground truth.
- Finally, following SSD and focal loss (RetinaNet), the heads are shared across feature levels, which makes FCOS more parameter-efficient and also improves performance.
    - They observed that different feature levels should not regress the same size range, which would make little sense.
    - So the size range [0, 64] is assigned to P3 and [64, 128] to P4, restricting each feature level to objects within a fixed size range.
    - To keep the heads shared, they replace $\exp(x)$ with $\exp(s_ix)$, where $s_i$ is a trainable scalar.

### Center-ness for FCOS

Problem

- Even after multi-level prediction, there was still a gap between FCOS and anchor-box detectors.
- They found the cause: FCOS produces many low-quality bounding boxes at locations far from the center of the target object.

Solution

- They add a branch, parallel to the classification branch, that predicts the center-ness of each location.
- Center-ness is the normalized distance between the center of an object and the location (with targets l*, t*, r*, b*) responsible for it:

$$\text{centerness}^* = \sqrt{\frac{\min(l^*, r^*)}{\max(l^*, r^*)} \times \frac{\min(t^*, b^*)}{\max(t^*, b^*)}}$$

- The square root is there to slow down the decay of center-ness.
- Center-ness ranges over (0, 1), so it is trained with binary cross entropy, and this loss is added to the loss function above.
- At test time, final score = classification score × center-ness; center-ness therefore pushes down the scores of the low-quality bounding boxes so that they get filtered out in the final NMS stage.

![](https://i.imgur.com/dAPRS7I.png)

![](https://i.imgur.com/inHvzra.png)

- The paper also mentions that one could instead use only the central region of the ground-truth bounding box as ground truth, but that introduces an extra hyper-parameter, so they chose not to.

Effect of center-ness

![](https://i.imgur.com/juQyqAb.png)

## Result

FCOS was the state of the art at the time.

![](https://i.imgur.com/kVWyjMn.png)

![](https://i.imgur.com/BJcBAVe.jpg)

Ablation study

![](https://i.imgur.com/vcoNnRl.png)

## References

Articles

- [通往Anchor-Free的真相:Object Detection的正負樣本定義](https://medium.com/%E8%BB%9F%E9%AB%94%E4%B9%8B%E5%BF%83/%E9%80%9A%E5%BE%80anchor-free%E7%9A%84%E7%9C%9F%E7%9B%B8-object-detection%E7%9A%84%E6%AD%A3%E8%B2%A0%E6%A8%A3%E6%9C%AC%E5%AE%9A%E7%BE%A9-83f2fe36167f)
- [Object Detection 錨點之爭: Anchor Free大爆發的2019年](https://medium.com/%E8%BB%9F%E9%AB%94%E4%B9%8B%E5%BF%83/cv-object-detection-1-anchor-free%E5%A4%A7%E7%88%86%E7%99%BC%E7%9A%842019%E5%B9%B4-e3b4271cdf1a)
- [FCOS Walkthrough: The Fully Convolutional Approach to Object Detection](https://medium.com/swlh/fcos-walkthrough-the-fully-convolutional-approach-to-object-detection-777f614268c)

Papers

- [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002) (RetinaNet, focal loss)
- [Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection](https://arxiv.org/abs/1912.02424) (ATSS)
- [Feature Pyramid Networks for Object Detection](https://arxiv.org/abs/1612.03144) (FPN)
