# FCOS: Fully Convolutional One-Stage Object Detection

###### tags: `paper notes` `deep learning`

[Paper Link](https://arxiv.org/abs/1904.01355)

---

## Problems of anchor-based object detection

Anchor-based object detection has the following problems:

1. Detection performance is highly sensitive to the size, aspect ratio, and number of anchors.
2. Because anchor sizes and ratios are fixed, the detector has trouble handling objects with large shape variation, especially small objects.
3. To reach a high recall rate, anchor-based methods place a huge number of anchors over every region of the image.
    - How many? More than 180K anchor boxes in feature pyramid networks (FPN) for an image with its shorter side being 800.
    - High recall here means that a high proportion of ground-truth objects end up covered by at least one positive anchor.
4. Computing IoU between anchors and ground-truth boxes is time-consuming and complex.
    - For instance, CIoU in YOLOv4 needs not only the overlap area but also the center-point distance and the aspect ratio.

## How positive and negative samples are defined

The key difference between anchor-based and anchor-free detectors is how positive and negative samples are defined. Object detection generally involves three kinds of samples:

- Positive sample: a sample that says an object of some class is here; it trains the classifier and the bounding-box offset regression.
- Negative sample: a sample containing no object (background); it trains the classifier, commonly via a background class or by driving all class outputs to zero.
- Ignore sample: a sample that does not participate in training.

FCOS defines them as follows:

- A location (x, y) is a positive sample as long as it falls inside any ground-truth box, and its class label is that box's ground-truth class.
- Otherwise, (x, y) is a negative sample with c* = 0 (the background class).
- If (x, y) falls inside multiple bounding boxes, it is treated as an ambiguous sample.
    - The ambiguity problem is resolved later with multi-level prediction.

In practice, FCOS regresses the 4D vector (l, t, r, b) in the figure below, the four distances extending from the location (x, y) to the sides of the box:

- real 4D vector = (l*, t*, r*, b*)

![](https://i.imgur.com/9uIz9dq.png)

For example, if (x, y) falls inside bbox $B_i$, the training regression targets for the location (x, y) are

![](https://i.imgur.com/BSDIlak.png)

- The authors argue that anchor-free detectors can better exploit as many foreground pixels as possible to train the regressor, which is one reason they perform well.
- The targets are differences because the ground truth is stored as left-top and right-bottom corners. (A code sketch of this target computation follows the Architecture subsection below.)

## Model

### Architecture

FCOS builds on FCN and adds FPN and focal loss:

- backbone CNN + FPN + C binary classifiers in place of a single multi-class classifier
- each location's output is a 4D regression vector (l, t, r, b) plus an 80-D classification vector $p$ (MS COCO has 80 classes)
- the head is shared, i.e., the three per-pixel predictions are produced by the same head applied to every feature level

```python=
# Pseudocode of the shared heads over FPN levels P3-P7
# Centerness head
P3_ctrness = sigmoid(ctr_head(P3))    # [B, H/8,   W/8,   1]
P4_ctrness = sigmoid(ctr_head(P4))    # [B, H/16,  W/16,  1]
P5_ctrness = sigmoid(ctr_head(P5))    # [B, H/32,  W/32,  1]
P6_ctrness = sigmoid(ctr_head(P6))    # [B, H/64,  W/64,  1]
P7_ctrness = sigmoid(ctr_head(P7))    # [B, H/128, W/128, 1]

# Classification head (centerness rescales the class score)
P3_class_prob = sigmoid(cls_head(P3)) * P3_ctrness   # [B, H/8,   W/8,   C]
P4_class_prob = sigmoid(cls_head(P4)) * P4_ctrness   # [B, H/16,  W/16,  C]
P5_class_prob = sigmoid(cls_head(P5)) * P5_ctrness   # [B, H/32,  W/32,  C]
P6_class_prob = sigmoid(cls_head(P6)) * P6_ctrness   # [B, H/64,  W/64,  C]
P7_class_prob = sigmoid(cls_head(P7)) * P7_ctrness   # [B, H/128, W/128, C]

# Regression head
P3_reg = reg_head(P3)   # [B, H/8,   W/8,   4]
P4_reg = reg_head(P4)   # [B, H/16,  W/16,  4]
P5_reg = reg_head(P5)   # [B, H/32,  W/32,  4]
P6_reg = reg_head(P6)   # [B, H/64,  W/64,  4]
P7_reg = reg_head(P7)   # [B, H/128, W/128, 4]
```

![](https://i.imgur.com/qIXMh5C.png)

Comparison with RetinaNet

- RetinaNet and FCOS have almost identical backbones and FPNs

![](https://i.imgur.com/nFuHKTB.png)

FPN

- a simple top-down pathway with skip connections that fixes earlier detection networks' inefficiency at recognizing objects across scales
- (a): rescale the image to several sizes and compute feature maps from each; results are decent but compute and memory costs are high
- (b): plain CNN + pooling, predicting only from the last feature map; most CNN models fall into this category; memory-light, but only the last layer's features are used, e.g., Fast R-CNN & Faster R-CNN
- (c): predict on feature maps of different scales and fuse the predictions; speed and results are fine, but features are not reused, so small-object detection suffers, e.g., SSD
- (d): FPN, which fuses deep and shallow features for prediction; shallow features help localize objects, deep features help recognize them

![](https://i.imgur.com/238fb7x.png)
![](https://i.imgur.com/uCsOMGp.png)

FPN levels are merged by addition; the 1x1 conv is there to match the channel counts.

Comparison with other anchor-free methods:

1. CornerNet must pair top-left with bottom-right corners, which requires an extra distance metric and leads to complex post-processing.
2. DenseBox struggles with overlapping bboxes and has relatively low recall (FCOS resolves both with multi-level FCN prediction).
3. FCOS is proposal-free, needs no anchor hyper-parameter design, and can reuse the established FCN machinery.
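To make the (l*, t*, r*, b*) target definition above concrete, here is a minimal sketch of the target computation. This is my own illustration (the function name and tensor layout are assumptions), not the authors' implementation:

```python=
import torch

def regression_targets(locations, boxes):
    """Compute FCOS (l*, t*, r*, b*) targets for each location/box pair.

    locations: [N, 2] (x, y) points on the input image
    boxes:     [M, 4] ground-truth boxes (x0, y0, x1, y1)
    returns:   [N, M, 4] distances and an [N, M] bool mask marking
               locations that fall inside each box (positive samples)
    """
    xs, ys = locations[:, 0, None], locations[:, 1, None]  # [N, 1]
    l = xs - boxes[None, :, 0]   # distance to the left edge
    t = ys - boxes[None, :, 1]   # distance to the top edge
    r = boxes[None, :, 2] - xs   # distance to the right edge
    b = boxes[None, :, 3] - ys   # distance to the bottom edge
    targets = torch.stack([l, t, r, b], dim=2)   # [N, M, 4]
    inside = targets.min(dim=2).values > 0       # inside iff all four > 0
    return targets, inside
```

When a location is inside several boxes, the paper assigns it to the box with the smallest area (see the multi-level prediction section below).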
### Anchor-based vs Anchor-free

- Anchor-based detectors treat locations on the input image as anchor-box centers and regress bounding boxes relative to those anchors
- FCOS instead regresses the box directly at each location

Notation

- feature map of the $i_{th}$ layer = $F_i$
- ground-truth bounding boxes $B_i = (x_0^{(i)}, y_0^{(i)}, x_1^{(i)}, y_1^{(i)}, c^{(i)})$
    - $x_0$ and $y_0$ are the left-top corner of the box, $x_1$ and $y_1$ the right-bottom corner
    - $c$ denotes the class

Every (x, y) on a backbone feature map $F$ can be mapped back onto the input image via $\lfloor\frac{s}{2}\rfloor + xs$ and $\lfloor\frac{s}{2}\rfloor + ys$, and the mapped coordinate lands near the center of the receptive field of (x, y).

- $s$ = total stride up to that layer; FCOS uses five FPN levels with strides [8, 16, 32, 64, 128]

```python=
import torch

# Compute the (x, y) locations on the input image that a feature map
# of size h x w with the given stride maps back to
def compute_locations(h, w, stride, device):
    shifts_x = torch.arange(
        0, w * stride, step=stride,
        dtype=torch.float32, device=device
    )
    shifts_y = torch.arange(
        0, h * stride, step=stride,
        dtype=torch.float32, device=device
    )
    shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
    shift_x = shift_x.reshape(-1)
    shift_y = shift_y.reshape(-1)
    locations = torch.stack((shift_x, shift_y), dim=1) + stride // 2
    return locations
```

![](https://i.imgur.com/HBGjZm7.png)
![](https://i.imgur.com/XxqVvRo.png)

### Loss function

![](https://i.imgur.com/zr5e5H1.png)

Focal loss: down-weights easy examples so that training focuses on hard examples.

![](https://i.imgur.com/HddS9vm.png =300x100)
![](https://i.imgur.com/cp9Jlj2.png)

- $\alpha$ is the balance factor, a value that directly down-weights negative samples
    - the authors added it simply because experiments showed it performs slightly better
- $\gamma$ is the focusing parameter (>= 0) that controls the relative weighting of hard vs easy examples
- with $\alpha = 1$ and $\gamma = 0$, focal loss reduces to cross entropy

UnitBox IoU loss: cross entropy computed with the IoU as input, evaluated only on pixels inside the ground-truth box

- compared with an l2 loss, it couples the four bounding-box parameters into one joint IoU computation
- the negative sign turns IoU (larger is better) into a loss (smaller is better); ln is just a simple mapping and could be swapped for something else

![](https://i.imgur.com/64LDiAw.png)

Inference

- locations with p > 0.05 are taken as positive samples, and the predicted $l^*, t^*, r^*, b^*$ are inverted to recover the box coordinates
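For reference, here is a minimal sketch of the focal loss term in the total loss above. It follows the standard RetinaNet formulation ($\alpha = 0.25$, $\gamma = 2$ are the RetinaNet defaults); the helper name is my own:

```python=
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), summed over entries.

    logits:  [N, C] raw classification scores
    targets: [N, C] binary labels (1 for the ground-truth class, else 0)
    """
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)             # prob of true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).sum()
```

In the paper, this sum is normalized by the number of positive locations $N_{pos}$, and the IoU regression term is evaluated only at positive locations.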
## Tricks for improvement

### Multi-level Prediction with FPN for FCOS

Motivation

- FCOS could suffer from 1) a low best possible recall (BPR) caused by the large stride of the final feature map in a CNN, and 2) ambiguity during training about which ground-truth bounding box a location belongs to when boxes overlap

How prediction differs from anchor-based detectors

- anchor-based detectors assign anchor boxes of different sizes to different feature levels
- FCOS directly limits the range of bounding-box regression at each feature level

Procedure

- compute l*, t*, r*, b* for every location on every feature level
- if a location satisfies max(l*, t*, r*, b*) > $m_i$ or max(l*, t*, r*, b*) < $m_{i-1}$, it is set as a negative sample and no longer needs bounding-box regression
    - $m_i$ = maximum distance that feature level $i$ needs to regress
    - in FCOS, m2, m3, m4, m5, m6 and m7 are set to 0, 64, 128, 256, 512, $\infty$
- if a location is still covered by multiple ground-truth bounding boxes after this filtering, the box with the smallest area is chosen as its ground truth
- finally, following SSD and focal loss (RetinaNet), the heads are shared across feature levels, which makes FCOS more parameter-efficient and also improves performance
    - they observed that different feature levels should not regress the same size range, since that would be unreasonable
    - so they assign the size range [0, 64] to P3 and [64, 128] to P4, restricting each feature level to objects of a fixed size range
    - to keep the head shared anyway, they replace $exp(x)$ with $exp(s_ix)$, where $s_i$ is a trainable scalar

### Center-ness for FCOS

Problem

- even after multi-level prediction, FCOS still lags behind anchor-box detectors
- they found the cause to be the many low-quality bounding boxes produced at locations far from the center of the target object

Solution

- they add a branch, parallel to the classification branch, that predicts the center-ness of each location
- center-ness represents the normalized distance between the center of an object and the location responsible for it, computed from (l*, t*, r*, b*)
- the square root slows down the decay of center-ness
- center-ness ranges over (0, 1), so it is trained with binary cross entropy, and this loss is added to the loss function above
- at test time, final score = classification score * center-ness, so center-ness pushes down the scores of low-quality bboxes and lets the final NMS stage filter them out

![](https://i.imgur.com/dAPRS7I.png)
![](https://i.imgur.com/inHvzra.png)

- they also note that one could instead take only the central region of the ground-truth bounding box as positive, but that introduces an extra hyper-parameter, so they chose not to

Effect of center-ness

![](https://i.imgur.com/juQyqAb.png)
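The center-ness target can be computed directly from the regression targets. A minimal sketch restating the paper's formula (the function name is mine):

```python=
import torch

def centerness_target(reg_targets):
    """centerness* = sqrt( min(l,r)/max(l,r) * min(t,b)/max(t,b) )

    reg_targets: [N, 4] targets (l*, t*, r*, b*) of positive locations
    """
    l, t, r, b = reg_targets.unbind(dim=1)
    lr = torch.min(l, r) / torch.max(l, r)
    tb = torch.min(t, b) / torch.max(t, b)
    return torch.sqrt(lr * tb)   # in (0, 1]; equals 1 at the exact box center
```

A location near an edge of its box gets a target close to 0, which is exactly the low-quality-box signal this branch learns to predict.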
## Result

FCOS was the state of the art at the time:

![](https://i.imgur.com/kVWyjMn.png)
![](https://i.imgur.com/BJcBAVe.jpg)

Ablation study

![](https://i.imgur.com/vcoNnRl.png)

## References

Articles

- [通往Anchor-Free的真相:Object Detection的正負樣本定義](https://medium.com/%E8%BB%9F%E9%AB%94%E4%B9%8B%E5%BF%83/%E9%80%9A%E5%BE%80anchor-free%E7%9A%84%E7%9C%9F%E7%9B%B8-object-detection%E7%9A%84%E6%AD%A3%E8%B2%A0%E6%A8%A3%E6%9C%AC%E5%AE%9A%E7%BE%A9-83f2fe36167f)
- [Object Detection 錨點之爭: Anchor Free大爆發的2019年](https://medium.com/%E8%BB%9F%E9%AB%94%E4%B9%8B%E5%BF%83/cv-object-detection-1-anchor-free%E5%A4%A7%E7%88%86%E7%99%BC%E7%9A%842019%E5%B9%B4-e3b4271cdf1a)
- [FCOS Walkthrough: The Fully Convolutional Approach to Object Detection](https://medium.com/swlh/fcos-walkthrough-the-fully-convolutional-approach-to-object-detection-777f614268c)

Papers

- [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002) (RetinaNet, Focal loss)
- [Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection](https://arxiv.org/abs/1912.02424) (ATSS)
- [Feature Pyramid Networks for Object Detection](https://arxiv.org/abs/1612.03144) (FPN)