
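# DETR (End-to-End Object Detection with Transformers)

https://arxiv.org/pdf/2005.12872.pdf

Authors: Nicolas Carion⋆, Francisco Massa⋆, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko
Facebook AI Research (FAIR)

## Differences from previous object detection

* A conventional CNN first extracts features, and those features are then fed into a Transformer
* The output is a set of N predictions
    * Each prediction contains a box center, a scale (height and width), and an object class

![](https://i.imgur.com/IEBML5z.png)

* No proposal, no anchor
    * No dense prediction over the feature map is needed
    * No region proposals (as introduced in R-CNN) and no anchors

## 3.1 Object detection set prediction loss

* A fixed number N of predictions is inferred in a single pass.
    * N is chosen to be much larger than the actual number of objects in an image
* DETR
    * Input: one image
    * Output: set prediction
    * The image passes through a backbone convolutional network that extracts a feature map, and the feature map is then flattened along its height and width dimensions
    * The spatial positions are turned into positional encodings, which are added to the flattened feature map (the paper adds them rather than concatenating them)

![](https://i.imgur.com/rbLEruG.png)

![](https://i.imgur.com/teoAsyR.png)

### Object queries

Object queries are not fed in directly as the decoder's input content. Instead, they influence the Transformer decoder as learnable positional encodings. You can think of each object query as one category in a one-hot encoding: after passing through an embedding layer it becomes an embedding (a vector in a high-dimensional space), which is then trained together with the task.

## Bipartite matching with Hungarian algorithm

The box set predicted by DETR is unordered, so when computing the loss it is not known which predicted box should correspond to which ground-truth box. That is why bipartite matching is needed.

I could not follow the loss function itself.

## Hungarian Algorithm

Brute-force search over all possible assignments has time complexity $O(n!)$. The Hungarian Algorithm achieves $O(n^3)$.

* Hungarian Algorithm
    1. Row Reduction
    2. Column Reduction
    3. Test for an optimal assignment
    4. Shift zeros
    5. Making the final assignment

## SUMMARY

Architecturally, this is a brand-new design. It removes the tricks that earlier object detection methods relied on and makes the Transformer the main component, which gives a single end-to-end pipeline and lowers implementation complexity. However, it shows no significant advantage in either detection performance or compute time. The main time bottleneck is solving the bipartite matching.

###### tags: `mllearning2020`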
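
## Appendix: bipartite matching sketch

To make the matching step from the "Bipartite matching" and "Hungarian Algorithm" sections more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. Several things are assumptions made for illustration: the function names `hungarian`, `brute_force`, and `match_cost`, the toy data, and the simplified cost (negative class probability plus L1 box distance, leaving out the generalized IoU term and the weights used in the paper). The solver uses the shortest-augmenting-path (potentials) formulation of the Hungarian Algorithm rather than the row/column-reduction steps listed in the notes. Both solve the same assignment problem in $O(n^3)$.

```python
from itertools import permutations


def hungarian(cost):
    """Minimum-cost assignment of every row to a distinct column.

    cost is an n x m list of lists with n <= m (rows: ground-truth objects,
    columns: predictions). Returns (total_cost, assignment) where
    assignment[i] is the column chosen for row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials (1-indexed)
    p = [0] * (m + 1)     # p[j]: row currently matched to column j, 0 if none
    way = [0] * (m + 1)   # way[j]: previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta, j1 = inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:      # reached a free column: path is complete
                break
        while j0 != 0:          # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return total, assignment


def brute_force(cost):
    """Reference answer by trying every assignment (factorial time)."""
    n, m = len(cost), len(cost[0])
    return min(sum(cost[i][cols[i]] for i in range(n))
               for cols in permutations(range(m), n))


def match_cost(gt, pred):
    """Simplified pairwise cost (assumption): -p(class) + L1 box distance."""
    label, box = gt
    probs, pred_box = pred
    return -probs[label] + sum(abs(a - b) for a, b in zip(box, pred_box))


# Toy data: 2 ground-truth objects, N = 4 predictions.
# Boxes are (center_x, center_y, width, height) in [0, 1].
gt_objects = [(0, (0.2, 0.3, 0.1, 0.1)), (1, (0.7, 0.6, 0.2, 0.3))]
predictions = [
    ([0.10, 0.80], (0.7, 0.6, 0.2, 0.3)),
    ([0.30, 0.30], (0.5, 0.5, 0.5, 0.5)),
    ([0.90, 0.05], (0.2, 0.3, 0.1, 0.1)),
    ([0.20, 0.20], (0.9, 0.1, 0.1, 0.1)),
]
cost_matrix = [[match_cost(g, p) for p in predictions] for g in gt_objects]
total, assignment = hungarian(cost_matrix)
print(assignment, total)
```

In this toy case the first ground-truth object is paired with prediction 2 and the second with prediction 0. The two leftover predictions are the ones DETR would treat as "no object". In practice a library routine such as SciPy's `scipy.optimize.linear_sum_assignment` would normally replace the hand-written solver.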