DCNv1 (Deformable Convolutional Networks)

###### tags: `Paper Notes`

# DCNv1 (Deformable Convolutional Networks)

* Paper: [Deformable Convolutional Networks](https://arxiv.org/abs/1703.06211)
* Affiliation: Microsoft Research Asia
* Year: 2017

### Introduction

* As shown in Figure 1(a), a standard CNN always reads its input from fixed locations on the previous layer's feature maps. This limits its ability to model geometric transformations, so the model has to stack many layers or rely on heavy data augmentation to compensate.
* The authors propose deformable convolution and deformable ROI pooling to address this problem, as shown in Figure 1(b).

<center><img src="https://i.imgur.com/ckDElAR.png" ></center>
<center>Figure 1: (a) standard convolution; (b) deformable convolution; (c)(d) special cases of (b)</center>

### Deformable Convolutional Networks

<center><img src="https://i.imgur.com/8Tp2yJm.png"></center>
<center>Figure B: illustration of the feature map.</center>

* deformable convolution:
    * As shown in Figure B, a standard convolution layer computes:
    $$
    y(p_0) = \sum_{p_n \in R} w(p_n) \, x(p_0 + p_n)
    $$
        * $y(p_0)$: the output of the convolution centered at $p_0$.
        * $R$: the relative positions of all grid cells in the kernel. For kernel size = 3, $R = \{(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 0), (0, 1), (1, -1), (1, 0), (1, 1)\}$.
        * $w$: the weights of the convolution layer.
        * $x$: the values of the feature map.
    * Deformable convolution adds one offset term per grid cell, so the input can be sampled from locations of interest instead of only from fixed positions:
    $$
    y(p_0) = \sum_{p_n \in R} w(p_n) \, x(p_0 + p_n + \Delta p_{n})
    $$
    * Because $\Delta p_{n}$ is usually fractional, $x(p_0 + p_n + \Delta p_{n})$ is computed by bilinear interpolation (see Appendix) over the four nearest integer locations:
    $$
    \begin{aligned}
    x(p) &= \sum_{q} G(q, p) \, x(q) \\
    G(q, p) &= g(q_x, p_x) \cdot g(q_y, p_y) \\
    g(a, b) &= \max(0, 1 - |a - b|)
    \end{aligned}
    $$
        * $q$: the four integer locations nearest to $p$. If $p = (2.4, 5.6)$, then $q \in \{ (2, 5), (2, 6), (3, 5), (3, 6) \}$.
        * $G(q, p)$: the bilinear interpolation weight of $q$ with respect to $p$.
    * Implementation of deformable convolution (see Figure 2):
        * First feed the feature map into a standard conv. layer to obtain the offset of every grid cell. Each offset has an x and a y component, so this conv. layer has $2N$ filters, where $N$ is the number of grid cells in the kernel ($N = 9$ in Figure 2).
        * Use the offsets to locate the new input positions, then apply the convolution to the values sampled there.

<center><img src="https://i.imgur.com/HTGcNLb.png"></center>
<center>Figure 2: illustration of deformable convolution.</center>

* deformable ROI pooling:
    * Deformable ROI pooling follows the same idea as deformable convolution. The only implementation difference is that deformable ROI pooling predicts its offsets with an fc layer, whereas deformable convolution uses a conv. layer. See Figure 3.
    * Standard ROI pooling is defined as:
    $$
    y(i, j) = \sum_{p \in bin(i,j)} x(p_0 + p) / n_{ij}
    $$
        * $y(i, j)$: the output of the $(i, j)$-th bin.
        * $p_0$: the top-left corner of the ROI.
        * $n_{ij}$: the number of grid cells in the $(i, j)$-th bin ($n_{ij} = 9$ in Figure 3).
    * Deformable ROI pooling is defined as:
    $$
    y(i, j) = \sum_{p \in bin(i,j)} x(p_0 + p + \Delta p_{ij}) / n_{ij}
    $$
        * Note: all grid cells in the same bin share one offset.
        > Personally, I think the blue box on the left of Figure 3 is not drawn well; it can mislead readers into thinking that every grid cell has its own offset.
    * Implementation of deformable ROI pooling (see Figure 3):
        * First run standard ROI pooling, then pass the pooled feature map through an fc layer to obtain the normalized offsets $\Delta \hat{p}_{ij}$.
        * Multiply $\Delta \hat{p}_{ij}$ element-wise by the ROI size $(w, h)$ and by a constant $\gamma$ to get the final offset ($\gamma$ is usually set to 0.1):
        $$
        \Delta p_{ij} = \gamma \cdot \Delta \hat{p}_{ij} \circ (w, h)
        $$
        * Finally, run ROI pooling on the new input locations.

<center><img src="https://i.imgur.com/bBhY8Hf.png"></center>
<center>Figure 3: illustration of deformable ROI pooling.</center>

* deformable PS ROI pooling:
    > I do not fully understand this part.
    * See Figure 4. The ROI is divided into $k \times k$ bins, and there are $C$ object classes.

<center><img src="https://i.imgur.com/qCXYlfY.png"></center>
<center>Figure 4: illustration of deformable PS ROI pooling.</center>

### Understanding Deformable ConvNets

* CNNs built with deformable modules are called deformable convnets for short.
* Figure 5 shows the effect of deformable convnets. Compared with standard convolution, the receptive field of deformable convolution adapts to the shape and size of the object.

<center><img src="https://i.imgur.com/yExZzYk.png"></center>
<center>Figure 5: illustration of the effect of deformable convnets.</center>

### Appendix - Bilinear Interpolation

* Before bilinear interpolation, we first need linear interpolation:
    * Given two points $x_0$ and $x_1$ on a line with values $y_0$ and $y_1$, the value $y$ at $x \in [x_0, x_1]$ is:
    $$
    \begin{aligned}
    \frac{y - y_0}{x - x_0} &= \frac{y - y_1}{x - x_1} \\
    y &= \frac{x_1 - x}{x_1 - x_0} \cdot y_0 + \frac{x - x_0}{x_1 - x_0} \cdot y_1
    \end{aligned}
    $$
    * Principle: the slope between any two points on a line is the same.
* bilinear interpolation:
    * Given four points on a plane, $(x_0, y_0)$, $(x_0, y_1)$, $(x_1, y_0)$, $(x_1, y_1)$, with values $z_{(x_0, y_0)}$, $z_{(x_0, y_1)}$, $z_{(x_1, y_0)}$, $z_{(x_1, y_1)}$, bilinear interpolation computes $z_{(x, y)}$.
    * First interpolate linearly along the $x$ axis:
    $$
    \begin{aligned}
    z_{(x, y_0)} &= \frac{x_1 - x}{x_1 - x_0} z_{(x_0, y_0)} + \frac{x - x_0}{x_1 - x_0} z_{(x_1, y_0)} \\
    z_{(x, y_1)} &= \frac{x_1 - x}{x_1 - x_0} z_{(x_0, y_1)} + \frac{x - x_0}{x_1 - x_0} z_{(x_1, y_1)}
    \end{aligned}
    $$
    * Then interpolate linearly along the $y$ axis:
    $$
    z_{(x, y)} = \frac{y_1 - y}{y_1 - y_0} z_{(x, y_0)} + \frac{y - y_0}{y_1 - y_0} z_{(x, y_1)}
    $$

<center><img src="https://i.imgur.com/G8ICuvV.png"></center>
<center>Figure A: illustration of bilinear interpolation.</center>
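
To make the formulas concrete, here is a minimal pure-Python sketch of bilinear sampling with the kernel $g(a, b) = \max(0, 1 - |a - b|)$, followed by a deformable convolution evaluated at a single location $p_0$. It is only an illustration under stated assumptions, not the paper's implementation. The function names (`bilinear_sample`, `deformable_conv_at`), the `x[row][col]` layout, the zero value for out-of-bounds neighbours, and the hand-written offsets are all hypothetical choices; in the real model the offsets come from a learned conv. layer.

```python
import math


def g(a, b):
    """1-D bilinear kernel: g(a, b) = max(0, 1 - |a - b|)."""
    return max(0.0, 1.0 - abs(a - b))


def bilinear_sample(x, px, py):
    """Sample feature map x at the fractional location (px, py).

    x is indexed as x[row][col], i.e. x[py][px].
    Assumption: neighbours outside the map contribute zero.
    """
    h, w = len(x), len(x[0])
    x0, y0 = math.floor(px), math.floor(py)
    total = 0.0
    # Sum G(q, p) * x(q) over the four nearest integer locations q.
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qx < w and 0 <= qy < h:
                total += g(qx, px) * g(qy, py) * x[qy][qx]
    return total


def deformable_conv_at(x, weights, offsets, p0):
    """y(p0) = sum_n w(p_n) * x(p0 + p_n + dp_n) for a 3x3 kernel.

    weights: 9 scalars, one per grid cell of R.
    offsets: 9 (dx, dy) pairs, one per grid cell (hand-written here).
    """
    grid = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = 0.0
    for (dx, dy), w_n, (ox, oy) in zip(grid, weights, offsets):
        out += w_n * bilinear_sample(x, p0[0] + dx + ox, p0[1] + dy + oy)
    return out


# Toy 8x8 feature map whose value is col + 10 * row.
feature = [[col + 10 * row for col in range(8)] for row in range(8)]

# The example from the notes: p = (2.4, 5.6) uses q = (2,5), (2,6), (3,5), (3,6).
sampled = bilinear_sample(feature, 2.4, 5.6)  # approximately 58.4

# Averaging kernel; zero offsets reproduce a standard 3x3 convolution.
avg_weights = [1.0 / 9.0] * 9
standard = deformable_conv_at(feature, avg_weights, [(0.0, 0.0)] * 9, (3, 3))
shifted = deformable_conv_at(feature, avg_weights, [(0.5, 0.0)] * 9, (3, 3))
```

With all offsets set to zero the result matches a standard convolution; shifting every sampling point by half a pixel along x moves the averaged value by 0.5 on this toy map, because bilinear interpolation is exact for a map that is linear in the coordinates.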