---
tags: Human Face
---

# Retinaface (2019)

RetinaFace: Single-stage Dense Face Localisation in the Wild

## Contribution

1. One-stage, anchor-based face detector
2. Multi-task learning (face bbox, 5-point landmark, mesh decoder)

## Network architecture

![](https://i.imgur.com/xNhXw15.png)

- Backbone: ResNet152 or Mobilenet-0.25
- Neck: FPN
    - Covered in week 10
- Head: Context Module
    - The context module from SSH
    - Uses DCN (deformable convolution network) in place of regular conv layers

    ![](https://i.imgur.com/nBXHIvP.png)
- Anchor setting (input image 640x640x3, scale step $2^{\frac{1}{3}}$, base size 16 pixels, aspect ratio 1:1)

| Feature Pyramid                 | Stride | Anchor sizes (px)   |
|:-------------------------------:|:------:|:-------------------:|
| $P_{2} (160\times160\times256)$ | 4      | 16, 20.16, 25.40    |
| $P_{3} (80\times80\times256)$   | 8      | 32, 40.32, 50.80    |
| $P_{4} (40\times40\times256)$   | 16     | 64, 80.63, 101.59   |
| $P_{5} (20\times20\times256)$   | 32     | 128, 161.26, 203.19 |
| $P_{6} (10\times10\times256)$   | 64     | 256, 322.54, 406.37 |

- Output & loss:

    ![](https://i.imgur.com/h5Ahcg8.png)
    - face classification (softmax loss for binary classes)
    - face bbox (smooth-l1-loss)
    - 5-point face landmark (smooth-l1-loss)
    - dense regression (mesh decoder and Differentiable Renderer)

    ![](https://i.imgur.com/dVOFWau.png)
        - Mesh Decoder:
            - Purpose: decode a shape-and-texture vector into a 3D face
            - A 4-layer GCN decodes $P_{ST}\in\mathbb{R}^{128}$ into the 3D position of every face point ($D_{P_{ST}}\in\mathbb{R}^{n\times6}$: the 3D face is drawn from n 6-dimensional vectors whose 6 values are x, y, z, r, g, b)
        - Differentiable Renderer:
            - Purpose: project the 3D face to 2D so that a loss can be computed
            - The 3D face ($D_{P_{ST}}$) is projected to a 2D face image and compared with the GT to compute the loss. The renderer also needs $P_{ill}\in\mathbb{R}^{7}$ (illumination parameters) and $P_{cam}\in\mathbb{R}^{9}$ (camera parameters)
            - [github code](https://github.com/google/tf_mesh_renderer)
            - loss:
            $$
            L_{pixel}=\frac{1}{W\times H}\sum_{i}^{W}\sum_{j}^{H}\left\|R(D_{P_{ST}},P_{ill},P_{cam})_{i,j}-I(i,j)\right\|
            $$
- Total Loss
    $$
    L_{total}=L_{cls}+0.25L_{box}+0.1L_{pts}+0.01L_{pixel}
    $$

## Training Strategy

- OHEM
- Data Augmentation
    - random crop
    - flip
- Positive and negative anchor setting
    - IOU(gt, anchor) > 0.5 -> positive anchor
    - IOU(gt, anchor) < 0.3 -> negative anchor
    - others -> ignored

## Experiment

- Ablation experiments of the proposed methods

    ![](https://i.imgur.com/fdLueYq.png)
- Influence of face detection and alignment on deep face recognition

    ![](https://i.imgur.com/9M2w7Se.png)
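
The anchor table under Network architecture and the IoU thresholds under Training Strategy can be reproduced with a small sketch. This is only an illustration, not the authors' code: the function names (`anchor_sizes`, `num_anchors`, `label_anchor`), the `(x1, y1, x2, y2)` box format, and labelling each anchor by its best IoU over all GT boxes are assumptions made here.

```python
# Constants taken from the anchor table: base size 16 px, three scales per
# level with step 2^(1/3), and strides 4..64 for P2..P6.
BASE_SIZE = 16
SCALES_PER_LEVEL = 3
STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32, "P6": 64}


def anchor_sizes(stride, min_stride=4):
    """Square anchor side lengths (pixels) for one pyramid level."""
    level_base = BASE_SIZE * stride / min_stride  # doubles at each level
    return [level_base * 2 ** (k / SCALES_PER_LEVEL)
            for k in range(SCALES_PER_LEVEL)]


def num_anchors(input_size=640):
    """Total anchor count for a square input: 3 anchors per feature-map cell."""
    return sum((input_size // s) ** 2 * SCALES_PER_LEVEL
               for s in STRIDES.values())


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) (assumed format)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def label_anchor(anchor, gt_boxes, pos_thr=0.5, neg_thr=0.3):
    """Return 1 (positive), 0 (negative) or -1 (ignored) for one anchor."""
    best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
    if best > pos_thr:
        return 1
    if best < neg_thr:
        return 0
    return -1


if __name__ == "__main__":
    for name, stride in STRIDES.items():
        print(name, [round(s, 2) for s in anchor_sizes(stride)])
    print("anchors for a 640x640 input:", num_anchors())
```

Rounding to two decimals gives 161.27 for the second $P_{5}$ anchor, while the table lists 161.26 (a truncated value); every other entry matches. With three anchors per cell on the five feature maps, a 640x640 input yields 102,300 anchors in total.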