# Geometric loss functions for camera pose regression with deep learning

A. Kendall and R. Cipolla, *Geometric loss functions for camera pose regression with deep learning*, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5974-5983, 2017.

[TOC]

## Abstract

> Improves PoseNet's loss function by automatically learning the weighting inside the loss function.

PoseNet is a DNN which learns to regress the 6-DoF camera pose from a single image. It was trained using a naive loss function, with hyperparameters which require expensive tuning. We explore **loss functions for learning camera pose which are based on geometry and scene reprojection error**. Additionally we show how to automatically learn an optimal weighting to simultaneously regress position and orientation. By leveraging geometry, we demonstrate that our technique significantly improves PoseNet's performance.

## Model

- input: image
- output: pose → $[p, q]$, where $p$ is position and $q$ is an orientation quaternion

### Architecture

- GoogLeNet, with weights pretrained on ImageNet classification
- last softmax classification layer removed → replaced by a fully connected layer that regresses the 7-dimensional pose vector (3 for position, 4 for the quaternion)
- the quaternion output is normalized to unit length

(A minimal PyTorch sketch of this architecture appears at the end of this note.)

### Pose Representation

> learning the orientation (quaternion) is the harder part

Rotation representations:

- Euler angles: multiple values represent the same rotation
- axis-angle: multiple values represent the same rotation
- SO(3) matrix: over-parametrised representation
- quaternion: two mappings for each rotation, one on each hemisphere ($q$ and $-q$)

## Loss function

Rotation and translation are learned at different scales:

$$
\text{loss}_p = \|p - \hat{p}\|_2, \qquad \text{loss}_q = \left\|q - \frac{\hat{q}}{\|\hat{q}\|}\right\|_2
$$

- quaternions lie on the unit sphere, so the prediction needs to be normalized
- constrain all quaternions to one hemisphere, since $q$ and $-q$ represent the same rotation (see the quaternion sketch at the end of this note)

### PoseNet loss

A model which is jointly trained to regress the camera's position and orientation performs better than separate models trained on each task individually (in the context of PoseNet):

$$
\text{loss} = \text{loss}_p + \beta \cdot \text{loss}_q
$$

The hyperparameter $\beta$ requires significant tuning to get reasonable results.

### Learnable loss

> Gaussian negative log likelihood: $\frac{n}{2}\log(2\pi) + \frac{n}{2}\log(\sigma^{2}) + \frac{1}{2\sigma^{2}}\sum_{i=1}^n (x_i - \mu)^2$
>
> loss with learnable uncertainty (dropping constants): $\sigma^{-2}\cdot\text{loss} + \log\sigma^{2}$

Formulated via homoscedastic uncertainty, which we can learn using probabilistic deep learning:

$$
\text{loss} = \text{loss}_p \cdot \sigma_p^{-2} + \log\sigma_p^{2} + \text{loss}_q \cdot \sigma_q^{-2} + \log\sigma_q^{2}
$$

Laplace likelihood: a larger variance (uncertainty) down-weights the corresponding residual, shrinking the first term; the $\log\sigma^2$ term prevents the network from predicting infinite uncertainty (which would give zero loss).

Learn $s := \log\sigma^2$ instead, because it is more numerically stable:

$$
\text{loss} = \text{loss}_p \cdot \exp(-s_p) + s_p + \text{loss}_q \cdot \exp(-s_q) + s_q
$$

Approximate initial guess: $s_p = 0.0$, $s_q = -3.0$.

> The quaternion values are relatively small (it is a unit vector), so its variance should also be relatively small.

(A minimal sketch of this learnable loss appears at the end of this note.)
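As a companion to the Architecture section above, here is a minimal PyTorch sketch, assuming a recent torchvision GoogLeNet as a stand-in for the paper's backbone; the `PoseNet` class name and the exact arguments (`weights="IMAGENET1K_V1"`, `aux_logits=False`) are illustrative assumptions rather than the authors' code.

```python
import torch.nn as nn
import torchvision


class PoseNet(nn.Module):
    """GoogLeNet backbone with the softmax classifier replaced by a 7-D pose regressor."""

    def __init__(self):
        super().__init__()
        # ImageNet-pretrained GoogLeNet; the auxiliary classifiers are dropped for simplicity.
        backbone = torchvision.models.googlenet(weights="IMAGENET1K_V1", aux_logits=False)
        # Replace the 1000-way classification layer with a 7-D regression layer
        # (3 values for position p, 4 for the orientation quaternion q).
        backbone.fc = nn.Linear(backbone.fc.in_features, 7)
        self.backbone = backbone

    def forward(self, image):
        out = self.backbone(image)
        p, q = out[:, :3], out[:, 3:]
        # Normalize the quaternion prediction onto the unit sphere.
        q = q / q.norm(dim=1, keepdim=True)
        return p, q
```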
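The quaternion handling from the Loss function section (unit normalization plus the hemisphere constraint) can be sketched as below; this assumes quaternions stored as $(w, x, y, z)$ tensors, and `canonicalize_quaternion` is a hypothetical helper name.

```python
import torch


def canonicalize_quaternion(q: torch.Tensor) -> torch.Tensor:
    """Normalize quaternions to unit length and flip them onto a single hemisphere.

    q and -q encode the same rotation, so the representative with a non-negative
    scalar (w) component is kept to make the regression target unique.
    Assumes the last dimension holds (w, x, y, z).
    """
    q = q / q.norm(dim=-1, keepdim=True)   # project onto the unit sphere
    w = q[..., :1]
    return torch.where(w < 0, -q, q)       # keep the representative with w >= 0
```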
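Finally, a minimal sketch of the learnable homoscedastic loss, again in PyTorch; the class name and the use of plain (un-squared) Euclidean norms for $\text{loss}_p$ and $\text{loss}_q$ are assumptions, with $s_p$, $s_q$ initialized to the paper's suggested $0.0$ and $-3.0$.

```python
import torch
import torch.nn as nn


class HomoscedasticPoseLoss(nn.Module):
    """loss = loss_p * exp(-s_p) + s_p + loss_q * exp(-s_q) + s_q,
    where s = log(sigma^2) is learned jointly with the network weights."""

    def __init__(self, init_s_p: float = 0.0, init_s_q: float = -3.0):
        super().__init__()
        self.s_p = nn.Parameter(torch.tensor(init_s_p))
        self.s_q = nn.Parameter(torch.tensor(init_s_q))

    def forward(self, p_pred, q_pred, p_true, q_true):
        # Normalize the predicted quaternion, matching loss_q = ||q - q_hat / ||q_hat|| ||_2.
        q_pred = q_pred / q_pred.norm(dim=-1, keepdim=True)
        loss_p = (p_pred - p_true).norm(dim=-1).mean()
        loss_q = (q_pred - q_true).norm(dim=-1).mean()
        return (loss_p * torch.exp(-self.s_p) + self.s_p
                + loss_q * torch.exp(-self.s_q) + self.s_q)
```

During training, the loss module's parameters would be optimized together with the network weights, e.g. `torch.optim.Adam(list(model.parameters()) + list(criterion.parameters()))`.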