# Geometric loss functions for camera pose regression with deep learning
A. Kendall and R. Cipolla, *Geometric loss functions for camera pose regression with deep learning*, Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5974-5983, 2017.
[TOC]
## Abstract
> Improves PoseNet's loss function; the weights inside the loss function are learned automatically
PoseNet is a DNN which learns to regress the 6 DoF camera pose from a single image. It was trained using a naive loss function, with hyperparameters which require expensive tuning. We explore **loss functions for learning camera pose which are based on geometry and scene reprojection error**. Additionally we show how to automatically learn an optimal weighting to simultaneously regress position and orientation. By leveraging geometry, we demonstrate that our technique significantly improves PoseNet’s performance.
## Model
input: image
output: pose → $[p,q]$, where $p$ is position and $q$ is quaternion
### Architecture
GoogLeNet, pretrained weights from ImageNet classification
removed the final softmax classification layer → appended a fully connected layer that regresses the 7-dimensional pose (3 for position, 4 for the quaternion)
the quaternion output is normalized to unit length
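A minimal sketch of such a regression head in PyTorch (illustrative only; `PoseHead` and `feat_dim` are assumed names, not the authors' code). It maps a backbone feature vector to 7 outputs and normalizes the quaternion part:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseHead(nn.Module):  # hypothetical name, not from the paper
    def __init__(self, feat_dim=1024):  # 1024 = GoogLeNet pooled feature size
        super().__init__()
        self.fc_pose = nn.Linear(feat_dim, 7)  # replaces the removed softmax classifier

    def forward(self, features):
        out = self.fc_pose(features)
        p = out[:, :3]                      # position (x, y, z)
        q = F.normalize(out[:, 3:], dim=1)  # unit quaternion (w, x, y, z)
        return p, q
```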
### Pose Representation
> learning orientation (quaternion) is harder
rotation representations:
- Euler angles: multiple values can represent the same rotation (wrap around $2\pi$), so the mapping is not injective
- axis-angle: multiple values can represent the same rotation (wrap around $2\pi$)
- SO(3) rotation matrix: over-parametrised, and orthonormality is hard to enforce during optimization
- quaternion: two mappings ($q$ and $-q$) for each rotation, one on each hemisphere (see the sketch below)
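A quick numeric check of the quaternion double cover (a sketch with a hand-rolled `quat_to_rotmat` helper, not from the paper): $q$ and $-q$ map to the same rotation matrix, which is why a hemisphere constraint is needed.
```python
import numpy as np

def quat_to_rotmat(q):
    """Standard conversion of a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

q = np.array([np.cos(np.pi/4), np.sin(np.pi/4), 0.0, 0.0])  # 90 deg about the x-axis
assert np.allclose(quat_to_rotmat(q), quat_to_rotmat(-q))   # q and -q: same rotation
```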
## Loss function
position and orientation are learned at different scales, so each gets its own loss term:
$$
\text{loss}_p=\|p-\hat{p}\|_2, \text{ loss}_q=\|q-\frac{\hat{q}}{\|\hat{q}\|}\|_2
$$
the quaternion lies on the unit sphere, so the network's prediction is normalized before computing the loss
all quaternions are constrained to one hemisphere so that each rotation has a unique representation
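A sketch of the two per-task losses in PyTorch (batched tensors; the hemisphere convention used here, $w \ge 0$, is one common choice and an assumption, not necessarily the paper's):
```python
import torch
import torch.nn.functional as F

def pose_losses(p_pred, q_pred, p_gt, q_gt):
    # Project the predicted quaternion onto the unit sphere.
    q_pred = F.normalize(q_pred, dim=1)
    # Map both quaternions to the w >= 0 hemisphere so each rotation is unique.
    q_pred = torch.where(q_pred[:, :1] < 0, -q_pred, q_pred)
    q_gt   = torch.where(q_gt[:, :1] < 0, -q_gt, q_gt)
    loss_p = torch.norm(p_pred - p_gt, dim=1).mean()  # ||p - p_hat||_2
    loss_q = torch.norm(q_pred - q_gt, dim=1).mean()  # ||q - q_hat / ||q_hat|| ||_2
    return loss_p, loss_q
```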
### PoseNet loss
a model jointly trained to regress the camera's position and orientation performs better than separate models trained on each task individually (in the context of PoseNet)
$$
\text{loss} = \text{loss}_p + \beta\cdot\text{loss}_q
$$
hyperparameter $\beta$ requires significant tuning to get reasonable results
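In code this is just a fixed weighted sum (sketch; the value of `beta` below is only a placeholder and must be tuned per scene):
```python
beta = 500.0                     # placeholder value; requires expensive per-scene tuning
loss = loss_p + beta * loss_q    # loss_p, loss_q as returned by the sketch above
```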
### Learnable loss
> Gaussian negative log likelihood: $\frac{n}{2}\log(2\pi)+\frac{n}{2}\log(\sigma^{2})+\frac{1}{2\sigma^{2}}\sum_{i=1}^n (x_i-\mu)^2$
>
> minimizing this NLL (scale by $2/n$, drop constants) gives the learnable-uncertainty form: $\sigma^{-2}\cdot\text{loss}+\log\sigma^{2}$
formulated with homoscedastic (task-level) uncertainty, which can be learned with probabilistic deep learning:
$$
\text{loss} = \text{loss}_p\cdot\sigma_p^{-2} + \log\sigma_p^{2} + \text{loss}_q\cdot\sigma_q^{-2} + \log\sigma_q^{2}
$$
Laplace likelihood: a larger variance (uncertainty) down-weights that task's residual and shrinks the first term; the second ($\log\sigma^2$) term prevents the network from predicting infinite uncertainty (which would give zero loss)
learn $s:=\log\sigma^2$ because it is more numerically stable:
$$
\text{loss} = \text{loss}_p\cdot\exp(-s_p) + s_p + \text{loss}_q\cdot\exp(-s_q)+ s_q
$$
approximate initial guess: $s_p=0.0$, $s_q=-3.0$
> quaternion values are small (unit vector), so their variance should also be small, hence the smaller initial $s_q$
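A minimal sketch of this learnable weighting as a PyTorch module (`LearnableWeightedLoss` is a hypothetical name; $s_p$, $s_q$ are trained jointly with the network weights):
```python
import torch
import torch.nn as nn

class LearnableWeightedLoss(nn.Module):  # hypothetical name, not from the paper
    def __init__(self, init_s_p=0.0, init_s_q=-3.0):
        super().__init__()
        self.s_p = nn.Parameter(torch.tensor(init_s_p))  # s := log(sigma^2) for position
        self.s_q = nn.Parameter(torch.tensor(init_s_q))  # s := log(sigma^2) for rotation

    def forward(self, loss_p, loss_q):
        # exp(-s) down-weights the more uncertain task; the +s term keeps
        # the network from claiming infinite uncertainty (zero loss).
        return (loss_p * torch.exp(-self.s_p) + self.s_p
                + loss_q * torch.exp(-self.s_q) + self.s_q)
```
In use, the two log-variances are simply added to the optimizer alongside the model weights, e.g. `torch.optim.Adam(list(model.parameters()) + list(criterion.parameters()))`.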