# Attentional Feature-Pair Relation Networks for Accurate Face Recognition
Aug 17, 2019
difficulty: 3
rating: 3
[paper](https://arxiv.org/abs/1908.06255v1)
A new architecture block to cope with imperfect face localization.
Important: this is not self-attention! This is bilinear attention.
As I understand the intuition, the proposed bilinear attention aggregates information about pairs of spatial locations, so the result should be treated as a co-occurrence of spatial features in the picture, which makes sense.
They also complain about the imperfect localization network and say that the proposed method helps a bit.
## Architecture
The whole pipeline picture looks complicated, so let's look closer at the details.

### Backbone

For the backbone it is OK to use any convolutional network; the paper uses ResNet-101. I see no particular preference for this backbone, but no alternatives were considered in the study.
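For illustration, here is a minimal sketch of such a feature extractor, assuming torchvision's ResNet-101 truncated before the classification head. The truncation point and the input resolution are my assumptions, chosen so the output grid is `9x9`:

```python
import torch
import torchvision.models as models

# Sketch: ResNet-101 without the classification head (assumption:
# the truncation point and 288x288 input are mine, not from the paper).
resnet = models.resnet101(weights=None)
backbone = torch.nn.Sequential(*list(resnet.children())[:-2])  # drop avgpool + fc

x = torch.randn(2, 3, 288, 288)  # 288 / 32 = 9, so the output grid is 9x9
feats = backbone(x)
print(feats.shape)  # torch.Size([2, 2048, 9, 9])
```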
### Reshaping Feature Maps
After the backbone we have `9x9` feature maps. Reshape them into a set of $N = 81$ local feature vectors $F_i \in \mathbb{R}^{D}$, one per spatial location.
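A one-liner sketch of this reshape (shapes assumed from the `9x9` grid and the formula below):

```python
import torch

feats = torch.randn(2, 2048, 9, 9)    # (B, D, 9, 9) backbone output
F = feats.flatten(2).transpose(1, 2)  # (B, 81, D): local features F_i, i = 1..81
```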

### Computing "Similarities"
This is not a similarity in the sense that we compute an inner product. The formula for the Bilinear Attention Map is
$$
\mathcal{A}_{ij} = p^\top \left(
\operatorname{ReLU}\left(
U^\top F_i
\right) \circ
\operatorname{ReLU}\left(
V^\top F_j
\right)
\right)
$$
There are some parameters:
$U\in \mathbb{R}^{D\times L}$, $V\in \mathbb{R}^{D\times L}$, $p\in \mathbb{R}^{L}$.
Having $U \ne V$ makes this operation non-symmetric, so $\mathcal{A}_{ij}\ne\mathcal{A}_{ji}$ in general.
In the end, $U^\top F_i$ and $V^\top F_j$ are just the local features projected into a lower-dimensional space.
### Pooling and softmax
$p$ is a learnable pooling vector that collapses the $L$-dimensional elementwise product into the scalar $\mathcal{A}_{ij}$.
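Putting the formula together, here is a minimal PyTorch sketch of the bilinear attention map. The softmax placement (over $j$ for each $i$) and the dimensions are my assumptions; the paper may normalize differently:

```python
import torch
import torch.nn as nn

class BilinearAttentionMap(nn.Module):
    """A_ij = p^T (ReLU(U^T F_i) ∘ ReLU(V^T F_j)), then softmax.

    A sketch only: dimensions and the softmax axis are assumptions.
    """
    def __init__(self, D: int, L: int):
        super().__init__()
        self.U = nn.Linear(D, L, bias=False)   # U ∈ R^{D×L}
        self.V = nn.Linear(D, L, bias=False)   # V ∈ R^{D×L}
        self.p = nn.Parameter(torch.randn(L))  # p ∈ R^{L}, learnable pooling

    def forward(self, F: torch.Tensor) -> torch.Tensor:
        # F: (B, N, D) local features, N = 81 for a 9x9 grid
        left = torch.relu(self.U(F))   # (B, N, L) = ReLU(U^T F_i) for all i
        right = torch.relu(self.V(F))  # (B, N, L) = ReLU(V^T F_j) for all j
        # A[b, i, j] = p^T (left[b, i] ∘ right[b, j]) for every pair (i, j)
        A = torch.einsum('bil,bjl,l->bij', left, right, self.p)
        return torch.softmax(A, dim=-1)  # (B, N, N); A_ij ≠ A_ji in general

attn = BilinearAttentionMap(D=2048, L=256)
A = attn(torch.randn(2, 81, 2048))
print(A.shape)  # torch.Size([2, 81, 81])
```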
TODO: continue