---
tags: Biometrics
---
# Improving Face Recognition from Hard Samples via Distribution Distillation Loss
Use `Distillation` to improve performance on `hard sample`s
> For our `pre-train model`, the 玉山 face data are exactly the `hard sample`s
## Contributions
- Our method narrows the performance gap between easy and hard samples on diverse facial variations, which is simple, effective and general.
- It is the first work that adopts similarity distribution distillation loss for face recognition.
> distribution-driven loss
- Significant gains over the SotA ArcFace are reported.
## Related Work

- Conventional KD: model compression
- Self-Distillation: self-ensemble
- DDL
  - trains only one network
  - proposes a novel cosine-similarity, distribution-wise constraint
## Similarity Distribution Estimation
Estimate the probability that the similarity of a positive pair is smaller than the similarity of a negative pair.

Estimate the value $h^+_r$ of the histogram $H^+$ at each bin as:
$$
h_r^+=\frac{1}{S^+}\sum_{(i,j):\,m_{ij}=+1}\delta_{i,j,r}
$$
where $\delta_{i,j,r}=\exp(-\gamma(s_{i,j}-t_r)^2)$
> The closer $s_{i,j}$ lies to the bin center $t_r$, the larger its weight ($\gamma$ controls the kernel width)
$$
L(X,\theta)=\sum^R_{r=1}(h_r^-\sum^r_{q=1}h^+_q)
$$
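
A minimal PyTorch sketch of the soft-histogram estimation and the histogram loss above; `num_bins`, the bin centers $t_r$, and the bandwidth `gamma` here are illustrative defaults rather than the paper's settings (the official implementation is linked at the bottom):

```python
import torch


def soft_histogram(sims: torch.Tensor, num_bins: int = 31, gamma: float = 100.0) -> torch.Tensor:
    """Soft histogram of cosine similarities in [-1, 1].

    Each similarity s_ij contributes to bin r with weight
    delta_{i,j,r} = exp(-gamma * (s_ij - t_r)^2), so the closer s_ij lies to
    the bin center t_r, the larger its contribution to that bin.
    """
    t = torch.linspace(-1.0, 1.0, num_bins, device=sims.device)            # bin centers t_r
    delta = torch.exp(-gamma * (sims.unsqueeze(1) - t.unsqueeze(0)) ** 2)  # (S, R)
    hist = delta.sum(dim=0) / sims.numel()                                 # h_r = (1/S) * sum_ij delta_{i,j,r}
    return hist / hist.sum().clamp_min(1e-12)                              # renormalize into a distribution


def histogram_loss(hist_pos: torch.Tensor, hist_neg: torch.Tensor) -> torch.Tensor:
    """L(X, theta) = sum_r h_r^- * (sum_{q<=r} h_q^+): the estimated probability
    that a positive-pair similarity falls below a negative-pair similarity."""
    cdf_pos = torch.cumsum(hist_pos, dim=0)  # cumulative sum over bins q <= r
    return (hist_neg * cdf_pos).sum()
```

The final renormalization is a practical guard: the Gaussian kernels do not sum exactly to one across bins, so it keeps the histogram a proper distribution before the KL terms below.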
## Distribution Distillation Loss
$$
\begin{aligned}
L_{KL}&=\lambda_1\mathbb{D}_{KL}(P^+\|Q^+)+\lambda_2\mathbb{D}_{KL}(P^-\|Q^-)\\
&=\lambda_1\sum_sP^+(s)\log\frac{P^+(s)}{Q^+(s)}+\lambda_2\sum_sP^-(s)\log\frac{P^-(s)}{Q^-(s)}
\end{aligned}
$$
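
A hedged sketch of this KL term over the binned similarity distributions, where $P$ comes from the teacher (easy) samples and $Q$ from the student (hard) samples; the `eps` smoothing and the unit default weights are assumptions for illustration:

```python
import torch


def kl_term(p: torch.Tensor, q: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """D_KL(P || Q) = sum_s P(s) * log(P(s) / Q(s)) over histogram bins."""
    return (p * torch.log((p + eps) / (q + eps))).sum()


def kl_loss(p_pos: torch.Tensor, q_pos: torch.Tensor,
            p_neg: torch.Tensor, q_neg: torch.Tensor,
            lambda1: float = 1.0, lambda2: float = 1.0) -> torch.Tensor:
    """L_KL = lambda1 * D_KL(P+ || Q+) + lambda2 * D_KL(P- || Q-)."""
    return lambda1 * kl_term(p_pos, q_pos) + lambda2 * kl_term(p_neg, q_neg)
```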
$$
L_{DDL}=\sum^K_{i=1}\mathbb{D}_{KL}(P\|Q_i)-\lambda_3\sum_{i,j\in(p,q_1,\dots,q_K)}\left(\mathbb{E}[S^+_i]-\mathbb{E}[S^-_j]\right)
$$
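
The second term acts as an order constraint that keeps each expected positive similarity above each expected negative similarity across the teacher $p$ and the $K$ student distributions $q_1,\dots,q_K$. A sketch assuming the per-distribution similarity means are already available (the names and the default `lambda3` are illustrative, not the paper's values):

```python
from typing import Sequence

import torch


def order_loss(mean_pos: Sequence[torch.Tensor],
               mean_neg: Sequence[torch.Tensor],
               lambda3: float = 1.0) -> torch.Tensor:
    """-lambda3 * sum_{i,j in (p, q_1..q_K)} (E[S_i^+] - E[S_j^-]).

    mean_pos[i] holds E[S_i^+] and mean_neg[j] holds E[S_j^-]; minimizing this
    term widens the margin between expected positive and negative similarities
    for every pair of distributions.
    """
    total = sum(mu_p - mu_n for mu_p in mean_pos for mu_n in mean_neg)
    return -lambda3 * total
```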
$$
L=L_{DDL}+L_{ArcFace}
$$
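
Putting the pieces together with the sketch helpers above (the random similarity values, the easy/hard split, and the unit $\lambda$ weights are stand-ins for illustration; the ArcFace classification term is not sketched here):

```python
import torch

torch.manual_seed(0)
# Random stand-ins for cosine similarities from the easy (teacher) and hard (student) splits.
teacher_pos, teacher_neg = 0.5 + 0.5 * torch.rand(512), -0.5 + 0.5 * torch.rand(512)
student_pos, student_neg = 0.2 + 0.6 * torch.rand(512), -0.2 + 0.6 * torch.rand(512)

P_pos, P_neg = soft_histogram(teacher_pos), soft_histogram(teacher_neg)
Q_pos, Q_neg = soft_histogram(student_pos), soft_histogram(student_neg)

l_ddl = kl_loss(P_pos, Q_pos, P_neg, Q_neg) + order_loss(
    [teacher_pos.mean(), student_pos.mean()],
    [teacher_neg.mean(), student_neg.mean()],
)
# In training, the total objective adds the ArcFace classification loss on top of l_ddl.
```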
## Links
[PAPER](https://arxiv.org/pdf/2002.03662.pdf)
[GITHUB](https://github.com/HuangYG123/DDL)
[DDL loss](https://github.com/Tencent/TFace/blob/e06997ac6540a59c0ab1a735eea4d87833a43c83/torchkit/loss/ddl.py)
[Histogram Loss](https://zhuanlan.zhihu.com/p/109048940)