---
tags: 生物辨識
---

# Improving Face Recognition from Hard Samples via Distribution Distillation Loss

Use `Distillation` to improve performance on `hard sample`s.

> For our `pre-train model`, the E.SUN face data are exactly such `hard sample`s.

## Contributions

- Our method narrows the performance gap between easy and hard samples across diverse facial variations, and is simple, effective and general.
- It is the first work that adopts a similarity distribution distillation loss for face recognition.
  > a distribution-driven loss
- Significant gains over the SotA ArcFace are reported.

## Related Work

![](https://i.imgur.com/SG8m9Sc.png)

- Conventional KD: model compression
- Self-Distillation: self-ensemble
- DDL
  - only learns one network
  - proposes a novel cosine-similarity distribution-wise constraint.

## Similarity Distribution Estimation

Estimate the probability that the similarity of a positive pair is smaller than the similarity of a negative pair.

![](https://i.imgur.com/BgN63wY.png)

Estimate the value $h^+_r$ of the histogram $H^+$ at each bin as:

$$
h_r^+=\frac{1}{S^+}\sum_{(i,j):\,m_{ij}=+1}\delta_{i,j,r}
$$

where $\delta_{i,j,r}=\exp\!\left(-\gamma\,(s_{i,j}-t_r)^2\right)$.

> The closer $s_{i,j}$ is to the bin node $t_r$, the larger its weight $\delta_{i,j,r}$.

$$
L(X,\theta)=\sum^R_{r=1}\Big(h_r^-\sum^r_{q=1}h^+_q\Big)
$$

## Distribution Distillation Loss

$$
\begin{aligned}
L_{KL}&=\lambda_1\,\mathbb{D}_{KL}(P^+\|Q^+)+\lambda_2\,\mathbb{D}_{KL}(P^-\|Q^-)\\
&=\lambda_1\sum_sP^+(s)\log\frac{P^+(s)}{Q^+(s)}+\lambda_2\sum_sP^-(s)\log\frac{P^-(s)}{Q^-(s)}
\end{aligned}
$$

$$
L_{DDL}=\sum^K_{i=1}\mathbb{D}_{KL}(P\|Q_i)-\lambda_3\sum_{i,j\in\{p,q_1,\dots,q_K\}}\big(\mathbb{E}[S^+_i]-\mathbb{E}[S_j^-]\big)
$$

$$
L=L_{DDL}+L_{ArcFace}
$$

A minimal PyTorch sketch of these loss terms is given at the end of this note.

## Links

- [PAPER](https://arxiv.org/pdf/2002.03662.pdf)
- [GITHUB](https://github.com/HuangYG123/DDL)
- [DDL loss](https://github.com/Tencent/TFace/blob/e06997ac6540a59c0ab1a735eea4d87833a43c83/torchkit/loss/ddl.py)
- [Histogram Loss](https://zhuanlan.zhihu.com/p/109048940)
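
## Code Sketch

The sketch below is not the official TFace implementation, only a minimal illustration of the histogram estimation and the reversal-probability loss above. It assumes cosine similarities in $[-1, 1]$, $R=100$ bins with centers $t_r$, and a Gaussian kernel bandwidth $\gamma=50$; the names `soft_histogram` and `reversal_loss` and the parameter defaults are my own, not from the paper or repo.

```python
import torch


def soft_histogram(sims: torch.Tensor, num_bins: int = 100, gamma: float = 50.0) -> torch.Tensor:
    """Soft histogram of similarities: h_r ∝ sum_i exp(-gamma * (s_i - t_r)^2)."""
    # Bin centers t_r spread uniformly over the cosine-similarity range [-1, 1].
    t = torch.linspace(-1.0, 1.0, num_bins, device=sims.device)
    # delta_{i,r} = exp(-gamma * (s_i - t_r)^2), shape (num_pairs, num_bins).
    delta = torch.exp(-gamma * (sims.unsqueeze(1) - t.unsqueeze(0)) ** 2)
    hist = delta.sum(dim=0) / sims.numel()
    # Renormalise so the histogram sums to 1 (needed for the KL terms later).
    return hist / hist.sum().clamp_min(1e-12)


def reversal_loss(pos_sims: torch.Tensor, neg_sims: torch.Tensor) -> torch.Tensor:
    """Probability that a positive-pair similarity falls below a negative-pair
    similarity: L = sum_r h_r^- * sum_{q<=r} h_q^+ (histogram loss)."""
    h_pos = soft_histogram(pos_sims)
    h_neg = soft_histogram(neg_sims)
    cum_pos = torch.cumsum(h_pos, dim=0)  # sum_{q<=r} h_q^+
    return (h_neg * cum_pos).sum()


if __name__ == "__main__":
    # Toy usage: positive pairs should be more similar than negative pairs,
    # so the reversal probability should be small.
    torch.manual_seed(0)
    pos = (0.6 + 0.15 * torch.randn(512)).clamp(-1, 1)
    neg = (-0.1 + 0.15 * torch.randn(512)).clamp(-1, 1)
    print(reversal_loss(pos, neg).item())
```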
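A similarly hedged sketch of $L_{KL}$ and the order term, reusing `soft_histogram` from the block above. It covers the single-teacher, single-student case ($K=1$); `kl_divergence`, `ddl_loss`, and the $\lambda$ defaults are assumptions for illustration, not the paper's settings.

```python
def kl_divergence(p: torch.Tensor, q: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """D_KL(P || Q) = sum_s P(s) * log(P(s) / Q(s)), taken over histogram bins."""
    p = p.clamp_min(eps)
    q = q.clamp_min(eps)
    return (p * (p / q).log()).sum()


def ddl_loss(teacher_pos, teacher_neg, student_pos, student_neg,
             lambda1: float = 1.0, lambda2: float = 1.0, lambda3: float = 1.0) -> torch.Tensor:
    """One-teacher / one-student distribution distillation loss:
    the KL terms pull the student's (hard-sample) similarity histograms toward
    the teacher's (easy-sample) histograms, and the order term keeps the
    expected margin E[S^+] - E[S^-] large for both branches."""
    P_pos, P_neg = soft_histogram(teacher_pos), soft_histogram(teacher_neg)
    Q_pos, Q_neg = soft_histogram(student_pos), soft_histogram(student_neg)

    l_kl = lambda1 * kl_divergence(P_pos, Q_pos) + lambda2 * kl_divergence(P_neg, Q_neg)

    # E[S^+] - E[S^-], estimated directly from the sampled similarities.
    margin = ((teacher_pos.mean() - teacher_neg.mean())
              + (student_pos.mean() - student_neg.mean()))

    # L_DDL = KL terms - lambda3 * expected margins; the ArcFace classification
    # loss (L = L_DDL + L_ArcFace) would be added by the caller.
    return l_kl - lambda3 * margin
```

In the paper there can be $K$ hard-sample (student) distributions $Q_1,\dots,Q_K$, each distilled from the same easy-sample (teacher) distribution $P$; extending the sketch is a matter of summing `ddl_loss` over the $K$ student branches.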