# (7/19) Computer Vision Recent Paper: FaceNet
###### tags: `paper`

[toc]

---

## Before Meeting

:::success
### Author:
- Florian Schroff
    - ![](https://i.imgur.com/yqAuTRd.png)
    - [refer](https://scholar.google.com/citations?user=eWbZJlMAAAAJ&hl=zh-TW)
- Dmitry Kalenichenko
    - ![](https://i.imgur.com/CT6x3LO.png)
- James Philbin
    - ![](https://i.imgur.com/oX7AUH1.png)
    - [refer](https://scholar.google.com/citations?user=80JxPpUAAAAJ&hl=en)
:::

---

## Recent Paper

---

### FaceNet: A Unified Embedding for Face Recognition and Clustering

:::success
#### Abstract
- FaceNet
    - directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity
    - deep convolutional network
    - only 128 bytes per face
    - ![](https://i.imgur.com/Yahqmwp.png)
:::

:::info
#### Detail
- Introduction
    - Tasks
        - face verification (is this the same person?)
        - recognition (who is this person?)
        - clustering (find common people among these faces)
    - face verification simply involves thresholding the distance between the two embeddings
    - FaceNet directly trains its output to be a compact 128-D embedding using a triplet-based loss function based on LMNN (large margin nearest neighbor)
    - each triplet consists of two matching face thumbnails and a non-matching face thumbnail; the loss aims to separate the positive pair from the negative by a distance margin
    - online negative exemplar mining strategy
- Method
    - end-to-end learning of the whole system
    - triplet loss
    - Triplet Loss
        - ![](https://i.imgur.com/kUNQ6HK.png)
        - ![](https://i.imgur.com/AZPebKx.png)
        - ![](https://i.imgur.com/aaUH4CE.png)
    - Triplet Selection
    - Deep Convolutional Networks
        - Stochastic Gradient Descent (SGD) with standard backprop [8, 11] and AdaGrad
        - parameters
        - FLOPS
        - non-linear activation function
        - First category: adds 1×1×d convolutional layers
        - Second category: based on GoogLeNet-style Inception models
        - ![](https://i.imgur.com/tzZQXR8.png)
- Datasets and Evaluation
    - ![](https://i.imgur.com/WaQLoYJ.png)
    - ![](https://i.imgur.com/NAocbMI.png)
    - ![](https://i.imgur.com/VBq3iaF.png)
    - Hold-out Test Set
    - Personal Photos
    - Academic Datasets
        - Labeled Faces in the Wild (LFW)
        - YouTube Faces DB
:::

:::warning
#### Conclusion
- Experiments
    - Computation Accuracy Trade-off
        - ![](https://i.imgur.com/RgCIO4C.png)
        - ![](https://i.imgur.com/Y1c1afN.png)
        - ![](https://i.imgur.com/UAc8rly.png)
    - Embedding Dimensionality
        - each face is compactly represented by a 128-dimensional byte vector, which is ideal for large-scale clustering and recognition
    - Amount of Training Data
        - the effect of more training data may be even larger on larger models
        - ![](https://i.imgur.com/FvM2Szj.png)
    - Effect of CNN Model
        - Overall, in the final performance the top models of both architectures perform comparably. However, some of our Inception based models, such as NN3, still achieve good performance while significantly reducing both the FLOPS and the model size.
    - Sensitivity to Image Quality
        - ![](https://i.imgur.com/O3MTmvw.png)
    - Performance on YouTube Faces DB
        - We use the average similarity of all pairs of the first one hundred frames that our face detector detects in each video. This gives us a classification accuracy of 95.12%±0.39.
    - Performance on LFW
        - We evaluate our model on LFW using the standard protocol for unrestricted, labeled outside data.
        - Our model is evaluated in two modes:
            - Fixed center crop of the LFW provided thumbnail.
            - A proprietary face detector (similar to Picasa) is run on the provided LFW thumbnails. If it fails to align the face (this happens for two images), the LFW alignment is used.
    - Face Clustering
        - Our compact embedding lends itself to clustering a user's personal photos into groups of people with the same identity.
        - ![](https://i.imgur.com/HJr2oJR.png)
- Summary
    - We provide a method to directly learn an embedding into a Euclidean space for face verification
    - only requires minimal alignment (tight crop around the face area)
:::

---
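The triplet loss and the distance-threshold verification summarized in the FaceNet notes above can be sketched in a few lines. This is a minimal pure-Python sketch under stated assumptions, not the paper's implementation: the function names (`l2_normalize`, `squared_distance`, `triplet_loss`, `same_identity`) are hypothetical, the toy 2-D vectors stand in for real 128-D embeddings, and the default `margin` and `threshold` values are illustrative only.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm so all embeddings lie on the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 0.2) -> float:
    """Hinge loss max(d(a, p) - d(a, n) + margin, 0) for a single triplet.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (an illustrative default here).
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(gap + margin, 0.0)


def same_identity(a: Sequence[float],
                  b: Sequence[float],
                  threshold: float = 1.1) -> bool:
    """Face verification as a threshold on the squared embedding distance.

    `threshold` is an assumed value; in practice it would be tuned on held-out pairs.
    """
    return squared_distance(a, b) <= threshold


if __name__ == "__main__":
    anchor = l2_normalize([1.0, 0.0])
    positive = l2_normalize([1.0, 0.1])  # close to the anchor
    negative = l2_normalize([0.0, 1.0])  # far from the anchor
    print(triplet_loss(anchor, positive, negative))  # easy triplet: hinge is inactive
    print(same_identity(anchor, positive), same_identity(anchor, negative))
```

Swapping the positive and negative arguments gives a violated triplet with a non-zero loss, which is the kind of example a negative mining strategy would look for.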