# Tuan Report
## Clothes detection:
- Model: Yolov3 Darknet-53 (https://github.com/pjreddie/darknet)
- Dataset: DeepFashion2 (https://github.com/switchablenorms/DeepFashion2)
- Test Data: DeepFashion2 validation set: 32,156 images [gdrive](https://drive.google.com/open?id=1O45YqhREBOoLudjA06HcTehcEebR0o9y)
- Test Result: mean average precision (mAP@0.50) = 84.20 %
```
Loading weights from ../darknet/backup/yolov3_cloth_130000.weights...
seen 64
Done! Loaded 107 layers from weights-file
calculation mAP (mean average precision)...
32156
detections_count = 175221, unique_truth_count = 52490
class_id = 0, name = short_sleeved_shirt, ap = 95.99% (TP = 11230, FP = 1318)
class_id = 1, name = long_sleeved_shirt, ap = 88.73% (TP = 5251, FP = 2342)
class_id = 2, name = short_sleeved_outwear, ap = 62.14% (TP = 80, FP = 43)
class_id = 3, name = long_sleeved_outwear, ap = 89.05% (TP = 1838, FP = 956)
class_id = 4, name = vest, ap = 87.25% (TP = 1695, FP = 410)
class_id = 5, name = sling, ap = 62.80% (TP = 239, FP = 213)
class_id = 6, name = shorts, ap = 94.20% (TP = 3570, FP = 350)
class_id = 7, name = trousers, ap = 96.08% (TP = 8735, FP = 1026)
class_id = 8, name = skirt, ap = 93.49% (TP = 5795, FP = 1097)
class_id = 9, name = short_sleeved_dress, ap = 84.53% (TP = 2426, FP = 889)
class_id = 10, name = long_sleeved_dress, ap = 68.34% (TP = 1088, FP = 1192)
class_id = 11, name = vest_dress, ap = 88.95% (TP = 3138, FP = 2683)
class_id = 12, name = sling_dress, ap = 83.06% (TP = 1037, FP = 1256)
for conf_thresh = 0.25, precision = 0.77, recall = 0.88, F1-score = 0.82
for conf_thresh = 0.25, TP = 46122, FP = 13775, FN = 6368, average IoU = 67.45 %
IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.842010, or 84.20 %
Total Detection Time: 674.000000 Seconds
```
- Result: https://bitbucket.org/nldanang/attribute-analysis/src/master/clothes_detector/
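As a sanity check on the log above, mAP@0.50 is just the mean of the 13 per-class APs, and precision/recall/F1 follow from the reported TP/FP/FN at conf_thresh = 0.25. A minimal verification (all numbers copied from the log):

```python
# Per-class APs (percent) from the darknet mAP log above.
per_class_ap = [95.99, 88.73, 62.14, 89.05, 87.25, 62.80, 94.20,
                96.08, 93.49, 84.53, 68.34, 88.95, 83.06]

# mAP@0.50 is the unweighted mean over the 13 classes.
map_50 = sum(per_class_ap) / len(per_class_ap)

# Precision/recall/F1 at conf_thresh = 0.25, from the logged counts.
tp, fp, fn = 46122, 13775, 6368
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"mAP@0.50 = {map_50:.2f} %")   # 84.20 %
print(f"precision = {precision:.2f}, recall = {recall:.2f}, F1 = {f1:.2f}")
```

The computed values match the log: mAP 84.20 %, precision 0.77, recall 0.88, F1 0.82.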
## Facial expression:
- Model: [Resnet_50](https://drive.google.com/uc?id=17unekscjX6pExycRcA1VD0-hVpT6e354), following the paper [Fine-Grained Facial Expression Analysis Using Dimensional Emotion Model](https://arxiv.org/pdf/1805.01024.pdf)
- Dataset: 28,709 images from the FER2013 facial emotion recognition challenge (https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data)
- Test Data: 574 images from FER2013 (https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data)
- Test Result: using 80% of the 28,709 images for training and the remaining 20% of FER2013 for testing, accuracy on the test set: 84%
- Result: https://bitbucket.org/nldanang/attribute-analysis/src/master/emotion_detector/
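The 80/20 train/test split described above could be reproduced with a small helper like this. This is an illustrative sketch only: the seed, the helper name, and the plain-list representation of the dataset are assumptions, not the actual training pipeline.

```python
import random

def split_80_20(samples, seed=42):
    """Shuffle a dataset and split it 80/20 into train/test lists.

    `seed` fixes the shuffle so the split is reproducible; the value
    here is arbitrary, not taken from the original experiment.
    """
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    cut = int(0.8 * len(samples))
    train = [samples[i] for i in idx[:cut]]
    test = [samples[i] for i in idx[cut:]]
    return train, test

# With the 28,709 FER2013 images, an 80/20 split gives
# 22,967 training samples and 5,742 test samples.
train, test = split_80_20(list(range(28709)))
print(len(train), len(test))  # 22967 5742
```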
## Improve anti-spoofing for face recognition
- Staging Model: Trained on [NUAA](http://parnec.nuaa.edu.cn/xtan/data/NUAAImposterDB.html) dataset using MobileNet
- New Model:
* Face De-Spoofing: Anti-Spoofing via Noise Modeling [paper](https://arxiv.org/abs/1807.09968), [Trained model](https://github.com/yaojieliu/ECCV2018-FaceDeSpoofing/tree/master/lib)
- Why the new model is better:
	* It was trained on larger datasets
	* The CNN architecture can estimate the spoof noise in an image
	* We can combine multiple anti-spoofing models to improve accuracy
- Dataset: [Oulu-NPU](https://sites.google.com/site/oulunpudatabase/), [CASIA-MFSD](http://biometrics.cse.msu.edu/Publications/Databases/MSUMobileFaceSpoofing/index.htm) and [Replay-Attack](https://www.idiap.ch/dataset/replayattack)
- Test Data: [Oulu-NPU](https://sites.google.com/site/oulunpudatabase/), [CASIA-MFSD](http://biometrics.cse.msu.edu/Publications/Databases/MSUMobileFaceSpoofing/index.htm) and [Replay-Attack](https://www.idiap.ch/dataset/replayattack)
- Test Result:
To compare with previous methods, the paper uses the Attack Presentation Classification Error Rate (APCER), the Bona Fide Presentation Classification Error Rate (BPCER), and ACER = (APCER + BPCER)/2 for intra-dataset testing on Oulu-NPU, and the Half Total Error Rate (HTER), half the sum of FAR and FRR, for cross-dataset testing between CASIA-MFSD and Replay-Attack. The paper reports its results under these metrics.


- Result:
+ overview solutions: https://hackmd.io/Nmf1GqKpR7OeXlPLcCEDgQ
+ Demo face anti spoofing for face recognition: https://drive.google.com/file/d/1UcDB__DmtdW1b2WaSd4qpXz6JJ1K7jqR/view?usp=drivesdk
	+ I will merge the code into Jinjer Face, or create a new repository for this project if anti-spoofing is needed.
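For reference, the evaluation metrics named in the Test Result section above are straightforward to implement. A minimal sketch, assuming binary labels (1 = bona fide/live, 0 = attack/spoof) and binary predictions from the anti-spoofing model:

```python
def apcer(labels, preds):
    """Fraction of attack samples misclassified as bona fide."""
    attacks = [(l, p) for l, p in zip(labels, preds) if l == 0]
    return sum(1 for l, p in attacks if p == 1) / len(attacks)

def bpcer(labels, preds):
    """Fraction of bona fide samples misclassified as attacks."""
    bona = [(l, p) for l, p in zip(labels, preds) if l == 1]
    return sum(1 for l, p in bona if p == 0) / len(bona)

def acer(labels, preds):
    """Average Classification Error Rate: mean of APCER and BPCER."""
    return (apcer(labels, preds) + bpcer(labels, preds)) / 2

def hter(far, frr):
    """Half Total Error Rate, used for cross-dataset testing."""
    return (far + frr) / 2

# Toy example: 2 of 4 attacks accepted (APCER 0.5),
# 1 of 4 live faces rejected (BPCER 0.25) -> ACER 0.375.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
preds  = [1, 1, 0, 0, 1, 1, 1, 0]
print(acer(labels, preds))  # 0.375
```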
## Convert the age and gender model to TensorFlow integrated with TensorRT (TF-TRT)
- Improved performance from 3.3 FPS (frames per second) to 3.6 FPS
- Results: https://gitlab.com/heyml/neolab/demo_face_recognition/tree/jetson_dev_tftrt
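The 3.3 → 3.6 FPS comparison can be measured with a small timing helper like the one below. This is a hedged sketch: `infer_fn` and `frames` are placeholders for the actual model callable and input frames, and the warm-up count is an assumption (warm-up runs matter for TF-TRT, since engine building happens on the first inferences).

```python
import time

def measure_fps(infer_fn, frames, warmup=5):
    """Average frames-per-second of an inference callable.

    `infer_fn` is any callable taking one frame; `frames` is a list of
    inputs. The first `warmup` frames are run untimed so that one-time
    costs (e.g. TF-TRT engine building) do not skew the measurement.
    """
    for f in frames[:warmup]:
        infer_fn(f)
    start = time.perf_counter()
    for f in frames:
        infer_fn(f)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

# Usage sketch: run once with the original TensorFlow model and once
# with the TF-TRT converted model on the same frames, then compare.
fps = measure_fps(lambda frame: None, list(range(50)))
print(f"{fps:.1f} FPS")
```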
## Improve face recognition (in progress):
- Staging Model: tf-insightface pretrained [model](https://drive.google.com/open?id=1Iw2Ckz_BnHZUi78USlaFreZXylJj7hnP)
- New Model: Official InsightFace model: LResNet100E-IR (https://www.dropbox.com/s/tj96fsm6t6rq8ye/model-r100-arcface-ms1m-refine-v2.zip?dl=0)
- Why the new model is better:
	* LResNet100E-IR is the official pretrained InsightFace model; however, converting it to TensorRT is difficult
	* The LResNet100E-IR network, trained on the MS1M-ArcFace dataset with the ArcFace loss, is the state of the art in face recognition (https://github.com/deepinsight/insightface)
	* We can combine multiple face recognition models by ensembling the embedding feature vectors from multiple pretrained models to capture more facial detail. This [paper](https://dl.acm.org/citation.cfm?id=3302459) shows experimentally that an ensemble of CNN classifiers outperforms a single CNN classifier.
- Dataset: MS1M-Arcface
- Result:
	+ Overview of solutions: https://hackmd.io/2ZUuxqRyQcirupsD5iPGcw
	+ I finished converting the LResNet100E-IR model to a TensorFlow model, and I am now converting it to TensorFlow-TensorRT to improve performance on the Jetson Nano.
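The embedding-ensemble idea mentioned above can be sketched as follows: L2-normalize each model's embedding, concatenate them into one fused vector, re-normalize, and match faces by cosine similarity. This is an illustrative toy (2-D embeddings from two hypothetical models), not the actual InsightFace pipeline:

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def ensemble_embedding(embeddings):
    """Fuse embeddings from several models: normalize each one,
    concatenate, then re-normalize the fused vector."""
    fused = [x for e in embeddings for x in l2_normalize(e)]
    return l2_normalize(fused)

def cosine_similarity(a, b):
    """Dot product; equals cosine similarity for unit vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy example: the same face scored by two hypothetical models.
face_a = ensemble_embedding([[1.0, 0.0], [0.0, 2.0]])
face_b = ensemble_embedding([[1.0, 0.0], [0.0, 2.0]])
print(round(cosine_similarity(face_a, face_b), 6))  # 1.0
```

Concatenation keeps each model's contribution separate in the fused vector; averaging is an alternative when all models share the same embedding dimension.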