# Face mask paper 191

- Good evening everyone, and welcome to our presentation on single-stage real-time face mask detection.
- My name is ... Along with me are Dr. Bogdan Trawinski, Dr. Nga Ly Tu, Mr. Anh, and Ms. Vi.
- The content of this presentation includes 5 sections.
- This presentation will last about 15 minutes, so if you have any questions, feel free to ask them at the end.

## I. Introduction

### 1. Current situation

- After almost 3 years of living with Covid-19, we are learning to adapt to a world with this disease. Although many precautions have been taken to prevent the spread of the virus, wearing a facial mask is still crucial under many circumstances.
- Looking further ahead, to a point where the pandemic is no longer a concern, facial masks would still be required at workplaces that demand a hygienic environment, such as restaurants or hospitals, or that require employee protection, such as laboratories or environments with chemical exposure.

### 2. Related works

- The paper reviews the works surrounding the stages within the networks, the datasets, and the use cases of face mask detectors.
- By model architecture, face mask detectors are classified into single-stage and two-stage models.
  - Two-stage: localize the face (a regression problem), then classify this face into the defined classes (a classification problem).
    - Slow inference time.
    - Training of these face mask detectors usually focuses only on optimizing the classification of the detected face, rather than the face localization stage.
  - Single-stage: the two stages are treated as one big object detection problem, where the model learns to recognize 3 types of objects: faces with masks, faces without masks, and incorrectly worn face masks.
    - Problem: single-stage face mask detectors are limited in number.
- Other works: datasets for this task and current approaches to implementing such detectors.

### 3. Scope

- Proposed a single-stage real-time face mask detector based on the YOLOv5 framework.
- Updated the current dataset to train the face mask detector.
- Proposed a method to deploy the face mask detector over the network.

## II. Proposed method

### 1. Anchor boxes

- Increased the number of anchor boxes from 9 to 12, using the extra small boxes to capture more faces per frame.
- Used k-means clustering to define the widths and heights of the boxes.

### 2. Model

Follows the YOLOv5 framework, using the Ghost convolution module:

- Purple: feature maps after a Ghost convolution layer.
- Yellow: feature maps after a Cross Stage Partial (CSP) Ghost block.
- Blue: feature maps after an upsampling block.
- Red: feature maps of the Spatial Pyramid Pooling layer (deep-level information aggregation).

Ghost convolution: cheap linear operations that produce an enriched result with less computational complexity -> a 1x1-kernel CNN.

Ghost block/Ghost module:

- One convolution layer downsizes the feature map by 2 -> y_1.
- y_1 -> Ghost conv -> y_2.
- Output: the concatenation of y_1 and y_2, (y_1, y_2).

CSP Ghost: enhances the learning capability and reduces the computation of residual blocks.

We added an extra detection layer to the YOLO head to better capture the variety of anchor box sizes.

## III. Implementation

### 1. Dataset modification: Properly Wearing Masked Face Detection Dataset (PWMFD)

- **Before:**
  - 9,205 images.
  - Imbalance: incorrectly worn masked faces account for only 320/16,720 faces, ~1.9%.
- **Process:**
  - Crawled 240 additional images.
  - Applied data augmentation.
  - Used RetinaFace to detect faces, then classified the faces manually (see the sketch below).
- **After:**
  - 10,556 images in total.
  - Incorrectly worn masked faces increased to 9.4% of all faces.
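As a rough illustration of the dataset-expansion step above, here is a minimal sketch assuming the open-source `retina-face` package; the directory names and crop-saving layout are hypothetical, not from the paper.

```python
# Minimal sketch: detect faces in crawled images with RetinaFace, save the
# crops for manual classification. Paths and naming are assumptions.
import os

import cv2
from retinaface import RetinaFace

RAW_DIR = "crawled_images"   # assumption: folder holding the crawled images
CROP_DIR = "face_crops"      # assumption: crops are labeled by hand afterwards
os.makedirs(CROP_DIR, exist_ok=True)

for name in os.listdir(RAW_DIR):
    path = os.path.join(RAW_DIR, name)
    faces = RetinaFace.detect_faces(path)  # dict of detected faces
    if not isinstance(faces, dict):        # no face found in this image
        continue
    image = cv2.imread(path)
    stem = os.path.splitext(name)[0]
    for key, face in faces.items():
        x1, y1, x2, y2 = face["facial_area"]
        crop = image[y1:y2, x1:x2]
        # Each crop is saved and later labeled manually as mask / no mask /
        # incorrectly worn mask.
        cv2.imwrite(os.path.join(CROP_DIR, f"{stem}_{key}.jpg"), crop)
```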
### 2. Web Application

- Built with the Flask framework.
- Assumes one server and multiple clients.
- Server:
  - Triggered once its running requirements are met.
  - Receives a direct stream from IP cameras (processed by OpenCV).
  - Emits frames to clients using Socket.IO (a minimal server sketch appears at the end of these notes).
- Clients:
  - UI plus notification methods for when more than 1 person does not follow the protocol.

## IV. Results

(Result images: ad-lib this part!)

### 1. Evaluation metrics

- Evaluate our models using average precision at thresholds 50 and 75 (AP_50, AP_75) and mean average precision (mAP).
- E.g., at threshold 50, a detection with IoU > 0.5 is considered correct.

### 2. Evaluation results

#### Original PWMFD

- 100 epochs: achieved adequate results of 97.5% AP_50 and 86.9% AP_75.
- 200 epochs: our model attained the highest results of 97.6% AP_50 and 88.4% AP_75 while still maintaining a reasonable inference time.
- Lighter than YOLOv3 and YOLOv4.
- **QA: why 100 & 200 epochs?**
  - Loss monitoring suggests the model converges at around 100 epochs, so the 100-epoch point is used as a checkpoint.
  - 200 epochs is used to check whether the accuracy keeps increasing.

#### Modified PWMFD

- All models showed a reduction in mAP.
- We had added faces of diverse sizes to the training set
  - -> the models now localize smaller faces that were not labeled in the validation set
  - -> the validation set should be updated in the future.
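For illustration, the sketch below shows how the server loop described in Section III.2 might look with Flask-SocketIO and OpenCV. The camera URL, the `frame` event name, and the frame rate are assumptions for the example, not details from the paper.

```python
# Minimal sketch of a Flask-SocketIO server that reads an IP-camera stream
# with OpenCV and broadcasts JPEG frames to connected clients.
import base64

import cv2
from flask import Flask
from flask_socketio import SocketIO

CAMERA_URL = "rtsp://camera.local/stream"  # assumption: IP camera stream URL

app = Flask(__name__)
socketio = SocketIO(app)


def stream_frames():
    """Read frames from the IP camera and broadcast them to all clients."""
    cap = cv2.VideoCapture(CAMERA_URL)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # The face mask detector would run on `frame` here, and a notification
        # would be sent when more than 1 person does not follow the protocol.
        _, jpeg = cv2.imencode(".jpg", frame)
        payload = base64.b64encode(jpeg.tobytes()).decode("ascii")
        socketio.emit("frame", {"image": payload})  # hypothetical event name
        socketio.sleep(1 / 30)  # throttle to roughly 30 FPS


if __name__ == "__main__":
    socketio.start_background_task(stream_frames)
    socketio.run(app, host="0.0.0.0", port=5000)
```

Clients would subscribe to the `frame` event, render the decoded image in the UI, and surface the notification channel described above.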