# Face mask paper 191
- Good evening everyone, and welcome to our presentation on single-stage real-time face mask detection.
- My name is ... Along with me are Dr. Bogdan Trawinski, Dr. Nga Ly Tu, Mr. Anh, and Ms. Vi.
- This presentation is organized into 5 sections.
- The presentation will last about 15 minutes, so if you have any questions, feel free to ask them at the end.
## I. Introduction
### 1. Current situation
- After almost 3 years of living with Covid-19, we are learning to adapt to a world with this disease. Although many precautions have been taken to prevent the spread of the virus, wearing a facial mask is still crucial in many circumstances.
- Looking further ahead to a time when the pandemic is no longer a concern, facial masks will still be required in workplaces that demand a hygienic environment, such as restaurants or hospitals, or that require employee protection, such as laboratories or sites with chemical exposure.
### 2. Related works
- The paper reviews related work on the stages within detection networks, the datasets, and the use cases of face mask detectors.
- Regarding model architecture, face mask detectors are classified into single-stage and two-stage models.
- Two-stage: first localize the face (a regression problem), then classify the detected face into the defined classes (a classification problem)
  - slow inference time
  - training of such face mask detectors usually focused only on optimizing the classification of the detected face, rather than the localization stage
- Single-stage: the two stages are treated as one big object detection problem, where the model learns to recognize 3 types of objects: faces with masks, faces without masks, and incorrectly worn face masks
  - problem: the number of single-stage face mask detectors in the literature is limited
- Other works: datasets for this task and current approaches to implementing such detectors
### 3. Scope
- Proposed a single-stage real-time face mask detector based on the YOLOv5 framework
- Updated the current dataset used to train the face mask detector
- Proposed a method to deploy the face mask detector over the network
## II. Proposed method
### 1. Anchor boxes
- increased the number of anchor boxes from 9 to 12, adding small boxes to capture more faces per frame
- used k-means clustering to define the widths and heights of the boxes (a sketch follows below)
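A minimal sketch of that clustering step, assuming the ground-truth boxes are available as (width, height) pairs; the file name and variable names are ours, not the paper's:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical input: one (width, height) row per ground-truth box,
# e.g. collected from the YOLO-format label files.
box_sizes = np.loadtxt("train_box_sizes.txt")  # shape: (num_boxes, 2)

# Cluster the box dimensions into 12 groups; each centroid becomes
# one anchor box (width, height).
kmeans = KMeans(n_clusters=12, n_init=10, random_state=0).fit(box_sizes)
anchors = kmeans.cluster_centers_

# Sort anchors by area so the small boxes come first, matching a
# small-to-large ordering across the detection heads.
anchors = anchors[np.argsort(anchors.prod(axis=1))]
print(anchors)
```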
### 2. Model
Follows the YOLOv5 framework, where we use the Ghost convolution module:
- Purple: feature maps after a Ghost convolution layer
- Yellow: feature maps after a Cross Stage Partial Ghost (CSP Ghost) block
- Blue: feature maps after an upsampling block
- Red: feature maps of the Spatial Pyramid Pooling layer (deep-level information aggregation)
Ghost convolution: linear, cheap operations that create a more enriched result with less computational complexity -> implemented as a 1x1-kernel CNN
Ghost block / Ghost module (a PyTorch sketch follows below):
- one convolution layer downsizes the feature map by 2 -> y_1
- y_1 -> Ghost convolution -> y_2
- output: the concatenation of y_1 and y_2
(y_1, y_2) -> CSP Ghost block: enhances the learning capability and reduces the computation of the residual blocks
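A minimal PyTorch sketch of the Ghost module described above; channel sizes, the activation choice, and naming are our assumptions rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost module: a primary conv produces half the output channels (y_1);
    a cheap 1x1 conv generates the other half (y_2) from y_1; the output is
    the concatenation of the two."""
    def __init__(self, c_in, c_out, k=3, stride=1):
        super().__init__()
        c_half = c_out // 2
        # Primary convolution: produces y_1 with half the output channels.
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_half, k, stride, k // 2, bias=False),
            nn.BatchNorm2d(c_half),
            nn.SiLU(),
        )
        # Cheap linear operation (1x1-kernel conv) generating y_2 from y_1.
        self.cheap = nn.Sequential(
            nn.Conv2d(c_half, c_half, 1, 1, 0, bias=False),
            nn.BatchNorm2d(c_half),
            nn.SiLU(),
        )

    def forward(self, x):
        y1 = self.primary(x)
        y2 = self.cheap(y1)
        return torch.cat([y1, y2], dim=1)

x = torch.randn(1, 64, 80, 80)
print(GhostConv(64, 128)(x).shape)  # torch.Size([1, 128, 80, 80])
```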
We added an extra detection layer to the YOLO head to better capture the variety of anchor box sizes.
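Since 12 anchors spread over four detection layers gives three anchors per head, here is a tiny sketch of that grouping, reusing the sorted `anchors` array from the k-means sketch above; the grouping itself is our illustration:

```python
# `anchors` is the (12, 2) array of sorted (width, height) centroids from
# the k-means sketch; assign three anchors to each of the four detection
# heads, from the smallest boxes to the largest.
heads = [anchors[i * 3:(i + 1) * 3] for i in range(4)]
for level, group in enumerate(heads):
    print(f"detection head {level}: {group.round(3)}")
```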
## III. Implementation
### 1. Dataset modification
Properly Wearing Masked Face Detection Dataset (PWMFD)
- **Before:**
  - 9,205 images
  - imbalance:
    - incorrectly worn masked faces: 320/16,720 ≈ 1.9%
- **Process:**
  - crawled 240 additional images
  - applied data augmentation
  - used RetinaFace to detect the faces, then classified the faces manually (a sketch of this step follows after this list)
- **After:**
  - 10,556 images in total
  - incorrectly worn masked faces increased to 9.4% of all faces
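A minimal sketch of the face-cropping step, assuming the `retina-face` pip package (the exact tooling used in the paper may differ); paths and names are our placeholders:

```python
import cv2
from retinaface import RetinaFace

# Hypothetical input image from the crawled set.
image_path = "crawled/img_0001.jpg"
image = cv2.imread(image_path)

# Detect faces; each entry carries a 'facial_area' box [x1, y1, x2, y2].
faces = RetinaFace.detect_faces(image_path)

# Crop each detected face and save it for manual classification into
# mask / no-mask / incorrectly-worn-mask.
for name, face in faces.items():
    x1, y1, x2, y2 = face["facial_area"]
    crop = image[y1:y2, x1:x2]
    cv2.imwrite(f"to_label/{name}.jpg", crop)
```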
### 2. Web Application
- Built with the Flask framework.
- Assumes one server and multiple clients.
- Server:
  - triggered when the running requirements are met
  - receives the direct stream from IP cameras (processed with OpenCV)
  - emits frames to clients using Socket.IO (a server sketch follows after this list)
- Clients:
  - UI plus notification methods for when more than one person does not follow the mask protocol.
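A minimal sketch of the server side, assuming Flask-SocketIO and an OpenCV-readable camera URL; the event name, camera URL, and the detector call are our placeholders:

```python
import base64

import cv2
from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app, cors_allowed_origins="*")

# Hypothetical IP-camera stream URL; replace with the real endpoint.
CAMERA_URL = "rtsp://192.168.1.10/stream"

def stream_frames():
    """Read frames from the IP camera, run detection, and push them to clients."""
    cap = cv2.VideoCapture(CAMERA_URL)
    while True:
        ok, frame = cap.read()
        if not ok:
            continue
        # `detect(frame)` stands in for the YOLOv5-based face mask detector.
        # frame = detect(frame)
        _, jpeg = cv2.imencode(".jpg", frame)
        payload = base64.b64encode(jpeg.tobytes()).decode("ascii")
        socketio.emit("frame", {"image": payload})
        socketio.sleep(0.03)  # ~30 FPS pacing

if __name__ == "__main__":
    socketio.start_background_task(stream_frames)
    socketio.run(app, host="0.0.0.0", port=5000)
```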
## IV. Results
Imaging results: improvise while walking through the example detections on screen.
### 1. Evaluation metrics:
- Evaluated our models using average precision at IoU thresholds of 0.5 and 0.75 (AP_50, AP_75) and the mean average precision (mAP)
- e.g., at threshold 50, a predicted box with IoU > 0.5 against the ground truth is considered correct (an IoU sketch follows below)
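For reference, a minimal sketch of the IoU computation underlying these thresholds; the box format and function name are ours:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    # Union = sum of the two areas minus the intersection.
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# At AP_50 a prediction counts as correct when iou(pred, gt) > 0.5.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```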
### 2. Evaluation results:
#### Original PWMFD:
- At 100 epochs: achieved adequate results of 97.5% AP_50 and 86.9% AP_75.
- At 200 epochs: our model attained its highest results of 97.6% AP_50 and 88.4% AP_75 while still maintaining a reasonable inference time.
- Lighter than YOLOv3 and YOLOv4.
- **QA: why 100 and 200 epochs?**
  - From loss monitoring, the model appears to converge at around 100 epochs, so that point was taken as a checkpoint.
  - 200 epochs was run to check whether the accuracy would still increase.
#### Modified PWMFD:
- all models showed a reduction in mAP
- we had added a number of faces of diverse sizes to the training set
  - -> the models now localize smaller faces that were not labeled in the validation set
  - -> future work: update the validation set