# Tasks: [09-04 to 09-11]

## Custom Object Detector:

#### Algorithm: SSD7

#### Loss: Multi-task log loss for classification and smooth L1 loss for localization.

#### Output classes: Adult, Kid

#### Data Selection and Splits:

From each camera in each video series, 6,000 people are selected from consecutive frames.

| Video series | Selected people |
| -------- | -------- |
| totmate_20200811 | 12,000 |
| totmate_20200812 | 18,000 |
| totmate_20200817 | 12,000 |
| totmate_20200818 | 18,000 |
| totmate_20200819 | 18,000 |
| totmate_20200820 | 12,000 |
| totmate_20200821 | 18,000 |

Metadata files generated by YOLOv4 and the kid/adult classifier are used as ground truth (GT) for the new detector. No manual correction was applied, so the GT still contains those models' errors (missing and erroneous bounding boxes).

- Training split: 79,760 people
- Validation split: 34,212 people
- Total frames: 58,558

#### Data preprocessing

- All images are resized from h=1080, w=1920 to h=300, w=480.
- All bounding boxes are scaled down accordingly.

#### Data Augmentation (online, during training)

- Color based: random brightness, contrast, hue, saturation
- Spatial: flip, translate, scale

#### Training

50,000 iterations: 50 epochs × 1,000 steps

![](https://i.imgur.com/Qwb5G0s.jpg)

These are some sample visualisations of SSD7 output.

- Green bounding box: YOLOv4-predicted GT
- Red bounding box: SSD7-predicted Kid
- Turquoise bounding box: SSD7-predicted Adult

Result-1:
![](https://i.imgur.com/obUBqrL.jpg)

Result-2:
![](https://i.imgur.com/SXpsyjx.jpg)

Result-3:
![](https://i.imgur.com/rSnWdOu.jpg)

Result-4:
![](https://i.imgur.com/6JPNiAy.jpg)

Result-5:
![](https://i.imgur.com/VMqxilW.jpg)

Result-6:
![](https://i.imgur.com/oSBFukn.jpg)

Result-7:
![](https://i.imgur.com/y4BlRCc.jpg)

Result-8:
![](https://i.imgur.com/TD8Uma1.jpg)

#### Comments:

- Training took 14 hours.
- Classification is good; localisation needs improvement.
- It cannot outperform big models like YOLOv4 or SSD512 in terms of accuracy, but it gives decent results.

#### Inference Speed:

##### Old model - YOLOv4 + Kid and Adult classifier (InceptionV3)

- YOLOv4 takes 50 ms per frame.
- The Kid and Adult classifier takes 40 ms per cropped image, i.e. if there are two people in a frame it takes 2 × 40 ms = 80 ms.
- Total time to process a frame with 2 people in it: 50 ms + 80 ms = 130 ms
- Processing speed is about 7 FPS (1000 ms / 130 ms ≈ 7.7).

##### New model - SSD7

- SSD7 takes 23 ms per frame.
- Processing speed is about 43 FPS (1000 ms / 23 ms ≈ 43.5).

#### H/W:

- CPU: i5-9400F
- GPU: RTX 2070 Super
- RAM: 16 GB

##### Scope for improvement:

- Increase the amount of training data.
- Manually correct the algorithm-predicted GT labels.
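
The per-frame timings in the Inference Speed section can be reproduced with a short calculation. The sketch below is only illustrative: the function names are made up for this note, the timings are the ones reported above, and it assumes that the classifier runs sequentially on each crop and that SSD7's per-frame time does not depend on the number of people in the frame.

```python
def cascade_latency_ms(detector_ms, classifier_ms, num_people):
    """Latency of the old pipeline: one detector pass plus one
    classifier pass per detected person (assumed sequential)."""
    return detector_ms + classifier_ms * num_people


def fps(latency_ms):
    """Frames per second for a given per-frame latency in milliseconds."""
    return 1000.0 / latency_ms


# Timings reported in the notes above.
YOLOV4_MS = 50.0      # YOLOv4 detector, per frame
CLASSIFIER_MS = 40.0  # InceptionV3 kid/adult classifier, per crop
SSD7_MS = 23.0        # SSD7, per frame (assumed independent of people count)

old_ms = cascade_latency_ms(YOLOV4_MS, CLASSIFIER_MS, num_people=2)
print(f"old pipeline: {old_ms:.0f} ms/frame, {fps(old_ms):.1f} FPS")   # 130 ms, 7.7 FPS
print(f"SSD7:         {SSD7_MS:.0f} ms/frame, {fps(SSD7_MS):.1f} FPS")  # 23 ms, 43.5 FPS

# The old pipeline slows down as more people appear in the frame.
for n in range(5):
    ms = cascade_latency_ms(YOLOV4_MS, CLASSIFIER_MS, n)
    print(f"{n} people: {ms:.0f} ms/frame, {fps(ms):.1f} FPS")
```

Under these assumptions the gap grows with crowd size: the old pipeline pays 40 ms per additional person, while the single-shot detector's cost stays flat.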