---
title: Internship Task Review 3
tags:
description: Action recognition in Kids using UNITY Pose Augmentation
---
# Internship Task Review 3
## Action recognition in Kids using UNITY Pose Augmentation
### Task done last week
1. Analysed the model on the previous dataset with varied FPS, varied window size for feature extraction, and shift augmentation
2. Increased the kid dataset to see the accuracy change at 3 FPS
3. Moved to a custom object detector implementation (SSD: Single-Shot MultiBox Detector)
4. Prepared the dataset for the class "Kid reading book" and started training
---
### Task-1
#### Analyse the model on the previous dataset with varied FPS, varied window size for feature extraction, and shift augmentation
**Initial Training Data Details**
| Class | 30 FPS | 15 FPS | 6 FPS |3 FPS|
| -------- | ------ | ------ | ----- | --- |
| **Crawl** | 2079 | 1061 | 424 | 215|
| **Cruising** | 2209 | 1818 | 729| 366|
| **Run** | 2541 | 2190 | 878| 441|
| **Jump** | 2193 | 1102 | 448| 227|
*Note: all accuracies below are measured on custom test data created from 1 test video of each class.*
**Accuracy analysis on Custom Dataset**
| Window size | 30 FPS | 15 FPS | 6 FPS | 3 FPS |
| ----------- | ------ | ------ | ----- | ----- |
| **1 sec** | 78% |75% | 74% | 67% |
| **2 sec** | 80% | 81% | 76% | 71% |
**3 FPS video**
| Window size | Test accuracy |
| ---------------- | -------- |
| **1 sec** (3 Frames) | 67% |
| **2 sec** (6 Frames) | 71% |
| **3 sec** (9 Frames) | 68% |
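
The relation between FPS, window length, and frames per window used in the tables above can be sketched as follows. This is only an illustration, not the actual feature-extraction code: the function names, the 18-keypoint `(x, y)` pose layout, and the stride of one frame are assumptions.

```python
import numpy as np

def subsample_frames(frames, src_fps, dst_fps):
    """Keep every (src_fps // dst_fps)-th frame, e.g. 30 -> 3 FPS keeps every 10th."""
    if src_fps % dst_fps != 0:
        raise ValueError("this sketch assumes dst_fps divides src_fps")
    return frames[:: src_fps // dst_fps]

def sliding_windows(frames, fps, window_sec, stride=1):
    """Stack consecutive frames into windows of fps * window_sec frames."""
    size = fps * window_sec
    n = len(frames) - size + 1
    if n <= 0:
        # Sequence is shorter than one window: return an empty batch.
        return np.empty((0, size) + frames.shape[1:], dtype=frames.dtype)
    return np.stack([frames[i:i + size] for i in range(0, n, stride)])

# Hypothetical input: 60 pose frames at 30 FPS, 18 keypoints with (x, y) each.
poses_30fps = np.zeros((60, 18, 2), dtype=np.float32)
poses_3fps = subsample_frames(poses_30fps, 30, 3)            # 6 frames remain
windows = sliding_windows(poses_3fps, fps=3, window_sec=2)   # 6 frames per window
```

At 3 FPS a 1 sec window holds 3 frames, a 2 sec window 6, and a 3 sec window 9, so longer windows yield fewer samples from the same clip.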
**Shift augmentation**
| Accuracy | Without shift augmentation | With shift augmentation |
| ------------------- | -------------------------- | ----------------------- |
| Validation Accuracy | 97% | 100% |
| Custom test Accuracy| 73% | 77% |
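
One possible form of shift augmentation is sketched below. It assumes (this is not stated in the slides) that "shift" means translating all keypoints of a window by one random `(dx, dy)` offset, with coordinates normalised to `[0, 1]`. The function name and `max_shift` value are hypothetical.

```python
import numpy as np

def shift_augment(window, max_shift=0.05, rng=None):
    """Translate every keypoint in a window by a single random (dx, dy) offset.

    window: array of shape (frames, keypoints, 2) with normalised coordinates
    (an assumption of this sketch).
    """
    rng = np.random.default_rng() if rng is None else rng
    offset = rng.uniform(-max_shift, max_shift, size=2)
    # Broadcasting adds the same offset to every keypoint in every frame,
    # so the pose shape and the motion are unchanged; only the position moves.
    return window + offset
```

Applying the function several times to each training window gives extra samples that differ only in where the person appears in the frame.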
---
#### :bulb: Conclusion
* Accuracy dropped as the video FPS decreased (from 78% at 30 FPS to 67% at 3 FPS for a 1 sec window); a major reason could be the reduced amount of training data.
* Increasing the window size used for feature extraction from 1 sec to 2 sec raised accuracy by 2 to 6 points at every FPS.
* On the 3 FPS video, however, increasing the window size from 2 sec to 3 sec lowered accuracy from 71% to 68%.
* Shift augmentation further improved accuracy (custom test accuracy rose from 73% to 77%).
---
### Task-2
#### Increased the Dataset of kids to see the Accuracy change
- Downloaded many Japanese YouTube videos to enlarge the kid dataset.
- Trimmed the videos to the required parts.
- Different videos had different dimensions, which badly affected the pose estimation.
- Masked the videos so that their dimensions are proportional to the OpenPose input size, 656×368.
- Masking helps to get accurate pose estimation.
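
The masking step could look roughly like the sketch below. It assumes that "masking" means adding black borders so that each frame has the 656:368 aspect ratio of the OpenPose input; the function name and the centred placement are assumptions, not the exact procedure used.

```python
import numpy as np

def pad_to_aspect(frame, target_w=656, target_h=368):
    """Add black borders so the frame has the target_w:target_h aspect ratio."""
    h, w = frame.shape[:2]
    # Compare w/h with target_w/target_h using integers to avoid float error.
    if w * target_h >= h * target_w:
        # Frame is too wide (or already matches): grow the height.
        new_w, new_h = w, -(-w * target_h // target_w)  # ceiling division
    else:
        # Frame is too tall: grow the width.
        new_w, new_h = -(-h * target_w // target_h), h
    top = (new_h - h) // 2
    left = (new_w - w) // 2
    out = np.zeros((new_h, new_w) + frame.shape[2:], dtype=frame.dtype)
    out[top:top + h, left:left + w] = frame  # original pixels stay undistorted
    return out

# Hypothetical square 480x480 RGB frame, padded on the left and right.
padded = pad_to_aspect(np.ones((480, 480, 3), dtype=np.uint8))
```

Padding instead of stretching keeps body proportions intact when the frame is later resized to the network input.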
---

**Training Results**
**Window size 1 sec (3 Frames)**

**Window size 2 sec (6 Frames)**

---
#### :bulb: Conclusion
* Pose estimation became more accurate after masking.
* Accuracy increased drastically with more training data.
* A further increase in accuracy can be seen when increasing the window size from 2 sec to 3 sec.
---
### Task-3
#### Moved to a custom object detector implementation (SSD: Single-Shot MultiBox Detector)
* **Fine-tuning the SSD300 model** trained on the COCO dataset with 80 object classes and 1 background class.
* **Sub-sampling or up-sampling** the available pre-trained weights to match the required number of classes.
* **Data generator** that defines transformations for pre-processing and data augmentation.
##### For now, finalized 3 classes for fine-tuning the model
1. Kid playing with ball - **ball detection**
2. Kid reading book - **book detection**
3. Kid cutting paper with scissors - **scissors detection**
##### SSD trained on MS COCO has 80 classes, among which
* Sports ball (**ID 33**) and Scissors (**ID 77**) are already present.
* There is no Book class, so the Kite class (**ID 34**) is used instead: it has a similar shape and is also mostly made of paper, so its pre-trained weights may help a little.
* The background class is **ID 0** by default.
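
Sub-sampling the pre-trained classifier weights down to these four IDs can be sketched as below. The sketch assumes each SSD classification layer stores its output channels box by box, with 81 class scores per anchor box; that layout and the function name are assumptions and should be checked against the SSD implementation actually used.

```python
import numpy as np

def subsample_classifier(kernel, bias, n_boxes, class_ids, n_source_classes=81):
    """Keep only the output channels of the chosen classes for every anchor box.

    kernel: (height, width, in_channels, n_boxes * n_source_classes)
    bias:   (n_boxes * n_source_classes,)
    Assumes a box-major channel layout (all classes of box 0, then box 1, ...).
    """
    k = kernel.reshape(kernel.shape[:-1] + (n_boxes, n_source_classes))
    b = bias.reshape(n_boxes, n_source_classes)
    k = k[..., class_ids].reshape(kernel.shape[:-1] + (-1,))
    b = b[:, class_ids].reshape(-1)
    return k, b

# Background, sports ball, kite (stand-in for book), scissors.
CLASS_IDS = [0, 33, 34, 77]

# Hypothetical layer with 4 anchor boxes per location and 2 input channels.
kernel = np.arange(3 * 3 * 2 * 4 * 81, dtype=np.float32).reshape(3, 3, 2, 4 * 81)
bias = np.arange(4 * 81, dtype=np.float32)
new_kernel, new_bias = subsample_classifier(kernel, bias, n_boxes=4, class_ids=CLASS_IDS)
```

Only the classification layers need this treatment; the box-regression layers do not depend on the number of classes.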
##### Other key points about SSD
* SSD512 **has better accuracy** (about 2.5%) than SSD300 but runs at 22 FPS instead of 59 (*SSD300 has an input resolution of 300×300, while SSD512 has 512×512*).
* SSD performs worse than Faster R-CNN **for small-scale objects**.
* The scissors and the ball that kids play with are relatively small.
* However, the scissors class is already present in the COCO dataset.

---
### Task-4
#### Prepared Dataset for Class Kid reading book and Started Training
* First downloaded images in bulk directly from Google using a downloader; however, filtering out the good images took a lot of time.
* For now, downloaded images from iStock for the reading-book class (*filtering took less time, as most of the images were good*).
##### **Some training images**

#### Annotated 1000 images for the Book class, using 750 for training and 150 for validation
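
A reproducible way to draw such a split is sketched below. The file names, the seed, and the random shuffling are assumptions; how the images left over after the 750/150 split are used is not specified here, so the sketch simply returns them.

```python
import random

def split_dataset(items, n_train, n_val, seed=0):
    """Shuffle once with a fixed seed, then cut into train / val / remainder."""
    if n_train + n_val > len(items):
        raise ValueError("not enough items for the requested split")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    rest = shuffled[n_train + n_val:]
    return train, val, rest

# Hypothetical file names for the 1000 annotated images.
images = [f"book_{i:04d}.jpg" for i in range(1000)]
train, val, rest = split_dataset(images, n_train=750, n_val=150)
```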
---
### Further work
* Accurate object detection on all classes.
* End-to-end framework to combine object detection and action recognition.
* Below is the Rough Pipeline

*Please suggest any further modifications to the above pipeline.*
---