# Methods section
This section is split into:
* Human Pose Estimation with OpenPose
* Data Collection and Processing
* Model Development
* Application Deployment
### Human Pose Estimation with OpenPose
* What is human pose estimation?
* Estimating the locations of human key-points (joints) from image input.
* What did we use?
* OpenPose - open source human pose estimation software
* Capable of running multi person pose detection in real-time (footnote given appropriate hardware specs).
* Developed by researchers at CMU, written in C++ using the Caffe framework.
* What features does OpenPose extract?
* OpenPose extracts face, hand, body pose, and foot keypoints (using separate models)
* As numerical coordinates
* How did we use OpenPose output for feature extraction?
* Take (x,y) coordinates from following key-point map.
* (Show image of OpenPose hand and pose output)
* Normalized them to the range [-1, 1], where (-1, -1) = top left and (1, 1) = bottom right.
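The normalization step above can be sketched as follows (the frame width and height are assumed inputs, and the keypoint layout is hypothetical):

```python
import numpy as np

def normalize_keypoints(keypoints, width, height):
    """Map OpenPose (x, y) pixel coordinates into [-1, 1],
    so (-1, -1) is the top-left corner and (1, 1) the bottom-right."""
    pts = np.asarray(keypoints, dtype=float)    # shape: (n_points, 2)
    pts[:, 0] = 2.0 * pts[:, 0] / width - 1.0   # x -> [-1, 1]
    pts[:, 1] = 2.0 * pts[:, 1] / height - 1.0  # y -> [-1, 1]
    return pts
```

For a 640x480 frame, for example, the centre pixel (320, 240) maps to (0, 0).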
### Data Collection and Processing
* How did we collect data?
* Struggled to find data from online sources.
* Resorted to recording 400 raw videos of ourselves.
* Approximately 100 raw videos per sign using a 30 FPS camera
* Stored the videos on a remote virtual machine server
* How did we organise data?
* Sorted videos into folders, where the folder name is the sign label
* Within each folder, videos are named sign_n, where sign is the sign label and n is the numerical ID for that recording.
* Formatting and processing data from OpenPose
* We took the (x, y) coordinates from OpenPose, as described in the previous section
* Discarded the per-keypoint confidence scores
* X data = the per-frame (x, y) coordinates flattened into a single array; n frames give n data points.
* Y data = one label for the whole sequence of data points
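A minimal sketch of this formatting step, assuming each frame yields a fixed number of normalized (x, y) keypoints (the 30-frame, 21-keypoint shapes are made up for illustration):

```python
import numpy as np

def build_example(frame_keypoints, label):
    """Flatten each frame's (x, y) keypoints (confidence already dropped)
    into one feature vector, and stack the n frames into the X data;
    the Y data is a single label for the whole sequence."""
    x = np.stack([np.asarray(f, dtype=float).ravel() for f in frame_keypoints])
    return x, label

# Hypothetical sequence: 30 frames with 21 hand keypoints each
x, y = build_example([np.zeros((21, 2))] * 30, "hello")
# x has shape (30, 42): 30 data points from 30 frames, one label y
```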
* Increasing data variability
* Goal - create more training data from our current data set.
* Referred to paper (link), creating new training data with transformations/augmentations
* Video processing
* Variable video speeds (0.8x and 1.5x)
* Extract frames as before and put them into text files.
* Data Augmentation
* First augmentation - classical affine image transformations BEFORE OpenPose.
* Rotation, flip, shear ...
* Second augmentation - keypoint value changes AFTER OpenPose
* Using additive Gaussian noise
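The second augmentation (jittering keypoints after OpenPose) can be sketched as below; the noise scale sigma is an assumed value, not one taken from the referenced paper:

```python
import numpy as np

def jitter_keypoints(keypoints, sigma=0.01, rng=None):
    """Add small additive Gaussian noise to normalized keypoint
    coordinates, producing a new training example from an existing one."""
    if rng is None:
        rng = np.random.default_rng()
    pts = np.asarray(keypoints, dtype=float)
    return pts + rng.normal(0.0, sigma, size=pts.shape)
```

The first augmentation, by contrast, transforms the raw frames themselves (rotation, flip, shear) before they ever reach OpenPose, for instance with OpenCV's `warpAffine`.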
### Model Development
* Problem formulation
* Time-series classification
* Given a sequence of N feature vectors, predict the sign class
* Why did we choose RNNs?
* Good at carrying hidden state across time steps.
* But plain RNNs suffer from the vanishing/exploding gradient problem.
* Why LSTMs?
* Solve the Vanishing/Exploding gradient problem using gates
* Forget gate, input gate, output gate
* The gates learn what to forget, keep, and pass on between states; the largely additive cell-state update avoids gradients vanishing through repeated multiplication by numbers < 1
* Model Architecture
* We propose the following architecture in our system
* LSTM (36) -> Dropout(0.2) -> LSTM (36) -> Dropout(0.2) -> Dense(Softmax) -> 4 scores for 4 classes
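A sketch of this architecture in Keras (the 30-frame, 36-feature input shape is an assumption for illustration, not part of the proposal):

```python
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

model = Sequential([
    Input(shape=(30, 36)),           # hypothetical: 30 frames x 36 features
    LSTM(36, return_sequences=True), # pass the full sequence to the next LSTM
    Dropout(0.2),
    LSTM(36),                        # second LSTM returns only the final state
    Dropout(0.2),
    Dense(4, activation="softmax"),  # 4 scores, one per sign class
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```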
* How did we improve model?
* Hyperparameter optimisation
* Training model variations drawn from a search space
* Selecting the model with the best performance on the test set.
* Knowledge Distillation
* Can we achieve similar results with a smaller model?
* Teacher student distillation
* Sub-frame sampling
* We know humans can only sign as fast as roughly 0.5 Hz
* Tried reducing N frames per example to N/2 frames,
* using one half of the frames for one data point in training and the other half for another data point.
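Sub-frame sampling can be sketched as follows; the even/odd interleaved split is one possible choice, since the outline does not specify which frames go to which half:

```python
import numpy as np

def subsample_split(frames):
    """Split an N-frame example into two N/2-frame examples by taking
    every other frame, doubling the number of training data points."""
    frames = np.asarray(frames)
    return frames[0::2], frames[1::2]

# e.g. a 60-frame video becomes two 30-frame training examples
a, b = subsample_split(np.zeros((60, 42)))
```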
### Application Deployment
* Our focus this year was to deploy our model as an application for people to test gesture recognition from their browser.
* Web application work flow
* Users connect to camera on browser
* Camera works with WebRTC to transmit frames to back-end server for processing
* Processing runs OpenPose and outputs keypoints
* Keypoints are returned to users via Socket.IO
* Our model runs in the user's browser and performs the gesture classification
## Results and Analysis
* Perform K-fold cross-validation to estimate average model performance
* Confusion Matrix
* General error: how many signs were successfully classified overall?
* Specific error: how many of each class's signs were successfully recognised?
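Both error views can be read off a confusion matrix; a minimal sketch (the class labels and counts below are made up):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

cm = confusion_matrix([0, 0, 1, 2, 3, 3], [0, 1, 1, 2, 3, 3], n_classes=4)
overall = np.trace(cm) / cm.sum()         # general: fraction of all signs classified correctly
per_class = np.diag(cm) / cm.sum(axis=1)  # specific: recall for each sign class
```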