# Poster Notes
## Design Layout (ignore for now)
* Overall Layout Suggestion

## Sections
* Preamble:
  * Introduction
  * Objective
  * Features
  * Specifications
* Process:
  * Data
  * Model Training
* Application:
  * Pose Estimation
  * Model
  * Deployment
* Outcomes:
  * Experimental Results
  * [optional] Future Work
## Contents
### Introduction
[Yick's Source]
https://ps.is.tuebingen.mpg.de/research_fields/seeing-understanding-people;
* What is computer vision?
  * Giving computers the ability to observe and perceive the world.
* What can we do with computer vision?
  * For computers to be full partners with humans, they have to see us and understand our behavior.
  * They have to recognize our facial expressions, our gestures, our movements and our actions.
  * This means that we need robust algorithms and expressive representations that can capture human pose, motion, and behavior.
* What is Human Pose Estimation?
  * [picture] Human Pose Estimation recognises both the position and orientation of humans.
  * Add a skeleton to demonstrate pose estimation (from Overleaf).
### Objectives/Specifications
* Exploring Sign Language Recognition using only Human Pose Estimation.
* Building a proof-of-concept system that recognises Auslan signs in real time.
  * Insert diagram of a laptop & webcam.
* Recognises four dynamic (moving) Auslan emergency signs:
  * Ambulance, Help, Pain, Hospital.

### Application:
* Pose Estimation
  * How are we using it? Feature extraction: getting keypoints out of an image.
  * We are using OpenPose, open-source software for human pose estimation.
  * Flow (diagram)

* Model:
  * Problem Formulation:
    * Sequential classification problem: the input is a series of frames.
    * [Optionally] Add a diagram for illustration.
  * Long Short-Term Memory (LSTM)
    * Given a continuous sequence of frames containing gestures, we chose an LSTM model that can recognise a sequence of connected gestures.
    * It is based on splitting continuous signs into sub-units and modelling them with neural networks.
    * An improved version of the RNN, well suited to sequence classification.
  * Model Structure (see the sketch below):
    [64 LSTM, 0.2 Dropout] -> [128 LSTM, 0.2 Dropout] -> [Softmax, 4]
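
A minimal Keras sketch of this structure. The window length and feature count are illustrative assumptions (one flattened (x, y) keypoint vector per frame), not the trained model's exact values:

```python
# Minimal sketch of the poster's LSTM architecture.
# TIMESTEPS and N_FEATURES are assumptions, not the trained model's values.
import tensorflow as tf

TIMESTEPS = 30    # assumed frames per sign window
N_FEATURES = 50   # assumed 25 body keypoints x (x, y)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, return_sequences=True,
                         input_shape=(TIMESTEPS, N_FEATURES)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(4, activation="softmax"),  # Ambulance, Help, Pain, Hospital
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```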
* Deployment
  * Deployed as a web application.
  * Show the Lucidchart diagram.
  * (Principle) Flow:
    * Clients connect to the application with a webcam.
    * Video is sent to the server (MSE-IT) to compute keypoints (OpenPose).
    * Keypoints are sent back to the client.
    * The model sits in the client's browser to deduce the sign.
  * Tech Stack:
    * WebRTC - Web Real-Time Communication.
    * aiortc, aiohttp (Python) - web server frameworks.
    * TensorFlow.js - browser-side machine learning framework.
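
A hedged sketch of the server-side signaling step in the flow above, using aiohttp and aiortc. The `/offer` route and the OpenPose hand-off comment are assumptions for illustration, not our exact implementation:

```python
# Sketch of a WebRTC signaling endpoint: the client POSTs an SDP offer,
# the server answers and starts receiving the webcam track.
from aiohttp import web
from aiortc import RTCPeerConnection, RTCSessionDescription

pcs = set()  # keep peer connections alive

async def offer(request):
    params = await request.json()
    offer = RTCSessionDescription(sdp=params["sdp"], type=params["type"])

    pc = RTCPeerConnection()
    pcs.add(pc)

    @pc.on("track")
    def on_track(track):
        # Incoming webcam frames would be handed to OpenPose here,
        # and the resulting keypoints streamed back to the client.
        pass

    await pc.setRemoteDescription(offer)
    answer = await pc.createAnswer()
    await pc.setLocalDescription(answer)
    return web.json_response(
        {"sdp": pc.localDescription.sdp, "type": pc.localDescription.type}
    )

app = web.Application()
app.router.add_post("/offer", offer)

if __name__ == "__main__":
    web.run_app(app, port=8080)
```
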
## Outcomes
* Able to recognise a dynamic sign with under 1 second of delay (given a sign duration of about 1 second).
* Latency measured under a stated hardware specification: an NVIDIA card with 8 GB RAM.
* Model Accuracy Plots:


---
### Auslan/Background
* Speech is an essential part of communication, but not everyone can rely on it.
* Approximately 20,000 Australians rely on Australian Sign Language (Auslan) every day.
* The communication gap between Auslan users and non-Auslan users worsens during emergencies.
### Objective/Goals
* What can we do to provide a solution for recognising Auslan signs that is:
  * Inexpensive - saving on human resources and cost.
  * Efficient - guaranteed to work, based on statistical evidence.
  * Reliable - assured to work given a set of constraints.
* Build a proof-of-concept system showing that human pose estimation is a viable approach to sign language recognition.
* Realise a hardware-independent system that performs sign language recognition without sensors and can run on software systems (in the cloud).
### Block Diagram of System

### Project
#### Human Pose Estimation - OpenPose
* Human Pose Estimation is the ability for computers to infer human body parts from images.
* We use OpenPose - open-source software for human pose estimation developed at Carnegie Mellon University.
* We use pose estimation to convert a video stream of a human into a stream of (x, y) keypoints, as in the sketch below.
* Diagram
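
A hedged sketch of this conversion using OpenPose's Python bindings, following the official examples; the model path and webcam source are assumptions:

```python
# Sketch: one webcam frame in, one array of body keypoints out.
# Follows OpenPose's official Python tutorial; paths are assumptions.
import cv2
import pyopenpose as op  # OpenPose Python bindings

op_wrapper = op.WrapperPython()
op_wrapper.configure({"model_folder": "openpose/models/"})  # assumed path
op_wrapper.start()

capture = cv2.VideoCapture(0)  # default webcam
ok, frame = capture.read()

datum = op.Datum()
datum.cvInputData = frame
# Older OpenPose builds use emplaceAndPop([datum]) instead.
op_wrapper.emplaceAndPop(op.VectorDatum([datum]))

# Shape (people, 25, 3): x, y, confidence per BODY_25 joint.
keypoints = datum.poseKeypoints
print(keypoints.shape if keypoints is not None else "no person detected")
```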

#### Data
* We faced a lack of data, so we recorded videos of ourselves performing the signs.
* We used video and signal processing techniques to perform **Synthetic Data Generation** - a way to generate more training data for our model (illustrative sketch below).
* Image examples:
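
The exact transforms aren't listed here, so the following is only a representative sketch of keypoint-sequence augmentations of this kind, assuming sequences stored as (frames, features) numpy arrays of normalised coordinates:

```python
# Representative keypoint-sequence augmentations (illustrative, not the
# exact transforms used in the project).
import numpy as np

def jitter(seq, sigma=0.01):
    """Add small Gaussian noise to every keypoint coordinate."""
    return seq + np.random.normal(0.0, sigma, seq.shape)

def time_stretch(seq, factor=1.2):
    """Resample the frame axis to simulate faster/slower signing."""
    n = seq.shape[0]
    idx = np.clip(np.round(np.arange(0, n, 1.0 / factor)).astype(int), 0, n - 1)
    return seq[idx]

def mirror(seq, width=1.0):
    """Flip x-coordinates to simulate the opposite hand (assumes
    normalised coordinates and x at even feature indices)."""
    out = seq.copy()
    out[:, 0::2] = width - out[:, 0::2]
    return out

seq = np.random.rand(30, 50)  # stand-in for one recorded keypoint sequence
augmented = [jitter(seq), time_stretch(seq), mirror(seq)]
```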


#### Model
* We model our problem as time series classification.
* Over a time period, we have N frames converted into N keypoint vectors.
* Sequential time series classification uses an LSTM (Long Short-Term Memory) network, a special kind of RNN.
* After training, we tuned the model by performing hyperparameter tuning.
* After training and tuning, we used the model to predict signs from a series of video frames (see the sketch below).
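
A hedged sketch of how prediction over a live stream might look: keep a rolling window of recent keypoint frames and classify the window. The window length, feature count, and function name are illustrative assumptions:

```python
# Sketch: classify a rolling window of per-frame keypoints with a
# trained Keras model. Constants are assumptions, not project values.
from collections import deque
import numpy as np

SIGNS = ["Ambulance", "Help", "Pain", "Hospital"]
TIMESTEPS, N_FEATURES = 30, 50  # assumed window size / feature count

window = deque(maxlen=TIMESTEPS)

def on_new_keypoints(model, keypoints):
    """Call once per frame with that frame's flattened keypoint vector."""
    window.append(keypoints)
    if len(window) < TIMESTEPS:
        return None  # not enough temporal context yet
    batch = np.asarray(window)[np.newaxis, ...]  # (1, TIMESTEPS, N_FEATURES)
    probs = model.predict(batch, verbose=0)[0]
    return SIGNS[int(np.argmax(probs))]
```
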
#### Application
* Once we developed the model, we needed to tie it into a full application for users.
* We deployed it as a web application, hosted on a machine provided by the MSE-IT team.
* Users can log in to the website and see their signs recognised using their own webcam.
* Built on WebRTC, aiohttp, aiortc and WebSockets.
* System Layout

### Results
* We are currently able to recognise up to 4 unique Auslan signs in a row.
* Model test vs training accuracy.
* Dynamic signs take on average 2 to 3 seconds to be recognised.
### Future Work
* Increase the number of signs recognised.
* Use more powerful compute to predict signs at a faster rate.
* Optimise model parameters for faster inference.
* Perform principal component analysis to figure out which features/body joints matter most, so we can discard the unnecessary ones and improve compute time (see the sketch after this list).
* Reach out to the deaf community:
  * To develop a product that better suits their needs.
  * To collect more data.
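
A hedged sketch of that PCA idea, assuming flattened keypoint frames as rows; the data here is a random stand-in, not our recordings:

```python
# Sketch: rank keypoint features by how much variance they carry,
# as a first cut at discarding uninformative joints.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(1000, 50)  # stand-in: 1000 frames x 50 keypoint coords

pca = PCA(n_components=10)
pca.fit(X)
print("variance explained:", pca.explained_variance_ratio_.round(3))

# Rank original features by their total absolute loading across components.
importance = np.abs(pca.components_).sum(axis=0)
print("most informative feature indices:", np.argsort(importance)[::-1][:10])
```
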
### References
* Definition of Human Pose Estimation:
https://www.sciencedirect.com/science/article/pii/S1047320315001121
* OpenPose:
https://arxiv.org/pdf/1812.08008