# Capstone Meeting #9 -- 31/4/2020

## Agenda

* Check GRA before submitting; [Done]
* Each person updates on the previous week's work (briefly!)
* Add/update our WBS for Auslan
* Go in depth on insights gained from sign language recognition research
* Discuss our engagement with the Auslan community? (Not a priority for now?)

### Research Updates

#### Yong Yick

* Researched LSTM (Long Short-Term Memory):
  * LSTM is a specific implementation of RNN that can learn long-range dependencies in sequences.
  * Introductory explanation: [Understanding LSTMs](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)
  * Open-source implementations exist and can be built on top of Pose Estimation.
  * Will get a list of papers so that we can refer to it.
* Looked at other research topics that may be similar.
  * Managed to narrow down to a few research papers that have a sufficiently large dataset for Auslan.
  * Need for annotated datasets.

#### Tsz Kiu

* [Possible dataset](https://github.com/Signbank/Auslan-signbank/tree/master/signbank)
* Learnt the Auslan fingerspelling alphabet.
  * Learnt that fingerspelling is quite dynamic in its movements as well.
* Looked at Auslan translation based on computer vision. Steps that the papers took:
  * (Segmentation) -- only face, only hands, or both?
  * (Feature Extraction) -- used to train the model; further processing such as angles of joints, etc.
  * (Gesture Recognition) -- inputting keypoints + extracted features into our model to predict the sign/word.
* Segmentation to get the necessary body parts.
* We need to narrow down a list of features to extract that would be useful for our case (see the joint-angle sketch in the appendix).
* Gesture recognition research --> Hidden Markov Models (see the HMM sketch in the appendix).
  * [Implementing Hidden Markov Models: HTK](http://htk.eng.cam.ac.uk/)
  * HTK consists of a set of library modules and tools available in C source form. The tools provide sophisticated facilities for speech analysis, HMM training, testing, and results analysis.
* Problem with segmentation/feature extraction without Pose Estimation keypoints:
  * Only able to detect the locations of joints
  * Can't work with dynamic signs and movements

### Phases in our Project

#### Phase 1 -- Static Gestures

* Static alphabet letters / numbers
* Sign language for single words with static gestures (use the Australia-wide version)
* Set a dictionary with a fixed number of words/letters
* Collecting training datasets:
  * Annotated images or videos
  * Possibly re-process raw videos before feeding them into models
  * Possibly look into asking a UniMelb department for sign language datasets
* Look into implementing different models (see the sketches in the appendix):
  * CNN
  * LSTM
  * Other image classification algorithms
  * etc.
* Collecting test datasets and testing.

#### Phase 2 -- Moving Gestures

* Temporal alphabet letters (J and H)
* Detecting words with temporal gestures
* Workflow similar to Phase 1

#### Phase 3 -- Series of Gestures

* Detecting a sequence of letters to form a word
* Detecting a sequence of words
* (Possible approach): taking a series of words and using another AI/bot to rearrange them into a grammatically correct sentence
* Workflow similar to Phase 1

### Concerns

* Gesture recognition seems to be quite a challenging task given our time constraints.
* Grammar can be an issue if we focus only on fingerspelling.

### Moving forward

#### Administrative / Assignments

* [Yick] Submit our GRA by 1st May
* [Yick] Project brief -- Auslan (not urgent) -- LaTeX
* [Yick] Template for the final project
* [Matthew] Working with the IP camera stream
* Assignment 2?
#### Capstone

* [Yick] Set up and run different models (primarily LSTM; see the sketch in the appendix).
* (Team) Let's set up and run different models (comparing generalisation, accuracy, and ease of use).
* (Team) Implicitly, keep researching the techniques/rabbit holes (also as self-development).
* (Team) Let's find other candidate models (not urgent).
* Liaise with Auslan communities to get datasets (in a few weeks):
  * [Matthew] UniMelb subjects/departments
  * [Yick] Auslan organisations, other universities
* Start collecting possible datasets from sources (in two weeks):
  * Using web scraping libraries (see the sketch in the appendix)
  * Manually downloading for personal testing

#### Good-to-do

* Approach lecturers (Jingge, Erik, Jonathan, Iman Shames).
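### Appendix -- Technique sketches

A few minimal Python sketches of the techniques discussed above, to anchor next week's experiments. Library choices, names, and parameters are assumptions, not decisions.

#### Keypoints and joint-angle features

A sketch of the segmentation + feature-extraction steps, assuming MediaPipe Hands as the pose-estimation backend (any detector that returns 2D joint positions would slot in the same way). The 21-landmark layout and the index-finger indices 5-6-7 are MediaPipe's; `sign.jpg` is a placeholder frame.

```python
import cv2
import mediapipe as mp
import numpy as np

def extract_keypoints(image_bgr):
    """Run hand pose estimation; return 21 (x, y) keypoints or None."""
    with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        results = hands.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None
    landmarks = results.multi_hand_landmarks[0].landmark
    return np.array([(p.x, p.y) for p in landmarks])  # normalised coordinates

def joint_angle(a, b, c):
    """Angle (radians) at keypoint b, formed by the segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))

# Indices 5-6-7 span the index finger's MCP/PIP/DIP joints, so this
# measures how far that finger is bent -- one candidate feature.
kp = extract_keypoints(cv2.imread("sign.jpg"))
if kp is not None:
    print("index finger PIP angle:", joint_angle(kp[5], kp[6], kp[7]))
```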
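#### LSTM sequence classifier

A minimal sequence classifier over per-frame keypoint features, written in PyTorch (the framework choice is still open; this is just one concrete option). The 42 features assume 21 keypoints x (x, y), and the 26-sign dictionary size is a placeholder.

```python
import torch
import torch.nn as nn

class SignLSTM(nn.Module):
    """Classify a clip (sequence of per-frame keypoint features) as one sign."""
    def __init__(self, n_features=42, hidden=64, n_signs=26):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_signs)

    def forward(self, x):          # x: (batch, frames, n_features)
        _, (h, _) = self.lstm(x)   # h: final hidden state, (1, batch, hidden)
        return self.head(h[-1])    # logits over the sign dictionary

# Smoke test: 8 clips of 30 frames, 21 keypoints * (x, y) = 42 features.
model = SignLSTM()
print(model(torch.randn(8, 30, 42)).shape)  # torch.Size([8, 26])
```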
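#### CNN for static letters

A tiny CNN baseline for Phase 1's static letters, again in PyTorch. The 64x64 grayscale input and the 24 classes (26 letters minus the dynamic J and H, which Phase 2 handles) are assumptions.

```python
import torch
import torch.nn as nn

class StaticSignCNN(nn.Module):
    """Baseline image classifier for static fingerspelling letters."""
    def __init__(self, n_signs=24):  # 26 letters minus the dynamic J and H
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, n_signs),  # sized for 64x64 inputs
        )

    def forward(self, x):  # x: (batch, 1, 64, 64) grayscale frames
        return self.net(x)

print(StaticSignCNN()(torch.randn(4, 1, 64, 64)).shape)  # torch.Size([4, 24])
```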
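#### HMM gesture recognition

HTK itself is C tooling; as a quick Python analogue of the same idea (train one HMM per sign, classify a clip by whichever model scores it highest), `hmmlearn` works for prototyping. The state count, feature dimension, and sign names below are placeholders.

```python
import numpy as np
from hmmlearn import hmm  # rough Python stand-in for HTK-style HMM training

def train_sign_hmm(clips, n_states=5):
    """Fit one Gaussian HMM on all training clips of a single sign.
    clips: list of (frames, n_features) arrays of keypoint features."""
    X, lengths = np.concatenate(clips), [len(c) for c in clips]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=50, random_state=0)
    model.fit(X, lengths)
    return model

def classify(clip, models):
    """Pick the sign whose HMM assigns the clip the highest log-likelihood."""
    return max(models, key=lambda sign: models[sign].score(clip))

# Toy usage on random features: one HMM per word in a two-sign dictionary.
rng = np.random.default_rng(0)
models = {sign: train_sign_hmm([rng.normal(size=(30, 42)) for _ in range(5)])
          for sign in ["hello", "thanks"]}
print(classify(rng.normal(size=(30, 42)), models))
```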
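#### Scraping dataset videos

For the dataset-collection task, a generic scraping loop with `requests` + `BeautifulSoup`. The index URL and the .mp4-link page layout are hypothetical; the real Signbank pages will need their own selectors (and a check of their terms of use).

```python
import os
import requests
from bs4 import BeautifulSoup

# Hypothetical index page; replace with the real dataset listing.
INDEX_URL = "https://example.org/auslan/signs"

def scrape_video_links(index_url):
    """Collect every .mp4 link found on a sign-index page."""
    soup = BeautifulSoup(requests.get(index_url, timeout=30).text, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)
            if a["href"].endswith(".mp4")]

os.makedirs("data", exist_ok=True)
for i, link in enumerate(scrape_video_links(INDEX_URL)):
    with open(f"data/sign_{i}.mp4", "wb") as f:
        f.write(requests.get(link, timeout=60).content)
```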