
# Iterative improvements

**Must follow format:**

- Observations (with plot(s))
- Hypothesis
- Check if improved

**See notes from last meeting before beginning**

- 1 or 2 slides: insights gained from data analysis
- 1 or 2 slides: insights gained from error analysis of baseline model
- 1 or 2 slides: progress on data cleaning / preprocessing and impact on model errors
- 1 or 2 slides: progress on feature extraction & subset selection and impact on model errors
- 1 slide: block diagram of current best model (same requirements as above) and comparison of train, val and test scores (better than your baseline? improvements for specific situations?)
- 1 slide: plans / most promising next steps

## Sieben

#### Observations

- ![](https://i.imgur.com/SPhgJaL.png)
- C.1 is mainly confused with C.ZIEN: 15% of the C.1 samples are predicted as C.ZIEN, versus 6% or less for every other class. C.1 and C.ZIEN are extremely similar hand signs; both lift a single finger.
- The main difference is that the finger stays rather static for C.1, whereas for C.ZIEN it is moved to the eye and then to the front.
- The confusion also happens the other way around, i.e. C.ZIEN's most frequent wrong prediction is C.1.

#### Hypothesis

- Incorporating a distance feature from the eye to the finger should help. But: we cannot use hand and body keypoints together, so maybe try with the hand keypoint from the body keypoints.
- Incorporating velocity features might also help, but then we have to detect horizontal movement.

#### Results

- Adding the index-finger-to-eye distance plus its gradient (also with averaging) does not resolve the confusion.
- The feature does not get picked up by feature selection.
- Upon inspecting the misclassified samples, I noticed that the sign can of course also be made with the other hand, so adding the same features for the right hand might help. But I am not sure they will be picked up.
- Adding the same features for the right hand as well does not lead to any improvement.
- A lot of the C: 1 samples have very few frames: the movement of lifting the hand is virtually absent, and the hand already forms the 1 shape at the start.

#### Second observation

- HEBBEN-A gets wrongly predicted as SCHILDPAD-Bhanden for 12% of its samples, even though HEBBEN-A uses only one hand and the other hand is kept completely stationary next to the body.

#### Second hypothesis

- Not enough information about the left hand in the selected features.
- Or: some training samples of SCHILDPAD-Bhanden did not have the left hand tracked, so the model thinks the sign does not use the left hand. But: SCHILDPAD-Bhanden has really good accuracy (highest of all classes), so this would not make sense.

#### Potential fix

- Incorporate hand angles in the features: e.g. for SCHILDPAD the fingers of the hand are closest to the camera, while for HEBBEN the hand is turned sideways.
- Also: for SCHILDPAD both hands should be at roughly the same height. For HEBBEN they should not (but this is not 100% the case).

## Vince

#### Iteration attempt 1

- Observations
    - c.OOK and MOETEN-A are the two worst classified classes
    - MOETEN-A does not have that many samples
    - Both signs bring the hands together
    - In the confusion matrix: c.OOK and ZELFDE-A are often confused
        - ZELFDE-A == 2 times c.OOK!
        - Possibly wrongly labeled data? One sample looked like a ZELFDE-A that was labeled as c.OOK!
    - In 16% (left hand) and 11% (right hand) of the frames for c.OOK, the hand could not be tracked
        - ![](https://i.imgur.com/9IoE0NF.png)
        - Might not be the issue, because many classes show a similar % of untracked frames.
        - But the best class (HAAS-oor) has the fewest untracked frames.
- Hypothesis
    - Interpolating and fixing untracked features might improve the overall score
- Check
    - The overall score improves
    - In the confusion matrix, the classification of c.OOK goes from 0.33 -> 0.36 (confusion with ZELFDE-A drops from 0.3 -> 0.28)
    - No improvement in the confusion of MOETEN-A (why?)
    - Overall, slight improvements in the confusion matrix

| Before fixing frames | After fixing frames |
| -------- | -------- |
| ![](https://i.imgur.com/ENR5AgH.png) | ![](https://i.imgur.com/YRwxZJO.png) |
| cross val acc: 0.670 +/- 0.045 | cross val acc: 0.687 +/- 0.046 |

Old c.OOK confusion
![](https://i.imgur.com/PcLxhj7.png)

New c.OOK confusion
![](https://i.imgur.com/1kvyWhT.png)

#### Iteration attempt 2

- Observations:
    - In many of the sequences the face does not really move a lot.
    - It might confuse the model or add unnecessary noise.
- Hypothesis:
    - We will get a better score with fewer features, because the face data might add unnecessary noise
- Check:
    - We use fewer features now that the face is no longer tracked
    - The cross-validation score is lower, but only slightly!
    - The cross-validation curve seems to be steeper without face points?
    - The variance of the training score is smaller, which might imply more stable data
    - Unfortunately, the model now confuses c.OOK and ZELFDE-A more.
    - IDEA: when pronouncing OOK, the mouth takes on an 'O' shape; this could be added as a feature to better classify c.OOK.

| Before removing face | After [removing face](https://i.imgur.com/qqfqTZh.jpeg) |
| -------- | -------- |
| ![](https://i.imgur.com/ENR5AgH.png) | ![](https://i.imgur.com/kHEuJV2.png) |
| cross val acc: 0.670 +/- 0.045 | cross val acc: 0.661 +/- 0.046 |

Old c.OOK confusion
![](https://i.imgur.com/PcLxhj7.png)

New c.OOK confusion
![](https://i.imgur.com/PRjKQA3.png)

#### Iteration attempt 3

- Observations:
    - From previous observations: fixing undetected points resulted in a better score.
    - Removing face features did not increase the score, but did not make it much worse either. c.OOK and ZELFDE-A are confused more, though!
- Hypothesis:
    - Fixing the undetected points will increase the overall score.
    - Keeping only the mouth area, width and height will help the model confuse c.OOK and ZELFDE-A less.
- Check:
    - No better score was obtained.
    - The model confused c.OOK and ZELFDE-A even more now :'(
    - I think SelectKBest removes the newly added features, so they do not really make a difference.

Old c.OOK confusion
![](https://i.imgur.com/PcLxhj7.png)

New c.OOK confusion
![](https://i.imgur.com/unFqPC9.png)

## Kevin

Changing the number of folds does not improve things for ridge regression; we have found the optimal hyperparameters for ridge regression, and optimizing them even further does not help.

Never mind, on to data analysis and error analysis.

### Iterative improvement 1

Observations:

The sequence with a length of 117 frames is 'AUTO-RIJDEN-A'. Compared to the other 'AUTO-RIJDEN-A' sequences:

![](https://i.imgur.com/OjjsbFE.png)

Hypothesis:

This sequence is clearly somewhat of an outlier. We will now make a 2D plot and compare it with other 'AUTO-RIJDEN-A' sequences. Hypothesis: does the signer wait a long time before making his sign, or is he just slow?

Results:

It looks like he is first thinking about how to make the gesture (is he scratching his head?), or he is steering with only one hand. There is a short stretch of him making the movement with two hands; hard to tell. Maybe we should delete this sequence from the training set?

### Iterative improvement 2

Observations:

We have some really short sequences, all with lengths less than 3; some even have only 1 frame. We should make 2D plots of these and see whether they even mean anything.

![](https://i.imgur.com/l13b9Xu.png)

These very short sequences do not all belong to the same class; they are somewhat spread out.

Hypothesis:

Sequences that are too short have a negative impact on our classifier: garbage in, garbage out!
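A minimal sketch of how the sequences shorter than 3 frames could be listed for inspection, together with a per-class count to check how spread out they are. It assumes the frame count of every training CSV is already known (for example as the number of rows, if each row is one frame; that file layout is an assumption). `find_short_sequences` is a hypothetical helper, not existing project code.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (path, label, n_frames) for one training sequence, e.g. ("train_0442.csv", "C: 1", 2)
Sample = Tuple[str, str, int]


def find_short_sequences(
    samples: Iterable[Sample], min_frames: int = 3
) -> Tuple[List[Sample], Counter]:
    """Return the samples with fewer than `min_frames` frames and their count per label.

    The short samples are sorted by frame count, then by path, so the
    single-frame sequences come first in the list that has to be inspected.
    """
    short = sorted(
        (s for s in samples if s[2] < min_frames),
        key=lambda s: (s[2], s[0]),
    )
    per_label = Counter(label for _, label, _ in short)
    return short, per_label
```

The per-label counter makes it easy to confirm that the short sequences are spread over several classes rather than concentrated in one, before deciding which of them to drop.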
TODO: check which sequences are useful; some 2-frame sequences, like C: 1, are useful, others less so.

Check if improved:

| path | label | frame length | feedback |
| -------------- | ------------- | ----------- | -------- |
| train_1638.csv | C: 1 | 2 | good enough, the 1 is visible, but sequence 0442 is clearer |
| train_1468.csv | c.AF | 2 | bad: according to the video the signer should use 2 hands, but he only uses 1 and the other doesn't really move; the hand he uses is also not tracked, only the wrist, from which you can make something out |
| train_2478.csv | C: 2 | 1 | bad: only 1 frame, and it should sign 2, but the hand is not tracked, so you don't see anything useful |
| train_2722.csv | MOETEN-A | 1 | bad: only 1 frame and wrongly labeled? Doesn't look remotely like MOETEN-A; I would label this as C: 1, he just uses his index finger to sign 1 |
| train_1737.csv | c.ZIEN | 2 | good, you can see him making the eye motion a little bit |
| train_2572.csv | c.OOK | 2 | bad: according to the video the hands should not be near the face; doesn't look like the example video of c.OOK |
| train_0534.csv | NAAR-A | 2 | okay, not ideal: you can see his hand making the movement, but only slightly |
| train_0469.csv | AUTO-RIJDEN-A | 2 | okay, not ideal: you can see him turning a BIG car steering wheel, but one hand isn't tracked properly |
| train_0442.csv | C: 1 | 2 | good: the face isn't fully tracked, but you can clearly see the hand making a 1 |
| train_0050.csv | HEBBEN-A | 2 | bad, wrongly labeled? Uses 2 hands near the face for HEBBEN-A, but the example video uses 1 hand, not near the face |
| train_1714.csv | WAT-A | 2 | good, the hands are tracked well and you can see the sign |
| train_1087.csv | c.OOK | 2 | bad, doesn't look like the example video |

Conclusion: train_1468.csv, train_2478.csv, train_2722.csv, train_2572.csv, train_0050.csv and train_1087.csv are all very badly labeled/tracked. We should throw them out and see whether we get an improvement; however, I expect the impact to be rather small, since these are only 6 of the 2191 training samples.

Results:

We did not really improve; the score even went down! Some classes, however, performed better without the outliers, while others performed worse.

We compare the baseline confusion matrix with our newly created matrix by computing `difference = new_model - baseline_confmatrix`, and plot this difference matrix to see where improvements were made: green = good for our new model, red = the baseline did better. We want the diagonal to be as green as possible.

![](https://i.imgur.com/EYEUgHt.png)

I have no idea why SCHILDPAD-Bhanden did so badly, especially since I didn't remove any training sample of SCHILDPAD-Bhanden!
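A minimal sketch of the `difference = new_model - baseline_confmatrix` comparison, assuming integer-encoded labels (0 to n_classes - 1) and predictions from both models on the same validation samples. The function names are hypothetical, not existing project code. Rows are normalized so each row sums to 1 (the same convention as scikit-learn's `confusion_matrix(..., normalize="true")`), so that large classes do not dominate the colour scale.

```python
import numpy as np


def normalized_confusion(y_true, y_pred, n_classes: int) -> np.ndarray:
    """Row-normalized confusion matrix: entry [i, j] = fraction of class i predicted as j."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    # Count every (true, predicted) pair; add.at handles repeated index pairs correctly.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    row_sums = cm.sum(axis=1, keepdims=True)
    # Classes without samples keep a row of zeros instead of dividing by zero.
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)


def confusion_difference(y_true, y_pred_new, y_pred_baseline, n_classes: int) -> np.ndarray:
    """difference = new model - baseline, both row-normalized.

    Positive diagonal entry: the new model classifies that class correctly more often.
    Positive off-diagonal entry: the new model makes that particular confusion more often.
    """
    new = normalized_confusion(y_true, y_pred_new, n_classes)
    baseline = normalized_confusion(y_true, y_pred_baseline, n_classes)
    return new - baseline


def plot_difference(diff: np.ndarray, labels) -> None:
    """Red-green colormap centred on 0: green = higher value for the new model."""
    import matplotlib.pyplot as plt  # imported here so the computation works without matplotlib

    limit = float(np.abs(diff).max()) or 1.0
    fig, ax = plt.subplots(figsize=(8, 8))
    im = ax.imshow(diff, cmap="RdYlGn", vmin=-limit, vmax=limit)
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels, rotation=90)
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    ax.set_xlabel("predicted")
    ax.set_ylabel("true")
    fig.colorbar(im, ax=ax)
    plt.show()
```

Note that with a plain difference, green means "higher for the new model" everywhere, so only on the diagonal does green read as an improvement; a green off-diagonal cell means the new model makes that mistake more often.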
Working on it.

## Wim

### Iterative improvement 1

#### Observations

![](https://i.imgur.com/5eZe5k0.png)

ZELFDE-A and OOK-A are often confused with each other.

- Reason 1: similar movements; ZELFDE-A is OOK-A repeated in quick succession.
- Reason 2: the currently extracted features can't capture a repeated movement, since they average over the 2 parts of the sequence.

- OOK-A: https://vlaamsegebarentaal.be/signbank/dictionary/protected_media/glossvideo/OO/OOK-A-8491.mp4
- ZELFDE-A: https://vlaamsegebarentaal.be/signbank/dictionary/protected_media/glossvideo/ZE/ZELFDE-A-14290.mp4

#### Hypothesis

Dividing the sequence into 3 parts instead of 2 should improve the accuracy for these classes.

#### Results

### Iterative improvement 2

#### Observations

- Camera pose varies per sequence

#### Hypothesis

![](https://i.imgur.com/cpOPefa.png)
![](https://i.imgur.com/efcts87.png)

#### Results

### Iterative improvement 3

#### Observations

- Nose position and arm length vary greatly per signer and per sequence (camera position and distance differ)

#### Hypothesis

- Normalizing samples should give a better result

#### Results

# What to do next?

- Better SelectKBest?
    - Smarter alternative? Penalize throwing away our own features
    - Try keeping the x, y and z of keypoints together
    - Look at what it throws away
    - Look at correlation in face keypoints
    - New features better? Lower correlation?
- Look at face keypoints
    - Rename face keypoints (more descriptive) -> indices are always the same
    - Better way of separating face and eyebrows
- Strange sequences
    - Sequence with 117 frames is scuffed
        - Multiple signs in one sequence?
        - Just throw it away?
    - Sequence of 1 frame where the hands are not tracked at all
        - Body features could maybe still be useful?
    - Sequence with 1 frame even mislabeled?
        - Relabel?
- Sequences have different arm lengths etc. because of camera position/signer
    - Good idea might be to normalize the data per sample
    - Keep a fixed distance between the shoulders, or relative to the nose, etc.
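The per-sample normalization idea (fixed shoulder distance, nose as reference) could be sketched as below. This is a minimal sketch under assumptions: the body keypoints of one sequence are stored as an array of shape `(n_frames, n_keypoints, n_dims)` with NaN for untracked points, and the indices `NOSE_IDX`, `LEFT_SHOULDER_IDX` and `RIGHT_SHOULDER_IDX` are hypothetical values that follow MediaPipe Pose numbering, which may not match this dataset.

```python
import numpy as np

# Hypothetical keypoint indices (MediaPipe Pose numbering is assumed, not confirmed).
NOSE_IDX = 0
LEFT_SHOULDER_IDX = 11
RIGHT_SHOULDER_IDX = 12


def normalize_sample(pose: np.ndarray) -> np.ndarray:
    """Centre one sequence on the nose and scale it to unit shoulder width.

    pose: array of shape (n_frames, n_keypoints, n_dims), NaN where untracked.
    Returns a new array; the nose averaged over all frames becomes the origin and
    the mean shoulder-to-shoulder distance of the sample becomes 1.
    """
    # Reference point: mean nose position over the frames where it was tracked.
    origin = np.nanmean(pose[:, NOSE_IDX, :], axis=0)
    # Scale: shoulder width per frame (NaN if a shoulder is missing), averaged over the sample.
    shoulder_width = np.linalg.norm(
        pose[:, LEFT_SHOULDER_IDX, :] - pose[:, RIGHT_SHOULDER_IDX, :], axis=-1
    )
    scale = np.nanmean(shoulder_width)
    if not np.isfinite(scale) or scale == 0:
        # Shoulders never tracked: only centre the sample, do not rescale it.
        scale = 1.0
    return (pose - origin) / scale
```

Using one origin and one scale per sample (instead of per frame) keeps the movement within the sequence intact, which matters for signs that move towards or away from the face.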

- Data cleaning: Sieben
- Preprocessing: Wim
- Subset selection: Kevin
- Feature extraction: Vince, Sieben

Tuesday: merge everything together. Wednesday: presentation.