# Improved Data Analysis

Data analysis: properties of the data that may impact your model design decisions (and help you draw the right conclusions from your findings).

## X Y Z

![](https://i.imgur.com/nz34TZY.png)
![](https://i.imgur.com/uNqF9DJ.png)
![](https://i.imgur.com/AbYCbHR.png)

## Balanced/Unbalanced data set

![](https://i.imgur.com/quJjHLq.png)

## Frames

![](https://i.imgur.com/7NOAJnZ.png)
![](https://i.imgur.com/scziO9e.png)
![](https://i.imgur.com/D8oDM3k.png)

## Signers

![](https://i.imgur.com/2Q6h4az.png)
![](https://i.imgur.com/Sk8Va7O.png)

Dominant hand per signer:

| Signer | Hand |
|--------|----------|
| 46 | right |
| 34 | left (?) |
| 44 | right |
| 47 | right |
| 38 | left |
| 10 | right |
| 45 | right |
| 55 | right |
| 56 | right |
| 3 | right |
| 68 | right |
| 36 | left |
| 32 | right |
| 59 | left |
| 54 | right |
| 8 | right |
| 52 | right |
| 35 | right |
| 40 | left (?) |
| 62 | right |
| 24 | right |
| 48 | right |
| 33 | right |
| 60 | right |
| 26 | right |
| 18 | left |
| 29 | right |
| 61 | left |
| 51 | right |
| 37 | right |
| 67 | right |
| 63 | right |
| 43 | right |
| 25 | right |
| 27 | left |

## Measurement errors

![](https://i.imgur.com/uwR1xkv.png)

## Possible issues with i.i.d. assumption

- Left- and right-handed people: we want a similar distribution of left- and right-handed signers in each fold.
- Different seating positions.

## Shoulder rotation

![](https://i.imgur.com/cU6Tnjm.png)

## Shoulder to shoulder length

How much variance do we see in the shoulder-to-shoulder length?

Per class:

![](https://i.imgur.com/2NWMLzZ.png)

We notice quite a lot of variation in shoulder-to-shoulder distances. This is expected: the signs were performed by different signers, and not everybody is the same size (duh).

Per signer:

![](https://i.imgur.com/p58Z7Gj.png)

We also notice a lot of variation in shoulder-to-shoulder length per signer. This can be explained by different camera positions.
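Since the apparent shoulder width depends on both the signer's size and the camera distance, one option is to normalize every sample by its own shoulder width. The sketch below is an assumption, not the method used in these notes: it centers each frame on the shoulder midpoint and divides all coordinates by the median shoulder-to-shoulder distance (measured in x and y) over the sample's frames. The function name and the shoulder keypoint indices are hypothetical; the real indices depend on the pose estimator's keypoint layout.

```python
import numpy as np


def normalize_by_shoulder_width(pose: np.ndarray, left_shoulder: int, right_shoulder: int) -> np.ndarray:
    """Center and scale one sample so that its shoulder width becomes 1.

    pose: array of shape (frames, keypoints, 3) holding x, y, z per keypoint.
    left_shoulder, right_shoulder: keypoint indices of the shoulders
    (hypothetical; they depend on the pose estimator).
    """
    left = pose[:, left_shoulder, :2]
    right = pose[:, right_shoulder, :2]
    widths = np.linalg.norm(left - right, axis=-1)  # per-frame shoulder width in x, y
    widths = widths[widths > 0]  # drop frames where both shoulders coincide (e.g. undetected)
    if widths.size == 0:
        raise ValueError("no frame with a valid shoulder width")
    scale = np.median(widths)  # one scale per sample, robust to jitter in single frames
    center = (pose[:, left_shoulder] + pose[:, right_shoulder]) / 2.0  # shape (frames, 3)
    return (pose - center[:, None, :]) / scale


# Synthetic check: the same "person" seen by a closer, shifted camera (3 * pose + 5)
# should give the same normalized keypoints.
pose = np.zeros((4, 3, 3))
pose[:, 0] = [1.0, 1.0, 0.0]  # hypothetical left shoulder
pose[:, 1] = [3.0, 1.0, 0.0]  # hypothetical right shoulder
pose[:, 2] = [2.0, 3.0, 0.0]  # some other keypoint
near = normalize_by_shoulder_width(pose, left_shoulder=0, right_shoulder=1)
zoomed = normalize_by_shoulder_width(3 * pose + 5, left_shoulder=0, right_shoulder=1)
print(np.allclose(near, zoomed))
```

Using one median scale per sample, rather than a separate scale per frame, keeps the normalization from amplifying frame-to-frame jitter in the shoulder keypoints. Frames where only one shoulder is detected would still get a wrong midpoint, so those may need to be masked first.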

For the same person, a camera further away gives a smaller shoulder-to-shoulder distance than a closer camera.

## Detect very similar classes

Scatter plot of wrist positions for different classes/signers.

## Plot ideas

Do PCA, keep the K most important directions, make a scatter plot, and check which classes can best be separated.

## Face features data analysis

Correlations of mouth indices:

| Correlation x | Correlation y | Correlation z |
| -------- | -------- | -------- |
| ![](https://i.imgur.com/mAsfKaS.png) | ![](https://i.imgur.com/S9358pC.png) | ![](https://i.imgur.com/MDE9qZ8.png) |

Correlations of eyebrow indices:

| Correlation x | Correlation y | Correlation z |
| -------- | -------- | -------- |
| ![](https://i.imgur.com/0pl4LB0.png) | ![](https://i.imgur.com/l3k0tqQ.png) | ![](https://i.imgur.com/LklmZJU.png) |

The eyebrow plots look very much the same, with high correlation within two groups of indices (probably the left and the right eyebrow).

# Undetected keypoints

Total undetected keypoints: 46314 out of 340125 (about 13.6168%)

Label: AUTO-RIJDEN-A

Total undetected keypoints: 57282 out of 656500 (about 8.7254%)

Label: C: 1

Total undetected keypoints: 9555 out of 77000 (about 12.4091%)

Label: C: 2

Total undetected keypoints: 12525 out of 136625 (about 9.1674%)

Label: GOED-A

Total undetected keypoints: 5844 out of 82000 (about 7.1268%)

Label: HAAS-oor

Total undetected keypoints: 24015 out of 368125 (about 6.5236%)

Label: HEBBEN-A

Total undetected keypoints: 19005 out of 205750 (about 9.2369%)

Label: MOETEN-A

Total undetected keypoints: 5028 out of 62750 (about 8.0127%)

Label: NAAR-A

Total undetected keypoints: 22698 out of 311875 (about 7.2779%)

Label: SCHILDPAD-Bhanden

Total undetected keypoints: 40299 out of 322375 (about 12.5007%)

Label: WAT-A

Total undetected keypoints: 12762 out of 145875 (about 8.7486%)

Label: ZELFDE-A

Total undetected keypoints: 11979 out of 290625 (about 4.1218%)

Label: c.AF

Total undetected keypoints: 47577 out of 356875 (about 13.3316%)

Label: c.OOK

Total undetected keypoints: 8802 out of 179125 (about 4.9139%)

Label: c.ZIEN

Total undetected keypoints: 14181 out of 170375 (about 8.3234%)
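A minimal sketch of how undetected-keypoint counts like the ones above could be computed. It assumes each sample is a NumPy array of shape (frames, keypoints, 3) and that an undetected keypoint is stored with all coordinates equal to 0; both are assumptions about the data format, and the function names are hypothetical.

```python
from collections import defaultdict

import numpy as np


def undetected_stats(keypoints: np.ndarray) -> tuple[int, int, float]:
    """Count undetected keypoints in one sample of shape (frames, keypoints, 3).

    Assumption: an undetected keypoint has all coordinates equal to 0.
    If missing keypoints are stored as NaN, use np.isnan(keypoints).all(axis=-1).
    """
    is_missing = np.all(keypoints == 0, axis=-1)  # shape (frames, keypoints)
    undetected = int(is_missing.sum())
    total = int(is_missing.size)
    return undetected, total, 100.0 * undetected / total


def undetected_per_label(samples):
    """Aggregate counts over (label, keypoints) pairs; returns label -> (undetected, total, percent)."""
    counts = defaultdict(lambda: [0, 0])  # label -> [undetected, total]
    for label, keypoints in samples:
        undetected, total, _ = undetected_stats(keypoints)
        counts[label][0] += undetected
        counts[label][1] += total
    return {
        label: (undetected, total, 100.0 * undetected / total)
        for label, (undetected, total) in sorted(counts.items())
    }


# Tiny synthetic sample: two frames with three keypoints each.
example = np.zeros((2, 3, 3))
example[0, 0] = [0.4, 0.2, 0.1]  # detected in frame 0
example[1, 2] = [0.5, 0.0, 0.0]  # detected in frame 1 (only x is non-zero)
print(undetected_stats(example))
```

Aggregating raw counts per label before dividing (instead of averaging per-sample percentages) weights every frame equally, so long videos count more than short ones.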
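The mouth and eyebrow correlation plots under "Face features data analysis" can be reproduced along these lines. This is a sketch under assumptions: the correlation is taken over frames (frames from several samples stacked along the first axis), the face landmarks are stored as an array of shape (frames, landmarks, 3), and the landmark indices for the mouth or eyebrows are left to the caller because the notes do not list them.

```python
import numpy as np


def landmark_correlation(face: np.ndarray, indices, coord: int) -> np.ndarray:
    """Pearson correlation between face landmarks for one coordinate.

    face: array of shape (frames, landmarks, 3) holding x, y, z per landmark.
    indices: landmark indices to compare (e.g. the mouth or eyebrow landmarks).
    coord: 0 for x, 1 for y, 2 for z.
    Entry (i, j) of the result is the correlation over frames between
    landmark indices[i] and landmark indices[j].
    """
    values = face[:, indices, coord]  # shape (frames, len(indices))
    # rowvar=False: each column (landmark) is a variable, each row (frame) an observation.
    return np.corrcoef(values, rowvar=False)


# Synthetic check: landmark 1 moves with landmark 0, landmark 2 moves opposite.
t = np.arange(5.0)
face = np.zeros((5, 3, 3))
face[:, 0, 0] = t
face[:, 1, 0] = 2 * t + 1
face[:, 2, 0] = -t
print(np.round(landmark_correlation(face, [0, 1, 2], coord=0), 3))
```

A landmark whose coordinate never changes has zero variance, so its row and column in the matrix come out as NaN.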
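The PCA idea from "Plot ideas" could look like the sketch below. It assumes one fixed-length feature vector per sample (for example the mean keypoint positions over a sample's frames; the notes do not say which features to use) and computes PCA directly with NumPy's SVD. `pca_project` and `scatter_by_class` are hypothetical helper names.

```python
import numpy as np


def pca_project(features: np.ndarray, k: int = 2):
    """Project samples onto their k most important principal directions.

    features: array of shape (samples, dims), one feature vector per sample.
    Returns (projected, explained): projected has shape (samples, k) and
    explained[i] is the fraction of the total variance captured by direction i.
    """
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variance = singular_values**2
    return centered @ vt[:k].T, variance[:k] / variance.sum()


def scatter_by_class(projected: np.ndarray, labels: np.ndarray):
    """Scatter the first two principal components, one colour per class."""
    import matplotlib.pyplot as plt  # imported here so pca_project works without matplotlib

    fig, ax = plt.subplots()
    for label in np.unique(labels):
        mask = labels == label
        ax.scatter(projected[mask, 0], projected[mask, 1], s=5, label=str(label))
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.legend(markerscale=3, fontsize="small")
    return fig


# Synthetic check: almost all variance lies along the first feature.
X = np.array([[-2.0, 0.1], [-1.0, -0.1], [1.0, -0.1], [2.0, 0.1]])
projected, explained = pca_project(X, k=2)
print(explained)
```

Classes whose point clouds overlap in the PC 1 / PC 2 scatter are candidates for the "very similar classes" mentioned above; with only two directions kept, the plot shows just the share of variance given by `explained`.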
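For the i.i.d. concern about left- and right-handed signers, one way to get a similar handedness distribution in each fold is to assign whole signers to folds, round-robin within each handedness group. The sketch below makes two assumptions not stated in these notes: all samples of one signer go to the same fold, and the two "left (?)" signers (34 and 40) are treated as left-handed. The function name is hypothetical.

```python
import random
from collections import defaultdict


def handedness_balanced_folds(signer_hand: dict, n_folds: int, seed: int = 0) -> dict:
    """Map every signer to a fold, spreading each handedness group evenly over the folds.

    Whole signers are assigned, so all samples of one signer end up in the same fold.
    """
    rng = random.Random(seed)
    by_hand = defaultdict(list)
    for signer, hand in signer_hand.items():
        by_hand[hand].append(signer)

    fold_of = {}
    next_fold = 0
    for hand in sorted(by_hand):
        signers = sorted(by_hand[hand])
        rng.shuffle(signers)  # random order within the group, reproducible via seed
        for signer in signers:
            fold_of[signer] = next_fold
            # The round-robin continues across groups, so fold sizes also stay balanced.
            next_fold = (next_fold + 1) % n_folds
    return fold_of


# Dominant hand per signer, from the table above.
signers = [46, 34, 44, 47, 38, 10, 45, 55, 56, 3, 68, 36, 32, 59, 54, 8, 52, 35,
           40, 62, 24, 48, 33, 60, 26, 18, 29, 61, 51, 37, 67, 63, 43, 25, 27]
left_handed = {34, 38, 36, 59, 40, 18, 61, 27}  # 34 and 40 are marked "left (?)"
signer_hand = {s: "left" if s in left_handed else "right" for s in signers}

folds = handedness_balanced_folds(signer_hand, n_folds=5)
print(folds)
```

With 35 signers and 5 folds, every fold gets 7 signers and 1 or 2 of the 8 left-handed ones. When working with per-sample arrays instead of a signer table, scikit-learn's `StratifiedGroupKFold` (with `groups` set to the signer ids and `y` set to the handedness) is an alternative worth checking.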