# Improved Data Analysis
This section looks at properties of the data that may affect model design decisions, and at how to draw the right conclusions from those findings.
## X Y Z



## Balanced/Unbalanced data set

## Frames



## Signers


Dominant hand per signer:

| Signer | Hand |
|--------|----------|
| 46 | right |
| 34 | left (?) |
| 44 | right |
| 47 | right |
| 38 | left |
| 10 | right |
| 45 | right |
| 55 | right |
| 56 | right |
| 3 | right |
| 68 | right |
| 36 | left |
| 32 | right |
| 59 | left |
| 54 | right |
| 8 | right |
| 52 | right |
| 35 | right |
| 40 | left (?) |
| 62 | right |
| 24 | right |
| 48 | right |
| 33 | right |
| 60 | right |
| 26 | right |
| 18 | left |
| 29 | right |
| 61 | left |
| 51 | right |
| 37 | right |
| 67 | right |
| 63 | right |
| 43 | right |
| 25 | right |
| 27 | left |
## Measurement errors

## Possible issues with i.i.d. assumption
- Left- and right-handed signers: we want a similar ratio of left- to right-handed signers in each fold.
- Different seating positions
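
One way to keep the handedness ratio similar across folds is to assign whole signers to folds, one handedness group at a time. The sketch below is only an illustration, not the split actually used: the function name `stratified_signer_folds` and the choice of 5 folds are assumptions, and the two signers marked "left (?)" in the table are treated as left-handed.

```python
from collections import defaultdict

# Dominant hand per signer, copied from the table above.
# Assumption: the uncertain "left (?)" entries (34 and 40) count as left.
HAND_BY_SIGNER = {
    46: "right", 34: "left", 44: "right", 47: "right", 38: "left",
    10: "right", 45: "right", 55: "right", 56: "right", 3: "right",
    68: "right", 36: "left", 32: "right", 59: "left", 54: "right",
    8: "right", 52: "right", 35: "right", 40: "left", 62: "right",
    24: "right", 48: "right", 33: "right", 60: "right", 26: "right",
    18: "left", 29: "right", 61: "left", 51: "right", 37: "right",
    67: "right", 63: "right", 43: "right", 25: "right", 27: "left",
}


def stratified_signer_folds(hand_by_signer, n_folds):
    """Put every signer in exactly one fold, spreading each hand group evenly."""
    by_hand = defaultdict(list)
    for signer, hand in sorted(hand_by_signer.items()):
        by_hand[hand].append(signer)

    folds = [[] for _ in range(n_folds)]
    position = 0  # keeps counting across groups so fold sizes stay balanced
    for hand in sorted(by_hand):
        for signer in by_hand[hand]:
            folds[position % n_folds].append(signer)
            position += 1
    return folds


if __name__ == "__main__":
    for index, fold in enumerate(stratified_signer_folds(HAND_BY_SIGNER, 5)):
        n_left = sum(HAND_BY_SIGNER[s] == "left" for s in fold)
        print(f"fold {index}: {len(fold)} signers, {n_left} left-handed")
```

Because a signer never appears in more than one fold, this also avoids leaking signer-specific traits between training and validation data.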
## Shoulder rotation

## Shoulder to shoulder length
How much variance do we see in the shoulder to shoulder length?
Per class

We notice quite a lot of variation in shoulder-to-shoulder distance within each class. This is expected: every class was signed by several signers, and not everybody is the same size.
Per signer

We also notice a lot of variation in shoulder-to-shoulder length within a single signer. This can be explained by different camera positions: for the same person, a camera further away may give a smaller shoulder-to-shoulder distance than a closer one.
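
The per-class and per-signer spread described above could be measured along the following lines. This is a minimal sketch under assumptions: each sample is a hypothetical `(group, left_shoulder, right_shoulder)` tuple with `(x, y, z)` coordinates, where `group` is either a class label or a signer id. The actual data layout may differ.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def shoulder_width_stats(samples):
    """Return {group: (mean width, population std)} of shoulder distances."""
    widths = defaultdict(list)
    for group, left_shoulder, right_shoulder in samples:
        # Euclidean distance between the two shoulder keypoints.
        widths[group].append(math.dist(left_shoulder, right_shoulder))
    return {group: (mean(w), pstdev(w)) for group, w in widths.items()}


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    demo = [
        ("signer_a", (0.30, 0.50, 0.0), (0.70, 0.50, 0.0)),
        ("signer_a", (0.35, 0.50, 0.0), (0.65, 0.50, 0.0)),
        ("signer_b", (0.20, 0.50, 0.0), (0.80, 0.50, 0.0)),
    ]
    for group, (avg, std) in shoulder_width_stats(demo).items():
        print(f"{group}: mean={avg:.3f} std={std:.3f}")
```

A large standard deviation for one signer would point at camera distance rather than body size as the source of variation.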
## Detect classes very similar
Scatter plot of wrist positions for different classes/signers.
## Plot ideas
Run PCA, keep the K most important directions, and make a scatter plot to see which classes can be separated best.
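
The PCA idea can be sketched as follows, assuming NumPy is available and that each sample has already been flattened into one feature vector (for example, all keypoint coordinates of a clip). The function name `pca_project` is hypothetical. The returned 2D coordinates are what one would pass to a scatter plot, coloured by class.

```python
import numpy as np


def pca_project(features, k=2):
    """Project samples onto the k directions of largest variance."""
    X = np.asarray(features, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return X_centered @ Vt[:k].T, explained[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = rng.normal(size=(100, 6))  # random stand-in for real features
    projected, ratio = pca_project(demo, k=2)
    print(projected.shape, ratio)
```

The explained-variance ratios indicate how much structure a 2D plot can show at all. If the first two directions explain little variance, overlapping classes in the plot do not necessarily mean the classes are inseparable.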
## Face features data analysis
Correlations of mouth indices:

| Correlation x | Correlation y | Correlation z |
| -------- | -------- | -------- |
|  |  |  |

Correlation of eyebrow indices:

| Correlation x | Correlation y | Correlation z |
| -------- | -------- | -------- |
|  |  |  |

The eyebrows show the same pattern: the indices are highly correlated within two groups (probably the left and the right eyebrow).
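
The correlations between keypoint indices could be computed per coordinate axis roughly as below. This is a sketch under assumptions: `series_by_index` is a hypothetical mapping from a keypoint index to its values over time for one axis (x, y, or z), and every series has the same length and is not constant.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(series_by_index):
    """Return {(i, j): correlation} for every pair of keypoint indices."""
    indices = sorted(series_by_index)
    return {
        (i, j): pearson(series_by_index[i], series_by_index[j])
        for i in indices
        for j in indices
    }


if __name__ == "__main__":
    # Made-up x-coordinates of three keypoints over four frames.
    demo = {0: [0.1, 0.2, 0.3, 0.4], 1: [0.2, 0.4, 0.6, 0.8], 2: [0.9, 0.7, 0.8, 0.6]}
    for pair, value in correlation_matrix(demo).items():
        print(pair, round(value, 3))
```

Groups of indices that are almost perfectly correlated carry little extra information, so one representative per group (or their mean) may be enough as a model input.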
## Undetected keypoints
Total undetected keypoints 46314 out of 340125
About 13.6168%
Label: AUTO-RIJDEN-A
Total undetected keypoints 57282 out of 656500
About 8.7254%
Label: C: 1
Total undetected keypoints 9555 out of 77000
About 12.4091%
Label: C: 2
Total undetected keypoints 12525 out of 136625
About 9.1674%
Label: GOED-A
Total undetected keypoints 5844 out of 82000
About 7.1268%
Label: HAAS-oor
Total undetected keypoints 24015 out of 368125
About 6.5236%
Label: HEBBEN-A
Total undetected keypoints 19005 out of 205750
About 9.2369%
Label: MOETEN-A
Total undetected keypoints 5028 out of 62750
About 8.0127%
Label: NAAR-A
Total undetected keypoints 22698 out of 311875
About 7.2779%
Label: SCHILDPAD-Bhanden
Total undetected keypoints 40299 out of 322375
About 12.5007%
Label: WAT-A
Total undetected keypoints 12762 out of 145875
About 8.7486%
Label: ZELFDE-A
Total undetected keypoints 11979 out of 290625
About 4.1218%
Label: c.AF
Total undetected keypoints 47577 out of 356875
About 13.3316%
Label: c.OOK
Total undetected keypoints 8802 out of 179125
About 4.9139%
Label: c.ZIEN
Total undetected keypoints 14181 out of 170375
About 8.3234%