# ML Support session 4/11/2021
## Data exploration
- Basic distributions
- Class imbalance
    - Weighted?
    - Augmentation?
- Number of frames per sequence
    - Varies between 1 and 117
    - Interpolation to 8 frames

| All sequences | Per class |
| -------- | -------- |
|  |  |
- Landmarks
    - z is defined differently for hands, face, and body
    - Sometimes untracked
        - Interpolate from the previous or next frame

Undetected keypoints differ per region, but follow an all-or-nothing pattern:
- Body: no undetected keypoints
- 
- 
- 
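
The two interpolation steps mentioned above (filling untracked landmarks and bringing every sequence to 8 frames) could be sketched as follows. This is a minimal sketch, not the code used in the session: it assumes each sequence is a NumPy array of shape `(n_frames, n_landmarks, 3)` with `NaN` marking untracked values, and it uses linear interpolation over time, which is only one possible reading of "previous or next frame". The function names are hypothetical.

```python
import numpy as np


def fill_untracked(seq):
    """Fill NaN values per coordinate by interpolating over time.

    Assumption: gaps between tracked frames are filled linearly; gaps at the
    start or end take the nearest tracked value. Coordinates that are never
    tracked stay NaN.
    """
    seq = np.asarray(seq, dtype=float)
    n_frames = seq.shape[0]
    flat = seq.reshape(n_frames, -1).copy()
    t = np.arange(n_frames)
    for j in range(flat.shape[1]):
        col = flat[:, j]
        tracked = ~np.isnan(col)
        if tracked.any() and not tracked.all():
            flat[:, j] = np.interp(t, t[tracked], col[tracked])
    return flat.reshape(seq.shape)


def resample_frames(seq, n_out=8):
    """Resample a sequence to n_out frames by linear interpolation in time."""
    seq = np.asarray(seq, dtype=float)
    n_frames = seq.shape[0]
    flat = seq.reshape(n_frames, -1)
    if n_frames == 1:
        # A single frame cannot be interpolated, so it is repeated.
        out = np.repeat(flat, n_out, axis=0)
    else:
        src = np.linspace(0.0, 1.0, n_frames)
        dst = np.linspace(0.0, 1.0, n_out)
        out = np.stack(
            [np.interp(dst, src, flat[:, j]) for j in range(flat.shape[1])],
            axis=1,
        )
    return out.reshape((n_out,) + seq.shape[1:])
```

Filling gaps before resampling keeps `NaN` values from spreading into neighbouring output frames.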
## Basic classification
- Features
    - Body landmarks
- `StandardScaler`
- `SelectKBest` feature selection (hyperparameter `k`)
- Logistic regression (hyperparameter `C`)
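
The steps listed above map naturally onto a scikit-learn `Pipeline` tuned with a grid search over `k` and `C`. The sketch below is illustrative only: the data is a synthetic stand-in from `make_classification`, and the scoring function (`f_classif`), the grid values, and the number of folds are assumptions, not the settings used in the session.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the landmark feature matrix (assumption).
X, y = make_classification(
    n_samples=200, n_features=30, n_informative=8, n_classes=3, random_state=0
)

pipe = Pipeline(
    [
        ("scale", StandardScaler()),
        ("select", SelectKBest(score_func=f_classif)),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

# Hypothetical grid over the two hyperparameters named in the notes.
param_grid = {"select__k": [5, 10, 20], "clf__C": [0.1, 1.0, 10.0]}

search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Keeping the scaler and the feature selection inside the pipeline means both are refit on each training fold, so the validation folds do not leak into preprocessing.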
### Hands
- Lengths (distance from the wrist to each fingertip) and angles (angle between the tip and the base of a finger, and angles between two different fingers)
- Gradients of these features are also included, to capture how fast they change
- Hand keypoints are often undetected, which is a problem (e.g. angles cannot be calculated)
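
The hand features described above could be computed along these lines. This is a sketch under several assumptions: the landmark indices follow the MediaPipe Hands layout (21 points, wrist at index 0), which the notes do not confirm; the "tip and base" angle is read as the angle between the wrist-to-base vector and the base-to-tip vector; and only adjacent fingers are compared. All names are hypothetical.

```python
import numpy as np

# Assumed MediaPipe Hands indices: wrist, finger bases, fingertips
# (thumb, index, middle, ring, pinky).
WRIST = 0
BASES = [1, 5, 9, 13, 17]
TIPS = [4, 8, 12, 16, 20]


def angle_between(u, v):
    """Angle in radians between vectors stored along the last axis."""
    cos = np.sum(u * v, axis=-1) / (
        np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1)
    )
    return np.arccos(np.clip(cos, -1.0, 1.0))


def hand_features(hand):
    """hand: (n_frames, 21, 3) array, n_frames >= 2 -> (n_frames, 28) features."""
    wrist = hand[:, WRIST : WRIST + 1, :]
    bases = hand[:, BASES, :]
    tips = hand[:, TIPS, :]
    fingers = tips - bases

    lengths = np.linalg.norm(tips - wrist, axis=-1)          # 5 per frame
    bend = angle_between(bases - wrist, fingers)              # 5 per frame
    spread = angle_between(fingers[:, :-1], fingers[:, 1:])   # 4 per frame

    static = np.concatenate([lengths, bend, spread], axis=1)
    # Frame-to-frame gradient as a proxy for how fast each feature changes.
    change = np.gradient(static, axis=0)
    return np.concatenate([static, change], axis=1)
```

If a hand is undetected in a frame (`NaN` coordinates), every feature for that frame becomes `NaN`, which is the problem noted above; such frames need to be filled or dropped first.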

- Confusion matrix looks OK-ish.
- However, the gap between training and validation accuracy is very large, i.e. the model overfits heavily.
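
One way to quantify the training/validation gap is `cross_validate` with `return_train_score=True`. The sketch below uses synthetic data with many uninformative features and weak regularisation to provoke overfitting; the data, the model settings, and the fold count are assumptions and do not reproduce the session's setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Few samples, many mostly uninformative features (assumption for the demo).
X, y = make_classification(
    n_samples=120,
    n_features=60,
    n_informative=5,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

clf = LogisticRegression(C=100.0, max_iter=5000)
scores = cross_validate(
    clf, X, y, cv=5, scoring="accuracy", return_train_score=True
)

train_mean, train_std = scores["train_score"].mean(), scores["train_score"].std()
val_mean, val_std = scores["test_score"].mean(), scores["test_score"].std()
gap = train_mean - val_mean

print(f"Training accuracy: {train_mean:.3f} +/- {train_std:.3f}")
print(f"Cross-validation accuracy: {val_mean:.3f} +/- {val_std:.3f}")
print(f"Gap: {gap:.3f}")
```

A smaller `C` (stronger regularisation) or a smaller `k` in the feature selection step are the usual first things to try when the gap is large.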
### Body
- Distances (elbow to elbow, wrist to wrist, shoulder to wrist)
- Angles: spherical angles
- Gradients of these features are also included, to capture how fast they change

Results:
- Training accuracy: 0.504 +/- 0.014
- Cross-validation accuracy: 0.341 +/- 0.018
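
The body features listed above (distances, spherical angles, and their gradients) could be sketched as below. The landmark indices assume the MediaPipe Pose layout (33 points), which the notes do not confirm, and the choice of limb segments for the spherical angles is hypothetical.

```python
import numpy as np

# Assumed MediaPipe Pose indices.
L_SHOULDER, R_SHOULDER = 11, 12
L_ELBOW, R_ELBOW = 13, 14
L_WRIST, R_WRIST = 15, 16


def dist(body, a, b):
    """Euclidean distance between landmarks a and b in every frame."""
    return np.linalg.norm(body[:, a] - body[:, b], axis=-1)


def spherical_angles(v):
    """Polar angle (from the z axis) and azimuth (in the x-y plane) of v."""
    x, y, z = v[..., 0], v[..., 1], v[..., 2]
    r = np.linalg.norm(v, axis=-1)
    polar = np.arccos(np.clip(z / r, -1.0, 1.0))
    azimuth = np.arctan2(y, x)
    return polar, azimuth


def body_features(body):
    """body: (n_frames, 33, 3) array, n_frames >= 2 -> (n_frames, 24) features."""
    distances = np.stack(
        [
            dist(body, L_ELBOW, R_ELBOW),
            dist(body, L_WRIST, R_WRIST),
            dist(body, L_SHOULDER, L_WRIST),
            dist(body, R_SHOULDER, R_WRIST),
        ],
        axis=1,
    )
    # Upper arm and forearm on each side (hypothetical choice of segments).
    segments = [
        (L_SHOULDER, L_ELBOW),
        (L_ELBOW, L_WRIST),
        (R_SHOULDER, R_ELBOW),
        (R_ELBOW, R_WRIST),
    ]
    angles = []
    for start, end in segments:
        angles.extend(spherical_angles(body[:, end] - body[:, start]))
    static = np.concatenate([distances, np.stack(angles, axis=1)], axis=1)
    # Frame-to-frame gradient as a proxy for how fast each feature changes.
    return np.concatenate([static, np.gradient(static, axis=0)], axis=1)
```

Note that the azimuth wraps around at ±π, so its gradient can jump even when the arm barely moves; unwrapping with `np.unwrap` along the time axis is one option if that turns out to matter.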

### Physics
- Interpolate sequences to 8 frames and interpolate missing values
- Using a more balanced approach
- Using Euclidean distance (875 features):
    - Training accuracy: 0.520
    - Cross-validation accuracy: 0.190
- Using speed metrics:
    - Training accuracy: 0.382
    - Cross-validation accuracy: 0.1710
- Using acceleration metrics:
    - Training accuracy: 0.3858
    - Cross-validation accuracy: 0.1208
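
The three feature families above could be derived from a landmark sequence roughly as follows. This is a sketch only: it assumes sequences of shape `(n_frames, n_landmarks, 3)` that are already resampled and gap-filled, uses all landmark pairs for the distances, and uses finite differences between frames for speed and acceleration. It does not attempt to reproduce the 875-feature set from the notes, whose exact construction is not recorded here.

```python
import numpy as np


def pairwise_distances(seq):
    """Euclidean distance between every landmark pair, per frame.

    seq: (n_frames, n_landmarks, 3) -> (n_frames, n_pairs)
    """
    i, j = np.triu_indices(seq.shape[1], k=1)
    return np.linalg.norm(seq[:, i] - seq[:, j], axis=-1)


def speed(seq, dt=1.0):
    """Per-landmark speed from first differences: (n_frames - 1, n_landmarks)."""
    return np.linalg.norm(np.diff(seq, axis=0), axis=-1) / dt


def acceleration(seq, dt=1.0):
    """Per-landmark acceleration magnitude from second differences:
    (n_frames - 2, n_landmarks)."""
    return np.linalg.norm(np.diff(seq, n=2, axis=0), axis=-1) / dt**2


# Toy sequence: landmark 0 moves along x at one unit per frame,
# landmark 1 stays fixed at (0, 3, 4).
toy = np.zeros((8, 2, 3))
toy[:, 0, 0] = np.arange(8)
toy[:, 1] = [0.0, 3.0, 4.0]

features = np.concatenate(
    [pairwise_distances(toy).ravel(), speed(toy).ravel(), acceleration(toy).ravel()]
)
print(features.shape)
```

Each differencing step amplifies tracking noise, which may be one reason the speed and acceleration features validate worse than the raw distances; smoothing the sequences first would be worth testing.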

### Face
- Use the width, height, and area of the mouth
- Accuracy is better when using 4 fractions instead of 2
- The confusion matrix is more evenly distributed when using `class_weight='balanced'`
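
For reference, `class_weight='balanced'` in scikit-learn weights each class by `n_samples / (n_classes * class_count)`, so rare classes count more in the loss. The sketch below shows the resulting weights on a made-up label distribution; the label counts are illustrative, not the dataset's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Made-up imbalanced labels: 80 / 15 / 5 samples per class.
y = np.array([0] * 80 + [1] * 15 + [2] * 5)
classes = np.unique(y)

weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
# Same values computed by hand: n_samples / (n_classes * count per class).
manual = len(y) / (len(classes) * np.bincount(y))

for cls, w in zip(classes, weights):
    print(f"class {cls}: weight {w:.3f}")

# The classifier applies the same weighting internally when fitted.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
```

This usually trades some overall accuracy for better recall on the rare classes, which fits the more evenly spread confusion matrix noted above.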

