---
tags: Biometrics
---
# Affect Analysis in-the-wild
### Focus on facial affect analysis
## Emotion representation
1. Categorical Affect
- Anger, Disgust, Fear, Happiness, Sadness, Surprise and Neutral (7 classes)
- Most popular perspective for Facial Expression Recognition
2. Action Units

- Facial Action Coding System model
- Any facial expression can be represented as a combination of action units (suffixes A-E score intensity; prefixes L and R mark the left and right side of the face)

| Emotion | Action units |
| ------- | ------------ |
| Happiness | 6+12 |
| Sadness | 1+4+15 |
| Surprise | 1+2+5B+26 |
| Fear | 1+2+4+5+7+20+26 |
| Anger | 4+5+7+23 |
| Disgust | 9+15+17 |
| Contempt | R12A+R14A |

3. Dimensional Affect
- 2D Valence-Arousal Space
- Valence: horizontal axis, ranges from very positive to very negative.
- Arousal: vertical axis, ranges from very active to very passive.
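
The AU notation used in the table of prototypes (for example `5B` or `R12A`) can be parsed mechanically. The following is a minimal sketch, assuming a code is an optional side prefix (`L` or `R`), an AU number, and an optional intensity letter (`A` to `E`). The names `PROTOTYPES`, `parse_au`, `parse_combination`, and `au_numbers` are made up for this illustration and do not come from any FACS tool.

```python
import re

# Prototype AU combinations, copied from the table in these notes.
PROTOTYPES = {
    "Happiness": "6+12",
    "Sadness": "1+4+15",
    "Surprise": "1+2+5B+26",
    "Fear": "1+2+4+5+7+20+26",
    "Anger": "4+5+7+23",
    "Disgust": "9+15+17",
    "Contempt": "R12A+R14A",
}

# Assumed code layout: optional side prefix (L/R), AU number,
# optional intensity suffix (A-E).
AU_CODE = re.compile(r"^(?P<side>[LR])?(?P<au>\d+)(?P<intensity>[A-E])?$")


def parse_au(code):
    """Split one code such as 'R12A' into side, AU number, and intensity."""
    match = AU_CODE.match(code.strip())
    if match is None:
        raise ValueError(f"not a recognized AU code: {code!r}")
    return {
        "au": int(match["au"]),
        "side": match["side"],            # None when no side prefix is given
        "intensity": match["intensity"],  # None when no intensity is given
    }


def parse_combination(text):
    """Parse a '+'-separated combination such as '1+2+5B+26'."""
    return [parse_au(code) for code in text.split("+")]


def au_numbers(emotion):
    """Return the set of AU numbers in the prototype of an emotion."""
    return {item["au"] for item in parse_combination(PROTOTYPES[emotion])}
```

For example, `parse_au("R12A")` yields AU 12 on the right side with intensity A, and `au_numbers("Happiness")` yields the set `{6, 12}`.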

## Dataset: Aff-Wild2
- The first and only database annotated for all three representations: valence and arousal, action units, and expressions.
- Labeled frame by frame
- Annotated AUs: 1, 2, 4, 6, 12, 15, 20, 25
- Evaluation:
  - action units and expressions: F1 score and accuracy
  - valence and arousal: Concordance Correlation Coefficient (CCC)
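
For reference, the CCC between a sequence of predictions $x$ and annotations $y$ is commonly defined as below, where $s_{xy}$ is the covariance, $s_x^2$ and $s_y^2$ are the variances, and $\bar{x}$ and $\bar{y}$ are the means. It lies in $[-1, 1]$, equals 1 for perfect agreement, and penalizes both low correlation and a shift in mean or scale.

$$
\rho_c = \frac{2 s_{xy}}{s_x^2 + s_y^2 + (\bar{x} - \bar{y})^2}
$$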
## Method 1 (dataset author)

- Loss function
  - action units and expressions: cross entropy
  - valence and arousal: CCC loss, where $\rho_a$ and $\rho_v$ are the CCC of arousal and valence

$$
l_{ccc} = 1 - 0.5 \, (\rho_{a} + \rho_{v})
$$
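
A minimal sketch of a CCC-based loss in plain Python, assuming the loss is one minus the mean of the arousal and valence CCC values and that population (divide-by-N) statistics are used. The function names `ccc` and `ccc_loss` are chosen for this example. A real training setup would compute the same quantity on tensors so that gradients can flow.

```python
from statistics import fmean


def ccc(pred, target):
    """Concordance Correlation Coefficient of two equal-length sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equally long")
    mean_p, mean_t = fmean(pred), fmean(target)
    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    var_t = fmean([(t - mean_t) ** 2 for t in target])
    cov = fmean([(p - mean_p) * (t - mean_t) for p, t in zip(pred, target)])
    denom = var_p + var_t + (mean_p - mean_t) ** 2
    if denom == 0:
        # Both sequences are constant and equal, so CCC is undefined.
        # Returning 0.0 here is an assumption of this sketch.
        return 0.0
    return 2.0 * cov / denom


def ccc_loss(aro_pred, aro_true, val_pred, val_true):
    """Loss that is 0 when both arousal and valence agree perfectly."""
    rho_a = ccc(aro_pred, aro_true)
    rho_v = ccc(val_pred, val_true)
    return 1.0 - 0.5 * (rho_a + rho_v)
```

With identical predictions and annotations for both dimensions the loss is 0, and it grows toward 2 as the predictions become anti-correlated with the annotations.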

## Method 2 (Overall-2)
### [Multitask Emotion Recognition with Incomplete Labels](https://arxiv.org/pdf/2002.03557.pdf)
- Data imbalance problem
  - merge other datasets
  - oversample instances with ML-ROS (action units)
  - resample instances (valence and arousal, expressions)
- Learning from missing labels
  - Each data instance has a label for only one task.
- Relabeling procedure
  - Use model distillation (teacher -> student):
    - Train a single teacher model using only ground-truth labels.
    - Use the output of the teacher model to replace the missing labels with soft labels.
    - Use the ground-truth and soft labels to train multiple student models.
  - Two-step algorithm

- Model Architecture

- Performance
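
The relabeling step of the distillation procedure described above can be sketched as follows. This is only an illustration under assumptions, not the paper's code: the instance layout (a dict with `features` and `labels`), the task names in `TASKS`, and the stand-in `toy_teacher` are all hypothetical. Ground-truth labels are kept where they exist, and the teacher's soft output fills the tasks that have no label.

```python
# Hypothetical task names for the three label types in these notes.
TASKS = ("expression", "action_units", "valence_arousal")


def toy_teacher(features):
    """Stand-in for a trained teacher model; returns fixed soft labels."""
    return {
        "expression": [0.1, 0.7, 0.2],
        "action_units": [0.9, 0.2],
        "valence_arousal": [0.3, -0.1],
    }


def relabel(instances, teacher):
    """Fill missing task labels with the teacher's soft predictions."""
    relabeled = []
    for inst in instances:
        soft = teacher(inst["features"])  # dict: task name -> soft label
        labels = {}
        for task in TASKS:
            truth = inst["labels"].get(task)
            if truth is not None:
                # Keep the annotation when the task is labeled.
                labels[task] = {"value": truth, "source": "ground_truth"}
            else:
                # Otherwise fall back to the teacher's soft label.
                labels[task] = {"value": soft[task], "source": "teacher"}
        relabeled.append({"features": inst["features"], "labels": labels})
    return relabeled


# An instance that is annotated for the expression task only.
data = [{"features": [0.0, 1.0], "labels": {"expression": 1}}]
students_training_set = relabel(data, toy_teacher)
```

Keeping a `source` field next to each value is a design choice of this sketch: it lets a student's loss treat ground-truth and soft labels differently if needed.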

## Method 3 (Overall-1)
### [Two-Stream Aural-Visual Affect Analysis in the Wild](https://arxiv.org/pdf/2002.03399.pdf)
- Preprocessing
  - image
    - face detection and alignment (3-D array)
    - use 68-point landmarks to generate a mask (1-D array)
  - audio
    - mel spectrogram
- Pseudo labels
  - Only 59% of all frames are labeled for categorical expressions, and 75% are labeled for valence and arousal.
  - Use the correlation between the two emotion representations to enrich the training with pseudo labels.
  - Given an expression label, we sample a valence and arousal label from the distribution of this expression.
  - Given a valence label $v$ and an arousal label $a$, we compute a probability $p_i(v,a)$ for each expression $i$ with

$$
p_i(v,a)=\frac{n_i(v,a)}{\sum_{j \in E} n_j(v,a)}
$$
where $n_i(v,a)$ is the number of valence and arousal labels in the corresponding bin of the histogram for expression $i$, and $E$ is the set of all expressions. These probabilities are used as soft expression labels during training.

- Filter contradictory annotations (for example, a frame labeled with the "happy" expression that is simultaneously annotated with a negative valence).
- Model Architecture

- Performance
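
The histogram-based pseudo labels described above can be sketched in plain Python. This is a hedged illustration, not the authors' implementation: the bin count `NUM_BINS`, the value range of [-1, 1] for valence and arousal, and all function names are assumptions of this example. Sampling a valence and arousal pair for a given expression is approximated here by drawing one of the observed pairs for that expression.

```python
import random
from collections import Counter, defaultdict

NUM_BINS = 20  # assumed histogram resolution per axis


def bin_index(x, num_bins=NUM_BINS):
    """Map a value in [-1, 1] to a bin index in [0, num_bins - 1]."""
    idx = int((x + 1.0) / 2.0 * num_bins)
    return min(max(idx, 0), num_bins - 1)


def build_histograms(samples):
    """Count (valence, arousal) bins per expression.

    samples: list of (expression, valence, arousal) tuples taken from
    frames that carry both kinds of labels.
    """
    hist = defaultdict(Counter)
    for expr, v, a in samples:
        hist[expr][(bin_index(v), bin_index(a))] += 1
    return hist


def soft_expression_labels(hist, v, a):
    """Return p_i(v, a) for every expression i, or None for an empty bin."""
    key = (bin_index(v), bin_index(a))
    counts = {expr: h[key] for expr, h in hist.items()}
    total = sum(counts.values())
    if total == 0:
        return None  # no labeled frame falls into this bin
    return {expr: n / total for expr, n in counts.items()}


def sample_valence_arousal(samples, expression, rng=random):
    """Draw one observed (valence, arousal) pair for the given expression."""
    candidates = [(v, a) for e, v, a in samples if e == expression]
    return rng.choice(candidates) if candidates else None
```

For instance, if a bin holds two "happy" frames and one "sad" frame, a frame whose valence and arousal fall into that bin receives the soft label 2/3 for "happy" and 1/3 for "sad".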

