---
tags: Biometrics
---

# Affect Analysis in-the-wild
### Focus on facial affect analysis

## Emotion representation
1. Categorical Affect
    - Anger, Disgust, Fear, Happiness, Sadness, Surprise, and Neutral (7 classes)
    - The most popular perspective for Facial Expression Recognition
2. Action Units
    ![](https://i.imgur.com/ybDievS.png)
    - Facial Action Coding System (FACS) model
    - Any facial expression can be represented as a combination of action units (A-E: intensity scoring; L and R: left and right)

    | Emotion   | Action units     |
    | --------- | ---------------- |
    | Happiness | 6+12             |
    | Sadness   | 1+4+15           |
    | Surprise  | 1+2+5B+26        |
    | Fear      | 1+2+4+5+7+20+26  |
    | Anger     | 4+5+7+23         |
    | Disgust   | 9+15+17          |
    | Contempt  | R12A+R14A        |

3. Dimensional Affect
    - 2D Valence-Arousal Space
        - valence: horizontal axis, ranges from very positive to very negative
        - arousal: vertical axis, ranges from very active to very passive

    ![](https://i.imgur.com/257rO7f.png)

## Dataset: Aff-Wild2
- The first and only database annotated in terms of valence and arousal, action units, and expressions.
- Labels are provided frame by frame.
- AUs 1, 2, 4, 6, 12, 15, 20, 25
- Evaluation:
    - action units and expressions: F1-score and accuracy
    - valence and arousal: Concordance Correlation Coefficient (CCC)

## Method 1 (dataset author)
![](https://i.imgur.com/OSoYi89.png)
- Loss function
    - action units and expressions: cross entropy
    - valence and arousal: CCC loss, where $\rho_{a}$ and $\rho_{v}$ are the CCC values for arousal and valence
    $$
    l_{ccc} = 1 - 0.5(\rho_{a} + \rho_{v})
    $$

![](https://i.imgur.com/WgguMBi.png)

## Method 2 (Overall-2)
### [Multitask Emotion Recognition with Incomplete Labels](https://arxiv.org/pdf/2002.03557.pdf)
- Data imbalance problem
    - merge other datasets
    - oversample instances with ML-ROS (action units)
    - resample instances (valence and arousal, expressions)
- Learning from Missing Labels
    - Each data instance contains a label for only one task.
    - Relabeling procedure
        - Use model distillation, teacher -> student
        - Train a single teacher model using only ground truth labels.
        - Use the output of the teacher model to replace the missing labels with soft labels.
        - Use the ground truth and soft labels to train multiple student models.
        - two-step algorithm
        ![](https://i.imgur.com/uCV3ZVm.png)
- Model Architecture
![](https://i.imgur.com/hIDR5DL.png)
- Performance
![](https://i.imgur.com/OPZNAXP.png)

## Method 3 (Overall-1)
### [Two-Stream Aural-Visual Affect Analysis in the Wild](https://arxiv.org/pdf/2002.03399.pdf)
- Preprocessing
    - image
        - face detection and alignment (3-D array)
        - use 68-point landmarks to generate a mask (1-D array)
    - audio
        - mel spectrogram
- Pseudo labels
    - Only 59% of all frames are labeled for categorical expressions, and 75% are labeled for valence and arousal.
    - Use the correlation between the two emotion representations to enrich training with pseudo labels.
        - Given an expression label, sample a valence and arousal label from the distribution of that expression.
        - Given a valence and arousal label $v$ and $a$, compute a probability $p_i(v,a)$ for each expression $i$ with
        $$
        p_i(v,a)=\frac{n_i(v,a)}{\sum_{j \in E}n_j(v,a)}
        $$
        where $n_i(v,a)$ is the number of valence and arousal labels in the corresponding bin of the histogram for expression $i$, and $E$ is the set of all expressions. These probabilities are used as soft expression labels during training.
        ![](https://i.imgur.com/WU6idai.png)
    - Filter contradictory annotations (e.g., a frame labeled with the "happy" expression can also be annotated with negative valence).
- Model Architecture
![](https://i.imgur.com/DRreZb2.png)
- Performance
![](https://i.imgur.com/OVohKB9.png)
![](https://imgur.com/a/uLFGFud)
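
The soft expression labels described under "Pseudo labels" in Method 3 can be sketched in a few lines of plain Python. This is only an illustration of the histogram formula $p_i(v,a)$, not the authors' implementation. The bin count (`N_BINS = 20`), the function names, the toy samples, and the choice to return `None` for empty bins are all assumptions made for this example. It covers only the valence/arousal-to-expression direction.

```python
from collections import defaultdict

EXPRESSIONS = ["neutral", "anger", "disgust", "fear",
               "happiness", "sadness", "surprise"]
N_BINS = 20  # assumed histogram resolution per axis; not given in the notes


def to_bin(x, n_bins=N_BINS):
    """Map a value in [-1, 1] to a bin index in [0, n_bins - 1]."""
    idx = int((x + 1.0) / 2.0 * n_bins)
    return min(max(idx, 0), n_bins - 1)


def build_histograms(samples, n_bins=N_BINS):
    """Count n_i(v, a): frames with expression i in each (v, a) bin.

    `samples` is an iterable of (expression, valence, arousal) tuples
    from frames that carry both kinds of annotation.
    """
    counts = defaultdict(int)
    for expr, v, a in samples:
        counts[(expr, to_bin(v, n_bins), to_bin(a, n_bins))] += 1
    return counts


def soft_expression_label(counts, v, a, n_bins=N_BINS):
    """Return p_i(v, a) = n_i(v, a) / sum_j n_j(v, a) for every expression."""
    bv, ba = to_bin(v, n_bins), to_bin(a, n_bins)
    n = [counts.get((expr, bv, ba), 0) for expr in EXPRESSIONS]
    total = sum(n)
    if total == 0:
        # Assumption: with no evidence in this bin, emit no pseudo label.
        return None
    return {expr: c / total for expr, c in zip(EXPRESSIONS, n)}


# Toy data (made up for illustration): three frames share one (v, a) bin.
samples = [
    ("happiness", 0.62, 0.41),
    ("happiness", 0.65, 0.43),
    ("surprise", 0.61, 0.44),
    ("sadness", -0.55, -0.35),
]
counts = build_histograms(samples)
probs = soft_expression_label(counts, 0.63, 0.42)
print(probs)
```

In the toy data, two "happiness" frames and one "surprise" frame fall into the queried bin, so the soft label puts two thirds of the mass on happiness and one third on surprise. A frame whose valence and arousal land in an empty bin gets no pseudo label in this sketch.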