# MSP-Lab SER Hackathon (Fighting!!!)
> [name=David, Lucas, Seong-Gyun, Luz, Ali, Abinay, Jeri]
>
> [time=Sat, Jun 18, 2022 12:19 PM]
# [Github](https://github.com/sgleem/SS_for_SER/tree/development)
# [UTD BOX](https://utdallas.box.com/s/epmdxo2fm8qorntezrbmz9tr5sulnqzn)
# [Results Table](https://docs.google.com/spreadsheets/d/1yuQ106ZgJ2dZ7TEpuq9F16O9tTOsO4m2cm1F6vKF1nY/edit?usp=sharing)
# [Overleaf](https://www.overleaf.com/7187996787gtxmgpjnsmfk)
# Experiments to Run
* USC-IEMOCAP:
- David's Rule/Multi-label/ALL_secondary/Hard-label/All_partitions
- Plurality rule/All experiments
* CREMA-D
- Majority voting/Hard-label&Soft-label&Distribution-label/ALL_primary/Partition5
* MSP-IMPROV
- Majority voting/Soft-label/four_primary/Partition4
- Plurality rule/Hard-label&Soft-label&Distribution-label/four_primary/Partition4
- Plurality rule/Soft-label/ALL_secondary/Partition6
# Computational Resources
## Taiwan Computational Cloud (80 GPUs)
<!---
### Machine Empty
- [x] 1 Crema-D
- [x] 6 Improv2
- [x] 8 nnime1
- [x] 9 nnime2
- [x] 3 Crema-D
- [ ] 2 Crema-D
- [ ] 4 Crema-D
- [ ] 5 Improv1
- [ ] 7 Improv3
- [ ] 10 podcast1
- [ ] 11 podcast2
- [ ] 12 podcast3
- [ ] 13 podcast4
--->
<!---
### On-going processes (each machine uses 2 GPUs) (-port)
- [x] 1. run_podcast10_w2v_lg_rob_seed0_M_primary_4emo_distlbl (`nohup bash run_podcast10_w2v_lg_rob_seed0_M_primary_4emo_distlbl.sh > run_podcast10_w2v_lg_rob_seed0_M_primary_4emo_distlbl.out &`)
- [x] 2. run_podcast10_w2v_lg_rob_seed0_M_primary_ALLemo_distlbl (`nohup bash run_podcast10_w2v_lg_rob_seed0_M_primary_ALLemo_distlbl.sh > run_podcast10_w2v_lg_rob_seed0_M_primary_ALLemo_distlbl.out &`)
- [x] 3. run_podcast10_w2v_lg_rob_seed0_M_secondary_4emo_distlbl (`nohup bash run_podcast10_w2v_lg_rob_seed0_M_secondary_4emo_distlbl.sh > run_podcast10_w2v_lg_rob_seed0_M_secondary_4emo_distlbl.out &`)
- [x] 4. run_podcast10_w2v_lg_rob_seed0_P_primary_4emo_distlbl (`nohup bash run_podcast10_w2v_lg_rob_seed0_P_primary_4emo_distlbl.sh > run_podcast10_w2v_lg_rob_seed0_P_primary_4emo_distlbl.out &`)
- [x] 5. run_podcast10_w2v_lg_rob_seed0_P_primary_ALLemo_distlbl (`nohup bash run_podcast10_w2v_lg_rob_seed0_P_primary_ALLemo_distlbl.sh > run_podcast10_w2v_lg_rob_seed0_P_primary_ALLemo_distlbl.out &`)
- [x] 6. run_podcast10_w2v_lg_rob_seed0_P_secondary_4emo_distlbl (`nohup bash run_podcast10_w2v_lg_rob_seed0_P_secondary_4emo_distlbl.sh > run_podcast10_w2v_lg_rob_seed0_P_secondary_4emo_distlbl.out &`)
- [x] 7. run_podcast10_w2v_lg_rob_seed0_P_secondary_ALLemo_distlbl (`nohup bash run_podcast10_w2v_lg_rob_seed0_P_secondary_ALLemo_distlbl.sh > run_podcast10_w2v_lg_rob_seed0_P_secondary_ALLemo_distlbl.out &`)
- [X] 1. run_cremad_1_4emo_w2v_lg_rob (`nohup bash cremad_run_1_4emo_w2v_lg_rob.sh > machine1_cremad_run_cremad_1_4emo_w2v_lg_rob.out &`)
- [x] 2. run_cremad_2_4emo_w2v_lg_rob (`nohup bash cremad_run_2_4emo_w2v_lg_rob.sh > machine2_cremad_run_cremad_2_4emo_w2v_lg_rob.out &`)
- [x] 3. run_cremad_3_ALLemo_w2v_lg_rob (`nohup bash cremad_run_3_ALLemo_w2v_lg_rob.sh > machine3_cremad_run_cremad_3_ALLemo_w2v_lg_rob.out &`)
- [x] 4. run_cremad_4_ALLemo_w2v_lg_rob (`nohup bash cremad_run_4_ALLemo_w2v_lg_rob.sh > machine4_cremad_run_cremad_4_ALLemo_w2v_lg_rob.out &`)
------------------------------------------
- [x] 5. run_files_podcast1_4_primary_w2v_lg_rob (`nohup bash podcast_run_1_4_primary_w2v_lg_rob.sh > machine5_podcast_run_1_4_primary_w2v_lg_rob.out &`)
- [x] 6. run_files_podcast2_all_primary_w2v_lg_rob (`nohup bash podcast_run_2_all_primary_w2v_lg_rob.sh > machine6_podcast_run_2_all_primary_w2v_lg_rob.out &`)
- [x] 7. run_files_podcast3_4_secondary_w2v_lg_rob (`nohup bash podcast_run_3_4_secondary_w2v_lg_rob.sh > machine7_podcast_run_3_4_secondary_w2v_lg_rob.out &`)
- [x] 8. run_files_podcast4_all_secondary_w2v_lg_rob (`nohup bash podcast_run_4_all_secondary_w2v_lg_rob.sh > machine8_podcast_run_4_all_secondary_w2v_lg_rob.out &`)
------------------------------------------
- [x] 9. run_improv_1_4emo_w2v_lg_rob_primary (`nohup bash improv_run_1_4emo_w2v_lg_rob_primary.sh > machine9_improv_run_1_4emo_w2v_lg_rob_primary.out &`)
- [x] 10. run_improv_2_4emo_w2v_lg_rob_secondary (`nohup bash improv_run_1_4emo_w2v_lg_rob_secondary.sh > machine10_improv_run_1_4emo_w2v_lg_rob_secondary.out &`)
- [x] 11. run_improv_3_ALLemo_w2v_lg_rob_secondary (`nohup bash improv_run_1_ALLemo_w2v_lg_rob_secondary.sh > machine11_improv_run_1_ALLemo_w2v_lg_rob_secondary.out &`)
------------------------------------------
- [x] 12. run_nnime_1_D_w2v_lg_rob_4emo_part1 (`nohup bash nnime_run_1_D_w2v_lg_rob_4emo_part1.sh > machine12_nnime_run_1_D_w2v_lg_rob_4emo_part1.out &`)
- [x] 13. run_nnime_2_D_w2v_lg_rob_ALLemo_part1 (`nohup bash nnime_run_2_D_w2v_lg_rob_ALLemo_part1.sh > machine13_nnime_run_2_D_w2v_lg_rob_ALLemo_part1.out &`)
- [x] 14. run_cremad_1_4emo_w2v_lg_rob_part2 (`nohup bash cremad_run_1_4emo_w2v_lg_rob_part2.sh > machine14_cremad_run_1_4emo_w2v_lg_rob_part2.out &`)
- [x] 15. run_cremad_2_4emo_w2v_lg_rob_part2 (`nohup bash cremad_run_2_4emo_w2v_lg_rob_part2.sh > machine15_cremad_run_2_4emo_w2v_lg_rob_part2.out &`)
- [x] 16. run_cremad_3_ALLemo_w2v_lg_rob_part2 (`nohup bash cremad_run_3_ALLemo_w2v_lg_rob_part2.sh > machine16_cremad_run_3_ALLemo_w2v_lg_rob_part2.out &`)
- [x] 17. run_cremad_4_ALLemo_w2v_lg_rob_part2 (`nohup bash cremad_run_4_ALLemo_w2v_lg_rob_part2.sh > machine17_cremad_run_4_ALLemo_w2v_lg_rob_part2.out &`)
------------------------------------------
- [x] 18. run_iemocap_1_D_w2v_lg_rob_part2 (`nohup bash iemocap_run_1_D_w2v_lg_rob_part2.sh > machine18_iemocap_run_1_D_w2v_lg_rob_part2.out &`)
- [x] 19. run_iemocap_1_M_w2v_lg_rob_part2 (`nohup bash iemocap_run_1_M_w2v_lg_rob_part2.sh > machine19_iemocap_run_1_M_w2v_lg_rob_part2.out &`)
- [x] 20. run_iemocap_2_D_w2v_lg_rob_part2 (`nohup bash iemocap_run_2_D_w2v_lg_rob_part2.sh > machine20_iemocap_run_2_D_w2v_lg_rob_part2.out &`)
- [x] 21. run_iemocap_2_M_w2v_lg_rob_part2 (`nohup bash iemocap_run_2_M_w2v_lg_rob_part2.sh > machine21_iemocap_run_2_M_w2v_lg_rob_part2.out &`)
------------------------------------------
- [x] 22. run_improv_1_4emo_w2v_lg_rob_primary_part2 (`nohup bash improv_run_1_4emo_w2v_lg_rob_primary_part2.sh > machine22_improv_run_1_4emo_w2v_lg_rob_primary_part2.out &`)
- [x] 23. run_improv_2_4emo_w2v_lg_rob_secondary_part2 (`nohup bash improv_run_1_4emo_w2v_lg_rob_secondary_part2.sh > machine23_improv_run_1_4emo_w2v_lg_rob_secondary_part2.out &`)
- [x] 24. run_improv_2_ALLemo_w2v_lg_rob_secondary_part2 (`nohup bash improv_run_1_ALLemo_w2v_lg_rob_secondary_part2.sh > machine24_improv_run_1_ALLemo_w2v_lg_rob_secondary_part2.out &`)
------------------------------------------
- [x] 25. run_nnime_1_D_w2v_lg_rob_4emo_part2 (`nohup bash nnime_run_1_D_w2v_lg_rob_4emo_part2.sh > machine25_nnime_run_1_D_w2v_lg_rob_4emo_part2.out &`)
- [x] 26. run_nnime_2_D_w2v_lg_rob_ALLemo_part2 (`nohup bash nnime_run_2_D_w2v_lg_rob_ALLemo_part2.sh > machine26_nnime_run_2_D_w2v_lg_rob_ALLemo_part2.out &`)
- [x] 27. run_cremad_1_4emo_w2v_lg_rob_part3 (`nohup bash cremad_run_1_4emo_w2v_lg_rob_part3.sh > machine27_cremad_run_1_4emo_w2v_lg_rob_part3.out &`)
- [x] 28. run_cremad_2_4emo_w2v_lg_rob_part3 (`nohup bash cremad_run_2_4emo_w2v_lg_rob_part3.sh > machine28_cremad_run_2_4emo_w2v_lg_rob_part3.out &`)
- [x] 29. run_cremad_3_ALLemo_w2v_lg_rob_part3 (`nohup bash cremad_run_3_ALLemo_w2v_lg_rob_part3.sh > machine29_cremad_run_3_ALLemo_w2v_lg_rob_part3.out &`)
- [x] 30. run_cremad_4_ALLemo_w2v_lg_rob_part3 (`nohup bash cremad_run_4_ALLemo_w2v_lg_rob_part3.sh > machine30_cremad_run_4_ALLemo_w2v_lg_rob_part3.out &`)
------------------------------------------
- [x] 31. run_iemocap_1_D_w2v_lg_rob_part3 (`nohup bash iemocap_run_1_D_w2v_lg_rob_part3.sh > machine31_iemocap_run_1_D_w2v_lg_rob_part3.out &`)
- [x] 32. run_iemocap_1_M_w2v_lg_rob_part3 (`nohup bash iemocap_run_1_M_w2v_lg_rob_part3.sh > machine32_iemocap_run_1_M_w2v_lg_rob_part3.out &`)
- [x] 33. run_iemocap_2_D_w2v_lg_rob_part3 (`nohup bash iemocap_run_2_D_w2v_lg_rob_part3.sh > machine33_iemocap_run_2_D_w2v_lg_rob_part3.out &`)
- [x] 34. run_iemocap_2_M_w2v_lg_rob_part3 (`nohup bash iemocap_run_2_M_w2v_lg_rob_part3.sh > machine34_iemocap_run_2_M_w2v_lg_rob_part3.out &`)
------------------------------------------
- [x] 35. run_improv_1_4emo_w2v_lg_rob_primary_part3 (`nohup bash improv_run_1_4emo_w2v_lg_rob_primary_part3.sh > machine35_improv_run_1_4emo_w2v_lg_rob_primary_part3.out &`)
- [x] 36. run_improv_2_4emo_w2v_lg_rob_secondary_part3 (`nohup bash improv_run_1_4emo_w2v_lg_rob_secondary_part3.sh > machine36_improv_run_1_4emo_w2v_lg_rob_secondary_part3.out &`)
- [x] 37. run_improv_2_ALLemo_w2v_lg_rob_secondary_part3 (`nohup bash improv_run_1_ALLemo_w2v_lg_rob_secondary_part3.sh > machine37_improv_run_1_ALLemo_w2v_lg_rob_secondary_part3.out &`)
------------------------------------------
- [x] 38. run_nnime_1_D_w2v_lg_rob_4emo_part3 (`nohup bash nnime_run_1_D_w2v_lg_rob_4emo_part3.sh > machine38_nnime_run_1_D_w2v_lg_rob_4emo_part3.out &`)
- [x] 39. run_nnime_2_D_w2v_lg_rob_ALLemo_part3 (`nohup bash nnime_run_2_D_w2v_lg_rob_ALLemo_part3.sh > machine39_nnime_run_2_D_w2v_lg_rob_ALLemo_part3.out &`)
- [x] 40. run_cremad_1_4emo_w2v_lg_rob_part4 (`nohup bash cremad_run_1_4emo_w2v_lg_rob_part4.sh > machine40_cremad_run_1_4emo_w2v_lg_rob_part4.out &`)
- [x] 41. run_cremad_2_4emo_w2v_lg_rob_part4 (`nohup bash cremad_run_2_4emo_w2v_lg_rob_part4.sh > machine41_cremad_run_2_4emo_w2v_lg_rob_part4.out &`)
- [x] 42. run_cremad_3_ALLemo_w2v_lg_rob_part4 (`nohup bash cremad_run_3_ALLemo_w2v_lg_rob_part4.sh > machine42_cremad_run_3_ALLemo_w2v_lg_rob_part4.out &`)
- [x] 43. run_cremad_4_ALLemo_w2v_lg_rob_part4 (`nohup bash cremad_run_4_ALLemo_w2v_lg_rob_part4.sh > machine43_cremad_run_4_ALLemo_w2v_lg_rob_part4.out &`)
------------------------------------------
- [x] 44. run_iemocap_1_D_w2v_lg_rob_part4 (`nohup bash iemocap_run_1_D_w2v_lg_rob_part4.sh > machine44_iemocap_run_1_D_w2v_lg_rob_part4.out &`)
- [x] 45. run_iemocap_1_M_w2v_lg_rob_part4 (`nohup bash iemocap_run_1_M_w2v_lg_rob_part4.sh > machine45_iemocap_run_1_M_w2v_lg_rob_part4.out &`)
- [x] 46. run_iemocap_2_D_w2v_lg_rob_part4 (`nohup bash iemocap_run_2_D_w2v_lg_rob_part4.sh > machine46_iemocap_run_2_D_w2v_lg_rob_part4.out &`)
- [x] 47. run_iemocap_2_M_w2v_lg_rob_part4 (`nohup bash iemocap_run_2_M_w2v_lg_rob_part4.sh > machine47_iemocap_run_2_M_w2v_lg_rob_part4.out &`)
------------------------------------------
- [x] 48. run_improv_1_4emo_w2v_lg_rob_primary_part4 (`nohup bash improv_run_1_4emo_w2v_lg_rob_primary_part4.sh > machine48_improv_run_1_4emo_w2v_lg_rob_primary_part4.out &`)
- [x] 49. run_improv_2_4emo_w2v_lg_rob_secondary_part4 (`nohup bash improv_run_1_4emo_w2v_lg_rob_secondary_part4.sh > machine49_improv_run_1_4emo_w2v_lg_rob_secondary_part4.out &`)
- [x] 50. run_improv_2_ALLemo_w2v_lg_rob_secondary_part4 (`nohup bash improv_run_1_ALLemo_w2v_lg_rob_secondary_part4.sh > machine50_improv_run_1_ALLemo_w2v_lg_rob_secondary_part4.out &`)
------------------------------------------
- [x] 51. run_nnime_1_D_w2v_lg_rob_4emo_part4 (`nohup bash nnime_run_1_D_w2v_lg_rob_4emo_part4.sh > machine51_nnime_run_1_D_w2v_lg_rob_4emo_part4.out &`)
- [x] 52. run_nnime_2_D_w2v_lg_rob_ALLemo_part4 (`nohup bash nnime_run_2_D_w2v_lg_rob_ALLemo_part4.sh > machine52_nnime_run_2_D_w2v_lg_rob_ALLemo_part4.out &`)
- [x] 53. run_cremad_1_4emo_w2v_lg_rob_part5 (`nohup bash cremad_run_1_4emo_w2v_lg_rob_part5.sh > machine53_cremad_run_1_4emo_w2v_lg_rob_part5.out &`)
- [x] 54. run_cremad_2_4emo_w2v_lg_rob_part5 (`nohup bash cremad_run_2_4emo_w2v_lg_rob_part5.sh > machine54_cremad_run_2_4emo_w2v_lg_rob_part5.out &`)
- [x] 55. run_cremad_3_ALLemo_w2v_lg_rob_part5 (`nohup bash cremad_run_3_ALLemo_w2v_lg_rob_part5.sh > machine55_cremad_run_3_ALLemo_w2v_lg_rob_part5.out &`)
- [x] 56. run_cremad_4_ALLemo_w2v_lg_rob_part5 (`nohup bash cremad_run_4_ALLemo_w2v_lg_rob_part5.sh > machine56_cremad_run_4_ALLemo_w2v_lg_rob_part5.out &`)
------------------------------------------
- [x] 57. run_iemocap_1_D_w2v_lg_rob_part5 (`nohup bash iemocap_run_1_D_w2v_lg_rob_part5.sh > machine57_iemocap_run_1_D_w2v_lg_rob_part5.out &`)
- [x] 58. run_iemocap_1_M_w2v_lg_rob_part5 (`nohup bash iemocap_run_1_M_w2v_lg_rob_part5.sh > machine58_iemocap_run_1_M_w2v_lg_rob_part5.out &`)
- [x] 59. run_iemocap_2_D_w2v_lg_rob_part5 (`nohup bash iemocap_run_2_D_w2v_lg_rob_part5.sh > machine59_iemocap_run_2_D_w2v_lg_rob_part5.out &`)
- [x] 60. run_iemocap_2_M_w2v_lg_rob_part5 (`nohup bash iemocap_run_2_M_w2v_lg_rob_part5.sh > machine60_iemocap_run_2_M_w2v_lg_rob_part5.out &`)
------------------------------------------
- [x] 61. run_improv_1_4emo_w2v_lg_rob_primary_part5 (`nohup bash improv_run_1_4emo_w2v_lg_rob_primary_part5.sh > machine61_improv_run_1_4emo_w2v_lg_rob_primary_part5.out &`)
- [x] 62. run_improv_2_4emo_w2v_lg_rob_secondary_part5 (`nohup bash improv_run_1_4emo_w2v_lg_rob_secondary_part5.sh > machine62_improv_run_1_4emo_w2v_lg_rob_secondary_part5.out &`)
- [x] 63. run_improv_2_ALLemo_w2v_lg_rob_secondary_part5 (`nohup bash improv_run_1_ALLemo_w2v_lg_rob_secondary_part5.sh > machine63_improv_run_1_ALLemo_w2v_lg_rob_secondary_part5.out &`)
------------------------------------------
- [x] 64. run_nnime_1_D_w2v_lg_rob_4emo_part5 (`nohup bash nnime_run_1_D_w2v_lg_rob_4emo_part5.sh > machine64_nnime_run_1_D_w2v_lg_rob_4emo_part5.out &`)
- [x] 65. run_nnime_2_D_w2v_lg_rob_ALLemo_part5 (`nohup bash nnime_run_2_D_w2v_lg_rob_ALLemo_part5.sh > machine65_nnime_run_2_D_w2v_lg_rob_ALLemo_part5.out &`)
- [x] 66. run_improv_1_4emo_w2v_lg_rob_primary_part6 (`nohup bash improv_run_1_4emo_w2v_lg_rob_primary_part6.sh > machine66_improv_run_1_4emo_w2v_lg_rob_primary_part6.out &`)
- [x] 67. run_improv_2_4emo_w2v_lg_rob_secondary_part6 (`nohup bash improv_run_1_4emo_w2v_lg_rob_secondary_part6.sh > machine62_improv_run_1_4emo_w2v_lg_rob_secondary_part6.out &`)
- [x] 68. run_improv_2_ALLemo_w2v_lg_rob_secondary_part6 (`nohup bash improv_run_1_ALLemo_w2v_lg_rob_secondary_part6.sh > machine63_improv_run_1_ALLemo_w2v_lg_rob_secondary_part6.out &`)
- [ ] 29. run_cremad_3_ALLemo_w2v_lg_rob_part3 (`nohup bash cremad_run_3_ALLemo_w2v_lg_rob_part3.sh > machine29_cremad_run_3_ALLemo_w2v_lg_rob_part3.out &`)
- [ ] 42. run_cremad_3_ALLemo_w2v_lg_rob_part4 (`nohup bash cremad_run_3_ALLemo_w2v_lg_rob_part4.sh > machine42_cremad_run_3_ALLemo_w2v_lg_rob_part4.out &`)
--->
------------------------------------------
# To-do list
- [x] Code
- [ ] Clean code
- [x] Learn how to use GitHub (git)
- [ ] Learn how to use Docker
- [ ] Modify code for categorical emotion classification tasks
- [ ] Pre-trained model
- [ ] Know the differences between pre-trained models
- [ ] SER Model
- [ ] Replace pooling layer with Winston's chunk-level mechanism

- [ ] How to save models
- [ ] Experiments
- [ ] List all accessible computational resources
* Each pre-trained model needs different sizes of GPU memory
- [ ] List all experiments
* Where to save trained model weights
- [ ] Tasks
- [ ] Is it possible to adapt the models for continuous-level emotion recognition?
- [ ] Objective function design
* How to define loss functions so that models learn distribution labels
- [ ] Penalize predictions that contain emotions absent from the ground truth
- [ ] Kullback–Leibler divergence helps models learn the emotions present in the ground truth, but it does not directly penalize predictions that assign probability to emotions absent from the ground truth
- [x] Dataset usage
* Make sure each dataset is well-set
- [x] Format
- [x] Audios
- [x] Labels
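For the Kullback–Leibler item under *Objective function design* above, here is a minimal pure-Python sketch (our own illustration, not the project code) of why KL does not directly penalize emotions absent from the ground truth. Each KL term is weighted by the ground-truth probability, so classes with zero ground truth drop out, while binary cross-entropy keeps a `-log(1 - p)` term for them. Any penalty KL puts on absent emotions is indirect: with a softmax output, mass on them lowers the probability of the present emotions. The two predictions are made-up numbers.

```python
import math

EPS = 1e-12  # guards against log(0)


def kl_terms(target, pred):
    """Per-class terms of KL(target || pred) = sum_i t_i * log(t_i / p_i).

    A class with t_i == 0 contributes exactly 0, whatever probability p_i it gets.
    """
    return [t * math.log(t / max(p, EPS)) if t > 0 else 0.0 for t, p in zip(target, pred)]


def bce_terms(target, pred):
    """Per-class binary cross-entropy terms; a class with t_i == 0 still costs -log(1 - p_i)."""
    terms = []
    for t, p in zip(target, pred):
        p = min(max(p, EPS), 1 - EPS)
        terms.append(-(t * math.log(p) + (1 - t) * math.log(1 - p)))
    return terms


# Classes (anger, sadness, neutral, happiness); ground truth from 2 anger + 3 neutral votes
target = [0.4, 0.0, 0.6, 0.0]
# Two made-up softmax outputs: same mass on the present emotions (anger, neutral),
# different split of the remaining mass over the absent ones (sadness, happiness)
pred_spread = [0.3, 0.2, 0.3, 0.2]
pred_peaked = [0.3, 0.05, 0.3, 0.35]

kl_spread, kl_peaked = sum(kl_terms(target, pred_spread)), sum(kl_terms(target, pred_peaked))
bce_spread, bce_peaked = sum(bce_terms(target, pred_spread)), sum(bce_terms(target, pred_peaked))
print(f"KL : {kl_spread:.4f} vs {kl_peaked:.4f}")   # identical
print(f"BCE: {bce_spread:.4f} vs {bce_peaked:.4f}")  # different
```

KL cannot tell where the leftover probability goes; BCE (or an explicit extra term on classes with zero ground truth) can.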
------------------------------------------
# Goal: [AAAI 2023](https://aaai.org/Conferences/AAAI-23/)
* **August 1, 2022: Submit to Dr. Busso**
* **August 8, 2022: Abstracts due at 11:59 PM UTC-12**
* **August 15, 2022: Full papers due at 11:59 PM UTC-12**
# [Opponent product](https://twitter.com/audeering/status/1539973140570198016?s=21&t=UIM5jm1_VBtEFr33HaWsaQ)
------------------------------------------
# Outlines
> [TOC]
------------------------------------------
# Progress
## [Install HuggingFace](https://huggingface.co/docs/transformers/installation)
## [Re-implement the code of "Dawn of the Transformer Era in Speech Emotion Recognition"](https://cometmail-my.sharepoint.com/:u:/g/personal/hdc210001_utdallas_edu/EdwIMZKJ3dpDgssPm24EHnYB_fhG1ymKG5S5XOJV44VqXA)
## Experiments (Audios/Labels)
* MSP-PODCAST v1.7
- [x] Attributes **by Seong-Gyun**
- [ ] Categories
* MSP-PODCAST v1.10
- [x] Attributes **by Seong-Gyun**
- [x] Categories
* USC-IEMOCAP
- [x] Attributes
- [x] Categories
- [ ] Cross-corpus (MSP-Podcast) vs. within corpus performance **by Lucas**
* MSP-IMPROV
- [x] Attributes **by Lucas**
- [x] Categories
- [ ] Cross-corpus (MSP-Podcast) vs. within corpus performance **by Lucas**
* CREMA-D
- [x] Categories **Only**
* (Optional) NTHU-NNIME (In Chinese)
* Modify model code for two different tasks
- [x] Attributes **by Seong-Gyun**
- [ ] Categories **by Seong-Gyun**
* Provide labels of all databases
- [x] Attributes **by David**
- [ ] Categories **by David**
------------------------------------------
# Research Questions
## Both tasks
- [ ] Efficiency vs Performance
- [ ] Does a chunk-based model affect performance?
- [ ] Model size vs Speed
- [ ] Does combining with openSMILE LLD features improve performance?
- [ ] How does normalization affect SER performance when pre-trained models are used as the upstream?
- [ ] Normalization by each utterance
- [ ] Normalization by training set
- [ ] Normalization by speaker
- [ ] Does curriculum learning improve SER performance when pre-trained models are used as the upstream?
- [ ] Train models starting from higher-agreement utterances
- [ ] Do the predictions of SOTA SER systems need calibration?
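A minimal sketch of the three normalization options above (utterance, training set, speaker), using plain z-normalization with mean and standard deviation; no normalization (N-norm) just leaves the features unchanged. It assumes a hypothetical layout where each utterance ID maps to `(speaker_id, frames)` with one scalar feature per frame for brevity. Real upstream features would be normalized per dimension, and whether the project computes the statistics exactly this way is an assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev

EPS = 1e-8  # avoids division by zero for constant features


def z_norm(frames, mu, sigma):
    return [(f - mu) / (sigma + EPS) for f in frames]


def u_norm(utts):
    """U-norm: each utterance is normalized with its own mean and std."""
    return {uid: z_norm(x, mean(x), pstdev(x)) for uid, (_, x) in utts.items()}


def t_norm(utts, train_ids):
    """T-norm: one mean/std pooled over all training frames, applied to every split."""
    pooled = [f for uid in train_ids for f in utts[uid][1]]
    mu, sigma = mean(pooled), pstdev(pooled)
    return {uid: z_norm(x, mu, sigma) for uid, (_, x) in utts.items()}


def s_norm(utts):
    """S-norm: mean/std pooled over all frames of the same speaker."""
    frames_by_spk = defaultdict(list)
    for spk, x in utts.values():
        frames_by_spk[spk].extend(x)
    stats = {spk: (mean(v), pstdev(v)) for spk, v in frames_by_spk.items()}
    return {uid: z_norm(x, *stats[spk]) for uid, (spk, x) in utts.items()}


# Hypothetical toy data: utterance ID -> (speaker ID, one feature value per frame)
utts = {
    "utt1": ("spk1", [1.0, 2.0, 3.0]),
    "utt2": ("spk1", [3.0, 4.0, 5.0]),
    "utt3": ("spk2", [10.0, 20.0, 30.0]),
}
print(u_norm(utts)["utt3"])
print(s_norm(utts)["utt3"])
print(t_norm(utts, ["utt1", "utt2"])["utt3"])
```

With T-norm, the unseen speaker `spk2` stays far from zero mean, which is one way the choice of normalization can affect cross-speaker and cross-corpus results.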
## Attributes (Arousal, Dominance, Valence)
- [ ] Can we reproduce the results of the paper ([DAWN OF THE TRANSFORMER ERA IN SPEECH EMOTION RECOGNITION: CLOSING THE VALENCE GAP](https://arxiv.org/pdf/2203.07378.pdf))?
- [ ] How do other pre-trained models on [the SUPERB benchmark leaderboard](https://superbbenchmark.org/leaderboard), such as WavLM, data2vec, or wav2vec, perform?
- [ ] How well do the models perform cross-corpus without fine-tuning (e.g., train on MSP-PODCAST v1.10 and test on USC-IEMOCAP)?
- [ ] How robust is the SOTA model in noisy conditions (using MSP-PODCAST v1.8)?
- [ ] Is the perception of emotional attributes universal (cross-language emotion recognition without fine-tuning)? (This might require multilingual pre-trained models, such as [XLSR-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) or XLM.)
## Categories (Anger, Sadness, Happiness, Neutral, to name a few)
- [ ] How does the SOTA model perform on emotion classification tasks with various pre-trained models?
- [ ] Is wav2vec still the most effective SSL feature for emotion classification across multiple datasets (4-class emotion classification), following the analyses in the [IS2021 paper](https://www.isca-speech.org/archive/pdfs/interspeech_2021/keesing21_interspeech.pdf)?
- [ ] How do the SOTA models perform when trained with different label learning methods?
- [ ] Hard-label learning
- [ ] Soft-label learning
- [ ] Multi-label learning
- [ ] Distribution learning
- [ ] How well do the models perform cross-corpus without fine-tuning (e.g., train on MSP-PODCAST v1.10 and test on USC-IEMOCAP)?
- [ ] Is the perception of emotion categories universal (cross-language emotion recognition without fine-tuning)? (This might require multilingual pre-trained models, such as [XLSR-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) or XLM.)
- [ ] How should we handle long-tailed (highly imbalanced) emotion recognition?
------------------------------------------
# Installation Environment
## (Recommended) [Github](https://github.com/sgleem/SS_for_SER/tree/development)
## For Python v3.7.13 & CUDA v11.0 & PyTorch v1.7.1
0. Download the requirements file, [`huggin-face_env.txt`](https://utdallas.box.com/s/s36yt73h6mhyzoaa806lc0bi885spsqk)
1. (base) `conda create --name HuggingFace --file huggin-face_env.txt`
2. (base) `conda activate HuggingFace`
3. (HuggingFace) `pip install transformers` **(install HuggingFace)**
4. (HuggingFace) `python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('I love you'))"`
## For Python v3.9 & CUDA v11.2 & PyTorch v1.11.0
0. Download the requirements file, [`hugging-face_python_3-9`](https://utdallas.box.com/s/5np8cy0gpusfu0zcg5o5kba8zb8o415a)
1. (base) `conda create --name HuggingFace --file hugging-face_python_3-9`
2. (base) `conda activate HuggingFace`
3. (HuggingFace) `pip install transformers` **(install HuggingFace)**
4. (HuggingFace) `python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('I love you'))"`
**PS: If using this version, do the following:**
***If using the code provided by Seong-Gyun, change line 34 of `train.py`:***
* Replace `torch.set_deterministic(True)` with `torch.use_deterministic_algorithms(True)` (`torch.set_deterministic` is deprecated in newer PyTorch releases)
------------------------------------------
# Code
## Usage
### Model type (`--model_type `)
* wav2vec2-base, wav2vec2-large, **wav2vec2-large-robust**
* hubert-base, **hubert-large**
* wavlm-base, wavlm-base-plus, **wavlm-large**
* data2vec-base, **data2vec-large**
### Default model type
* wav2vec2: **wav2vec2-large-robust**
* hubert: **hubert-large**
* wavlm: **wavlm-large**
* data2vec: **data2vec-large**
### Estimated GPU memory requirements per model
| Model | Batch Size | GPU Memory | Estimated Time per Epoch (hr) |
| --------------------- | --------- | ---- | ---------------------------- |
| wav2vec2-base | 32 | 17GB | 1 |
| wav2vec2-large | 32 | 50GB | 1 |
| wav2vec2-large-robust | 32 | 40GB | .85 |
| hubert-base | 32 | 30GB | .50 |
| hubert-large | 32 | 50GB | 1 |
| wavlm-base | 32 | 30GB | .6 |
| wavlm-base-plus | 32 | 30GB | .6 |
| wavlm-large | 32 | 50GB | 1.35 |
| data2vec-base | 32 | 33GB | .5 |
| data2vec-large | 32 | 50GB | 1.35 |
### Computational Resources
* Cost: 1.5 USD per hour per GPU
* Each GPU has 32 GB of memory
* Maximum number of GPUs in one virtual environment: 8
* Maximum total GPU memory: 256 GB (8 × 32 GB)
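A back-of-the-envelope sketch for turning the memory table into GPU counts and per-epoch cost. It assumes the estimated memory is split evenly across GPUs (so GPUs needed = ceil(memory / 32 GB)), that the time per epoch in the table is wall-clock time with that many GPUs, and that billing is per GPU-hour at 1.5 USD. Under these assumptions the GPU counts match the ones listed in the MSP-PODCAST v1.10 table; the costs are planning estimates only, not measured numbers.

```python
import math

GPU_MEMORY_GB = 32        # memory per GPU
MAX_GPUS = 8              # GPUs per virtual environment
USD_PER_GPU_HOUR = 1.5    # billing rate

# model -> (estimated GPU memory in GB at batch size 32, estimated hours per epoch),
# copied from the memory-requirements table
ESTIMATES = {
    "wav2vec2-base": (17, 1.0),
    "wav2vec2-large": (50, 1.0),
    "wav2vec2-large-robust": (40, 0.85),
    "hubert-base": (30, 0.50),
    "hubert-large": (50, 1.0),
    "wavlm-base": (30, 0.6),
    "wavlm-base-plus": (30, 0.6),
    "wavlm-large": (50, 1.35),
    "data2vec-base": (33, 0.5),
    "data2vec-large": (50, 1.35),
}


def gpus_needed(memory_gb: float) -> int:
    """Smallest number of 32 GB GPUs whose combined memory covers the estimate."""
    n = math.ceil(memory_gb / GPU_MEMORY_GB)
    if n > MAX_GPUS:
        raise ValueError(f"{memory_gb} GB exceeds {MAX_GPUS} x {GPU_MEMORY_GB} GB")
    return n


def cost_per_epoch(model: str) -> float:
    """Estimated USD per epoch = GPUs x hours per epoch x price per GPU-hour."""
    memory_gb, hours = ESTIMATES[model]
    return gpus_needed(memory_gb) * hours * USD_PER_GPU_HOUR


for name, (memory_gb, _) in ESTIMATES.items():
    print(f"{name:22s} {gpus_needed(memory_gb)} GPU(s)  {cost_per_epoch(name):.2f} USD/epoch")
```

For example, wav2vec2-large-robust needs 2 GPUs and comes to about 2 × 0.85 × 1.5 = 2.55 USD per epoch; multiply by the number of epochs for a full run.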
### Corpus Estimated Cost
* MSP-PODCAST v1.10
| Model | Number of GPUs | GPU Memory | Number of Epochs | Estimated Time per Epoch (hr) | Cost (USD) |
| --------------------- | -------------- | ---------------- | ----- | ----------------------- | ---- |
| wav2vec2-base | 1 | 12GB | | | |
| wav2vec2-large | 2 | 50GB | | | |
| wav2vec2-large-robust | 2 | 40GB | | | |
| hubert-base | 1 | 30GB | | | |
| hubert-large | 2 | 50GB | | | |
| wavlm-base | 1 | 30GB | | | |
| wavlm-base-plus | 1 | 30GB | | | |
| wavlm-large | 2 | 50GB | | | |
| data2vec-base | 2 | 33GB | | | |
| data2vec-large | 2 | 50GB | | | |
## Definition
* categorical
1. Primary emotions: each rater **can choose only one emotion** from the option pool
2. Secondary emotions: each rater can choose **more than one emotion** from the option pool
3. Single-label task: each data sample has exactly one emotion as ground truth
4. Multi-label task: each data sample can have one or more co-occurring emotions as ground truth
* label learning:
* Example: classes (anger, sadness, neutral, happiness); the raters give 2 anger votes and 3 neutral votes
* Soft-label: **(0.4, 0.0, 0.6, 0.0)**
* Soft-label with alpha=0.05 label smoothing: **(0.39, 0.016667, 0.576667, 0.016667)**
* Single-label task
* Hard-label learning (cross-entropy)
* Hard-label of the example: **(0,0,1,0)**
* The label of the example with label smoothing will look like: **(0.016667, 0.016667, 0.95, 0.016667)**
* Soft-label learning (cross-entropy)
* Soft-label: **(0.4, 0.0, 0.6, 0.0)**
* The label of the example with label smoothing will look like: **(0.39, 0.016667, 0.576667, 0.016667)**
* Multi-label task
* Multi-label learning (binary cross-entropy): every emotion chosen by at least one annotator is marked as present, regardless of how many annotators chose it
* Multi-hot label of the example: **(1,0,1,0)**
* Output activation layer: **Sigmoid**
* The probabilities can be binarized with threshold **0.5**
* Distribution-label learning (binary cross-entropy): learn the similarity between the ground-truth and predicted distributions
* The target is the same as the smoothed soft label: **(0.39, 0.016667, 0.576667, 0.016667)**
* The distribution can also be binarized with threshold **1/k** (where k is the number of classes); for 4-class emotion classification, the threshold is $1/4=0.25$, so the multi-hot output is **(1,0,1,0)**
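A minimal sketch that reproduces the example vectors above from the five raw votes. One formulation consistent with the smoothed numbers is to smooth each rater's vote as a one-hot vector ($1-\alpha$ on the chosen class, $\alpha/(k-1)$ on each other class) and then average, which per class equals $(1-\alpha)\,p + \frac{\alpha}{k-1}(1-p)$. This is our reconstruction from the numbers, not necessarily how the project code implements smoothing, and the function names are hypothetical.

```python
from collections import Counter

CLASSES = ("anger", "sadness", "neutral", "happiness")


def soft_label(votes, classes=CLASSES):
    """Fraction of votes per class."""
    counts = Counter(votes)
    return [counts[c] / len(votes) for c in classes]


def hard_label(votes, classes=CLASSES):
    """One-hot vector on the most-voted class (ties resolved by first index here, a simplification)."""
    soft = soft_label(votes, classes)
    top = max(range(len(classes)), key=soft.__getitem__)
    return [1.0 if i == top else 0.0 for i in range(len(classes))]


def smooth(label, alpha=0.05):
    """Averaged per-vote smoothing: (1 - alpha) * p + alpha / (k - 1) * (1 - p)."""
    k = len(label)
    return [(1 - alpha) * p + alpha / (k - 1) * (1 - p) for p in label]


def multi_hot(votes, classes=CLASSES):
    """1 for every emotion chosen by at least one rater."""
    chosen = set(votes)
    return [1 if c in chosen else 0 for c in classes]


def binarize(dist, threshold=None):
    """Multi-hot vector from a distribution; default threshold is 1/k."""
    if threshold is None:
        threshold = 1 / len(dist)
    return [1 if p >= threshold else 0 for p in dist]


votes = ["anger", "anger", "neutral", "neutral", "neutral"]
print(soft_label(votes))                                   # [0.4, 0.0, 0.6, 0.0]
print([round(p, 6) for p in smooth(soft_label(votes))])    # [0.39, 0.016667, 0.576667, 0.016667]
print([round(p, 6) for p in smooth(hard_label(votes))])    # [0.016667, 0.016667, 0.95, 0.016667]
print(multi_hot(votes), binarize(smooth(soft_label(votes))))  # [1, 0, 1, 0] [1, 0, 1, 0]
```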
## Arguments
1. Database
1. MSP-IMPROV
* Example
1. Partioned_data_secondary
1. labels_consensus_EmoS_4class_P
1. labels_consensus_1.csv
* D: David's rule
* P: Plurality rule
* M: Majority vote
* Different datasets have different numbers of partitions
* MSP-IMPROV has six partitions
2. MSP-Podcast
* One partition
3. USC-IEMOCAP
* Five partitions
4. CREMA-D
* Five partitions
5. NNIME
* Five partitions
2. Model (sorted by the results on [SUPERB leaderboard](https://superbbenchmark.org/leaderboard))
1. WavLM
2. data2vec
3. Hubert
4. wav2vec2
5. wav2vec **(Hugging Face Transformers doesn't support wav2vec, so give it lower priority)**
3. pre-processing
1. Normalization
1. Feature Normalization
1. utterance norm (U-norm)
2. training norm (T-norm)
3. speaker norm (S-norm)
4. no norm (N-norm)
2. Label Normalization for Emotional Attributes
1. MSP-Podcast (ranged from 1 to 7)
- $(label - 1) / (7 - 1)$
2. MSP-IMPROV (ranged from 1 to 5)
- Flip the arousal label (from "1 to 5" to "5 to 1") (**Fixed**)
- $aro' = 6 - aro$
- $(label - 1) / (5 - 1)$
3. USC-IEMOCAP (ranged from 1 to 5)
- $(label - 1) / (5 - 1)$
4. NTHU-NNIME (ranged from 1 to 5)
- $(label - 1) / (5 - 1)$
5. CREMA-D (no emotional attributes)
2. Label selection
* categorical
1. majority vote
1. Hard label
2. Soft label
2. plurality rule
1. Hard label
2. Soft label
3. multi-label
1. Multiple-hot label
4. David's rule
1. Distribution label
4. multi-task vs single-task learning
* Single-task
* categorical
1. single-label task
* 4-class
2. multi-label task
* 4-class
* primary emotions
* secondary emotions
5. loss
* categorical
1. **cross-entropy** for hard/soft label
2. **binary cross-entropy** for multiple-hot label
3. **Kullback–Leibler divergence** for distribution label
* dimensional (VAD)
1. (1 - CCC) *(CCC: concordance correlation coefficient)*
6. classes
1. categorical
1. 4-class for **all datasets** *(Neutral, Anger, Sadness, Happiness)*
2. 8-class **primary** emotions for MSP-PODCAST (v1.8/v1.9/v1.10)
3. 6-class **primary** emotions for CREMA-D
4. 16-class secondary emotions for MSP-PODCAST (v1.8/v1.9/v1.10)
5. 10-class secondary emotions for MSP-IMPROV
6. 9-class secondary emotions for USC-IEMOCAP
7. 11-class secondary emotions for NTHU-NNIME
2. dimensional (VAD)
7. Hyper-parameters
1. Epochs
2. Batch size
3. sequence length?
4. dropout?
5. depth?
8. Evaluation metrics
1. categorical
* Single-label task
1. UAR
2. UAP
3. ACC
4. Macro-F1
5. Micro-F1
6. Weighted-F1
* Multi-label task
1. Hamming loss
2. Ranking loss
3. ...
4. ...
5. ....
* Distribution-label
1. cosine similarity
2. KLD
3. RMSE
2. dimensional (VAD)
* CCC
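A minimal sketch of the attribute label normalization listed under pre-processing: min-max scaling $(label - low) / (high - low)$ with the per-corpus rating ranges given above, plus the MSP-IMPROV arousal flip ($6 - aro$) applied before scaling. The corpus keys and attribute names are hypothetical identifiers, not the project's.

```python
# Rating ranges (low, high) per corpus, from the list above; CREMA-D has no attributes.
ATTRIBUTE_RANGES = {
    "MSP-Podcast": (1, 7),
    "MSP-IMPROV": (1, 5),
    "USC-IEMOCAP": (1, 5),
    "NTHU-NNIME": (1, 5),
}


def normalize_attribute(corpus: str, attribute: str, score: float) -> float:
    """Map an averaged rating to [0, 1] with (score - low) / (high - low)."""
    if corpus not in ATTRIBUTE_RANGES:
        raise KeyError(f"{corpus} has no emotional attribute labels")
    low, high = ATTRIBUTE_RANGES[corpus]
    if corpus == "MSP-IMPROV" and attribute == "arousal":
        score = 6 - score  # flip the 1..5 arousal scale so that 1 <-> 5
    return (score - low) / (high - low)


print(normalize_attribute("MSP-Podcast", "valence", 4.0))  # 0.5
print(normalize_attribute("MSP-IMPROV", "arousal", 1.0))   # 1.0 (flipped)
```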
------------------------------------------
# Task
## Categories
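* Majority Vote (the label is kept only when more than 50% of raters agree on it)
* Plurality Rule (the class that receives the most votes is selected)
* David's Rule (a threshold, such as 1/k where k is the number of emotion classes, is applied to each class's vote share; every class that passes it is marked as present)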
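A minimal sketch of the three aggregation rules, which also shows why the discard rates in the MSP-PODCAST tables drop from majority vote to plurality rule to David's rule. Assumptions (ours, not confirmed by the project code): vote shares are computed over all votes for a sample, a plurality tie discards the sample, and David's rule keeps every class whose share is at least 1/k. `None` marks a discarded sample.

```python
from collections import Counter

CLASSES = ("anger", "sadness", "neutral", "happiness")


def vote_shares(votes, classes=CLASSES):
    """Fraction of all votes given to each class (votes for other emotions still count in the total)."""
    counts = Counter(votes)
    return [counts[c] / len(votes) for c in classes]


def majority_vote(votes, classes=CLASSES):
    """Index of the class chosen by more than 50% of raters, or None (sample discarded)."""
    for i, share in enumerate(vote_shares(votes, classes)):
        if share > 0.5:
            return i
    return None


def plurality_rule(votes, classes=CLASSES):
    """Index of the single most-voted class, or None on a tie or when no class got a vote."""
    shares = vote_shares(votes, classes)
    best = max(shares)
    winners = [i for i, share in enumerate(shares) if share == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


def davids_rule(votes, classes=CLASSES, threshold=None):
    """Multi-hot vector of classes with share >= threshold (default 1/k), or None if no class passes."""
    if threshold is None:
        threshold = 1 / len(classes)
    hot = [1 if share >= threshold else 0 for share in vote_shares(votes, classes)]
    return hot if any(hot) else None


clear = ["anger", "anger", "neutral", "neutral", "neutral"]
split = ["anger", "anger", "neutral", "neutral", "sadness"]
for votes in (clear, split):
    print(majority_vote(votes), plurality_rule(votes), davids_rule(votes))
# clear -> 2 2 [1, 0, 1, 0]
# split -> None None [1, 0, 1, 0]
```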
### Objective Functions
* Hard-label learning
* Objective function: cross-entropy
* Output layer activation function: softmax
* Soft-label learning
* Objective function: cross-entropy
* Output layer activation function: softmax
* Multi-label learning
* Objective function: binary cross-entropy
* Output layer activation function: **sigmoid**
* Distribution learning
* Objective function: Kullback–Leibler divergence (KLD)
* Output layer activation function: softmax
## Attributes
* Ground truth: the average of all raters' scores
### Objective Functions
* 1 - CCC (CCC: concordance correlation coefficient)
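A minimal pure-Python sketch of the 1 - CCC objective, using Lin's concordance correlation coefficient with population (biased) statistics. In training it would be computed per batch on tensors, which is an assumption about the setup. Unlike Pearson's correlation, CCC also penalizes a constant offset or a scale mismatch between predictions and labels.

```python
from statistics import mean


def ccc(pred, gold):
    """Lin's concordance correlation coefficient:
    2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y)) ** 2)."""
    mx, my = mean(pred), mean(gold)
    var_x = mean([(x - mx) ** 2 for x in pred])
    var_y = mean([(y - my) ** 2 for y in gold])
    cov = mean([(x - mx) * (y - my) for x, y in zip(pred, gold)])
    return 2 * cov / (var_x + var_y + (mx - my) ** 2)


def ccc_loss(pred, gold):
    """Objective to minimize: 1 - CCC (0 for a perfect match)."""
    return 1 - ccc(pred, gold)


gold = [0.1, 0.4, 0.5, 0.9]        # e.g. normalized arousal labels (made up)
shifted = [g + 0.2 for g in gold]  # perfectly correlated but biased
print(ccc_loss(gold, gold))        # close to 0.0
print(ccc_loss(shifted, gold))     # > 0 because of the offset
```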
------------------------------------------
# Resources
## Emotion Databases Summary
* All datasets include the 4 emotions ($angry, sad, neutral, happy$)
> **Choice** indicates whether annotators can select a single emotion or multiple emotions
| **Dataset** | **Choice** | **Class** | **Processed** | angry | sad | neutral | happy | other | frustrated | annoyed | disappointed | disgust | depressed | contempt | confused | concerned | fear | surprise | amused | excited | joy | relaxed | disappointed |
| ------------------------------- | ------------ | --------- | ----------------- | ----- | ----- | ------- | ----- | ----- | ---------- | ------- | ------------ | ------- | --------- | -------- | -------- | --------- | ----- | -------- | ------ | ------- | ----- | ------- | ------------ |
| **MSP-IMPROV** (**Primary**) | **Single** | 5 | **V** | **V** | **V** | **V** | **V** | **V** | | | | | | | | | | | | | | | |
| **MSP-PODCAST** (**Primary**) | **Single** | 9 | **V** | **V** | **V** | **V** | **V** | **V** | | | | **V** | | **V** | | | **V** | **V** | | | | | |
| **CREMA-D** | **Single** | 6 | **X (not split)** | **V** | **V** | **V** | **V** | | | | | **V** | | | | | **V** | | | | | | |
| **MSP-IMPROV** (**Secondary**) | **Multiple** | 11 | **V** | **V** | **V** | **V** | **V** | **V** | **V** | | | **V** | | | **V** | | **V** | **V** | | | | | |
| **MSP-PODCAST** (**Secondary**) | **Multiple** | 17 | **V** | **V** | **V** | **V** | **V** | **V** | **V** | **V** | **V** | **V** | **V** | **V** | **V** | **V** | **V** | **V** | **V** | **V** | | | |
| **USC-IEMOCAP** | **Multiple** | 10 | **V** | **V** | **V** | **V** | **V** | **V** | **V** | | | **V** | | | | | **V** | **V** | | **V** | | | |
| **NNIME** | **Multiple** | 12 | | **V** | **V** | **V** | **V** | **V** | **V** | | | | | | | | **V** | **V** | | **V** | **V** | **V** | **V** |
## MSP-PODCAST v1.10
### Categories (**Primary Emotion**)
<!---
| Session | Total | angry | sad | disgust | contempt | fear | neutral | surprise | happy | other |
| ----------- | ------- | ------ | ------ | ------- | -------- | ------ | ------- | -------- | ------- | ------ |
| All | 660,351 | 50,887 | 41,928 | 39,120 | 56,962 | 29,367 | 209,670 | 58,745 | 145,294 | 28,378 |
| Train | 391222 | 29183 | 21554 | 19627 | 30148 | 15067 | 112682 | 29613 | 80105 | 12114 |
| Development | 72,581 | 9,917 | | | | | | | | |
| Test1 | | | | | | | | | | |
| Test2 | | | | | | | | | | |
--->
### Categories (**Secondary Emotion**)
<!---
| Session | Total | angry | frustrated | annoyed | disappointed | sad | disgust | depressed | contempt | confused | concerned | fear | neutral | surprise | amused | excited | happy | other |
| -------- | -------- | ------ | ---------- | ------- | ------------ | ------ | ------- | --------- | -------- | -------- | --------- | ------ | ------- | -------- | ------ | ------- | ------- | ------ |
| All | 1,486,718 | 72,538 | 68,795 | 79,413 | 53,078 | 74,556 | 65,657 | 27,310 | 90,405 | 33,531 | 115,687 | 38,357 | 251,291 | 97,548 | 85,347 | 98,064 | 194,695 | 40,446 |
| Session1 | | | | | | | | | | | | | | | | | | |
| Session2 | | | | | | | | | | | | | | | | | | |
| Session3 | | | | | | | | | | | | | | | | | | |
| Session4 | | | | | | | | | | | | | | | | | | |
| Session5 | | | | | | | | | | | | | | | | | | |
| Session6 | | | | | | | | | | | | | | | | | | |
--->
### 4class Categories (**Primary Emotion**)
* 4class = $angry, sad, neutral, happy$
| Method | Class | Task | Discard | Used |
| -------------- | ----- | ------------ | --------------- | ----------------- |
| Majority Vote | 4 | Single-label | 26.08% (27,190) | 73.92% (77,077) |
| Plurality Rule | 4 | Single-label | 19.44% (20,270) | 80.56% (83,997) |
| David's Rule | 4 | Multi-label | 2.48% (2,589) | 97.52% (101,678) |
* Majority Vote

* Plurality Rule

* David's Rule

### 4class Categories (**Secondary Emotion**)
* 4class = $angry, sad, neutral, happy$
| Method | Class | Task | Discard | Used |
| -------------- | ----- | ------------ | --------------- | ---------------- |
| Majority Vote | 4 | Single-label | 32.79%(34,186) | 67.21% (70,081) |
| Plurality Rule | 4 | Single-label | 17.80% (18,564) | 82.20% (85,703) |
| David's Rule | 4 | Multi-label | 0.40%(418) | 99.60% (103,849) |
* Majority Vote

* Plurality Rule

* David's Rule

### 8class Categories (**Primary Emotion**)
* 8class = $angry, sad, disgust, contempt, fear, neutral, surprise, happy$
| Method | Class | Task | Discard | Used |
| -------------- | ----- | ------------ | --------------- | ---------------- |
| Majority Vote | 8 | Single-label | 48.65% (50,721) | 51.35% (53,546) |
| Plurality Rule | 8 | Single-label | 18.91% (19,718) | 81.09% (84,549) |
| David's Rule | 8 | Multi-label | 0.00% (1) | 99.99% (104,266) |
### 16class Categories (**Secondary Emotion**)
* 16class = $angry, frustrated, annoyed, disappointed, sad, disgust, depressed, contempt, confused, concerned, fear, neutral, surprise, amused, excited, happy$
| Method | Class | Task | Discard | Used |
| -------------- | ----- | ------------ | --------------- | --------------- |
| Majority Vote | 16 | Single-label | 88.68% (92,467) | 11.32% (11,800) |
| Plurality Rule | 16 | Single-label | 29.31% (30,562) | 70.69% (73,705) |
| David's Rule | 16 | Multi-label | 0.00% (0) | 100% (104,267) |
## MSP-PODCAST v1.9
### Categories (**Primary Emotion**)
<!---
| Session | Total | angry | sad | disgust | contempt | fear | neutral | surprise | happy | other |
| ----------- | ------- | ------ | ------ | ------- | -------- | ------ | ------- | -------- | ------- | ------ |
| All | 555,790 | 44,684 | 32,616 | 34,428 | 49,176 | 24,077 | 175,768 | 48,023 | 126,803 | 20,215 |
| Train | 348,769 | 27859 | 21554 | 19627 | 30148 | 15067 | 112682 | 29613 | 80105 | 12114 |
| Development | | | | | | | | | | |
| Test1 | | | | | | | | | | |
| Test2 | | | | | | | | | | |
--->
### Categories (**Secondary Emotion**)
<!---
| Session | Total | angry | frustrated | annoyed | disappointed | sad | disgust | depressed | contempt | confused | concerned | fear | neutral | surprise | amused | excited | happy | other |
| -------- | -------- | ------ | ---------- | ------- | ------------ | ------ | ------- | --------- | -------- | -------- | --------- | ------ | ------- | -------- | ------ | ------- | ------- | ------ |
| All | 1,238,328 | 64,461 | 59,771 | 70,330 | 39,736 | 61,254 | 58,449 | 21,551 | 76,445 | 25,758 | 86,716 | 31,097 | 213,566 | 78,295 | 70,968 | 82,029 | 169,445 | 28,457 |
| Session1 | | | | | | | | | | | | | | | | | | |
| Session2 | | | | | | | | | | | | | | | | | | |
| Session3 | | | | | | | | | | | | | | | | | | |
| Session4 | | | | | | | | | | | | | | | | | | |
| Session5 | | | | | | | | | | | | | | | | | | |
| Session6 | | | | | | | | | | | | | | | | | | |
--->
### 4class Categories (**Primary Emotion**)
* 4class = $angry, sad, neutral, happy$
| Method | Class | Task | Discard | Used |
| -------------- | ----- | ------------ | --------------- | --------------- |
| Majority Vote | 4 | Single-label | 25.23% (21,792) | 74.77% (64,597) |
| Plurality Rule | 4 | Single-label | 19.13% (16,525) | 80.87% (69,864) |
| David's Rule | 4 | Multi-label | 3.47% (2,997) | 96.53% (83,392) |
* Majority Vote

* Plurality Rule

* David's Rule

### 4class Categories (**Secondary Emotion**)
* 4class = $angry, sad, neutral, happy$
| Method | Class | Task | Discard | Used |
| -------------- | ----- | ------------ | --------------- | -------------- |
| Majority Vote | 4 | Single-label | 31.38% (27,110) | 68.62% (59,279) |
| Plurality Rule | 4 | Single-label | 00000 | 000000 |
| David's Rule | 4 | Multi-label | 0.52% (451) | 99.48% (85,938) |
* Majority Vote

* Plurality Rule

* David's Rule

### 8class Categories (**Primary Emotion**)
* 8class = $angry, sad, disgust, contempt, fear, neutral, surprise, happy$
| Method | Class | Task | Discard | Used |
| -------------- | ----- | ------------ | --------------- | --------------- |
| Majority Vote | 8 | Single-label | 45.90% (39,654) | 54.10% (46,735) |
| Plurality Rule | 8 | Single-label | 17.60% (15,206) | 82.40% (71,183) |
| David's Rule | 8 | Multi-label | 0.00% (1) | 99.99% (86,388) |
### 16class Categories (**Secondary Emotion**)
* 16class = $angry, frustrated, annoyed, disappointed, sad, disgust, depressed, contempt, confused, concerned, fear, neutral, surprise, amused, excited, happy$
| Method | Class | Task | Discard | Used |
| -------------- | ----- | ------------ | --------------- | --------------- |
| Majority Vote | 16 | Single-label | 86.29% (74,542) | 13.71% (11,847) |
| Plurality Rule | 16 | Single-label | 27.54% (23,794) | 72.46% (62,595) |
| David's Rule | 16 | Multi-label | 0.00% (0) | 100% (86,389) |
## IEMOCAP
### Categories

| Session | Total | frustrated | angry | sad | disgust | excited | fear | neutral | surprise | happy | other |
| -------- | ------ | ---------- | ----- | ----- | ------- | ------- | ---- | ------- | -------- | ----- | ----- |
| All | 34,367 | 7,994 | 4,734 | 4,016 | 264 | 4,598 | 415 | 7,007 | 811 | 3,461 | 1,067 |
<!---

| Session | Total | frustrated | angry | sad | disgust | excited | fear | neutral | surprise | happy | other |
| -------- | ------ | ---------- | ----- | ----- | ------- | ------- | ---- | ------- | -------- | ----- | ----- |
| All | 34,367 | 7,994 | 4,734 | 4,016 | 264 | 4,598 | 415 | 7,007 | 811 | 3,461 | 1,067 |
| Session1 | | | | | | | | | | | |
| Session2 | | | | | | | | | | | |
| Session3 | | | | | | | | | | | |
| Session4 | | | | | | | | | | | |
| Session5 | | | | | | | | | | | |
--->
### 4class Categories
* 4class = $angry, sad, neutral, happy$
| Method | Class | Task | Discard | Used |
| ------------------ | ----- | ------------ | ------------- | -------------- |
| Majority Vote (\*) | 4 | Single-label | 44.90% (4,508) | 55.10% (5,531) |
| Majority Vote | 4 | Single-label | 25.17% (2,527) | 74.83% (7,512) |
| Plurality Rule | 4 | Single-label | 24.94% (2,504) | 75.06% (7,535) |
| David's Rule | 4 | Multi-label | 10.41% (1,045) | 89.59% (8,994) |
> \*: Previous works take only one label from each annotator, but annotators may actually provide more than one emotion label.
### 9class Categories (All emotions)
* 9class = $frustrated, angry, sad, disgust, excited, fear, neutral, surprise, happy$
| Method | Class | Task | Discard | Used |
| -------------- | ----- | ------------ | ------------- | -------------- |
| Majority Vote | 9 | Single-label | 31.37% (3,149) | 68.63% (6,890) |
| Plurality Rule | 9 | Single-label | 25.32% (2,542) | 74.68% (7,497) |
| David's Rule | 9 | Multi-label | 0.00% (0) | 100.00% (10,039) |
## MSP-IMPROV
### Categories (**Primary Emotion**)
| Session | Total | angry | sad | neutral | happy | other |
| -------- | ------ | ----- | ----- | ------- | ------ | ----- |
| All | 61,702 | 7,960 | 8,632 | 24,393 | 17,558 | 3,159 |
<!---
| Session1 | | | | | | |
| Session2 | | | | | | |
| Session3 | | | | | | |
| Session4 | | | | | | |
| Session5 | | | | | | |
| Session6 | | | | | | |
--->
### Categories (**Secondary Emotion**)
| Session | Total | depressed | frustrated | angry | sad | disgust | excited | fear | neutral | surprise | happy | other |
| -------- | ------- | --------- | ---------- | ----- | ------ | ------- | ------- | ----- | ------- | -------- | ------ | ----- |
| All | 105,587 | 4,174 | 10,552 | 8,955 | 10,017 | 4,214 | 9,418 | 1,775 | 26,758 | 6,199 | 19,057 | 4,468 |
<!---
| Session1 | | | | | | | | | | | | |
| Session2 | | | | | | | | | | | | |
| Session3 | | | | | | | | | | | | |
| Session4 | | | | | | | | | | | | |
| Session5 | | | | | | | | | | | | |
| Session6 | | | | | | | | | | | | |
--->
### 4class Categories (**Primary Emotion**)
* 4class = $angry, sad, neutral, happy$
| Method | Class | Task | Discard | Used |
| -------------- | ----- | ------------ | ----------- | -------------- |
| Majority Vote | 4 | Single-label | 9.20% (776) | 90.80% (7,662) |
| Plurality Rule | 4 | Single-label | 5.63% (475) | 94.37% (7,963) |
| David's Rule | 4 | Multi-label | 0.01% (1) | 99.99% (8,437) |
* Majority Vote

* Plurality Rule

* David's Rule

### 4class Categories (**Secondary Emotion**)
* 4class = $angry, sad, neutral, happy$
| Method | Class | Task | Discard | Used |
| -------------- | ----- | ------------ | -------------- | -------------- |
| Majority Vote | 4 | Single-label | 12.42% (1,048) | 87.58% (7,390) |
| Plurality Rule | 4 | Single-label | 6.28% (530) | 93.72% (7,908) |
| David's Rule | 4 | Multi-label | 0.01% (1) | 99.99% (8,437) |
* Majority Vote

* Plurality Rule

* David's Rule

### 10class Categories (**Secondary Emotion**)
* 10class = $depressed, frustrated, angry, sad, disgust, excited, fear, neutral, surprise, happy$
| Method | Class | Task | Discard | Used |
| -------------- | ----- | ------------ | -------------- | -------------- |
| Majority Vote | 10 | Single-label | 54.17% (4,571) | 45.83% (3,867) |
| Plurality Rule | 10 | Single-label | 12.34% (1,041) | 87.66% (7,397) |
| David's Rule | 10 | Multi-label | 0.01% (1) | 99.99% (8,437) |
### Attributes
| Method | Task | Discard | Used |
| ------- | ---- | --------------- | -------------------- |
| Average | Dom. | 0.62%(52/8,438) | 99.38% (8,386/8,438) |
## [CREMA-D](https://github.com/CheyneyComputerScience/CREMA-D)
### Focus on the audio alone

| Session | Total | angry | sad | disgust | fear | neutral | happy |
| ------- | ------ | ------ | ----- | ------- | ----- | ------- | ----- |
| All | 73,254 | 10,376 | 7,404 | 9,588 | 8,831 | 32,145 | 4,910 |
### 6class Categories
* 6class = $angry, sad, disgust, fear, neutral, happy$
| Method | Class | Task | Discard | Used |
| -------------- | ----- | ------------ | -------------- | --------------- |
| Majority Vote | 6 | Single-label | 35.80% (2,664) | 64.20% (4,778) |
| Plurality Rule | 6 | Single-label | 8.55% (636) | 91.45% (6,806) |
| David's Rule | 6 | Multi-label | 0.00% (0) | 100.00% (7,442) |
### 4class Categories
* 4class = $angry, sad, neutral, happy$
| Method | Class | Task | Discard | Used |
| -------------- | ----- | ------------ | -------------- | -------------- |
| Majority Vote | 4 | Single-label | 13.76% (1,024) | 86.24% (6,418) |
| Plurality Rule | 4 | Single-label | 7.65% (569) | 92.35% (6,873) |
| David's Rule | 4 | Multi-label | 5.03% (374) | 94.97% (7,068) |
* Majority Vote

* Plurality Rule

* David's Rule

## NTHU-NNIME

| Session | Total | angry | frustrated | disappointed | sad | fear | neutral | surprise | excited | happy | relax | joy | other |
| ------- | ------ | ----- | ---------- | ------------ | ----- | ---- | ------- | -------- | ------- | ----- | ----- | --- | ----- |
| All | 18,631 | 2,161 | 797 | 415 | 1,041 | 580 | 8,776 | 1,294 | 1,519 | 304 | 674 | 767 | 303 |
### 4class Categories
* 4class = $angry, sad, neutral, happy$
* **happy merges the joy and happy labels**
| Method | Class | Task | Discard | Used |
| -------------- | ----- | ------------ | -------------- | -------------- |
| Majority Vote | 4 | Single-label | 27.61% (1,545) | 72.39% (4,051) |
| Plurality Rule | 4 | Single-label | 27.50% (1,539) | 72.50% (4,057) |
| David's Rule | 4 | Multi-label | 21.46% (1,201) | 78.54% (4,395) |
* Majority Vote

* Plurality Rule

* David's Rule

### 11class Categories
* 11class = $angry, frustrated, disappointed, sad, fear, neutral, surprise, excited, happy, relax, joy$
| Method | Class | Task | Discard | Used |
| -------------- | ----- | ------------ | -------------- | -------------- |
| Majority Vote | 11 | Single-label | 32.34% (1,810) | 67.66% (3,786) |
| Plurality Rule | 11 | Single-label | 25.04% (1,401) | 74.96% (4,195) |
| David's Rule | 11 | Multi-label | 9.29% (520) | 90.71% (5,076) |
<!---
## MSP-PODCAST v1.8
### Categories (**Primary Emotion**)
### Categories (**Secondary Emotion**)
### 4class Categories (**Primary Emotion**)
* 4class = $angry, sad, neutral, happy$
| Method | Class | Task | Discard | Used |
| -------------- | ----- | ------------ | --------------- | --------------- |
| Majority Vote | 4 | Single-label | 25.23% (21,792) | 74.77% (64,597) |
| Plurality Rule | 4 | Single-label | 19.13% (16,525) | 80.87% (69,864) |
| David's Rule | 4 | Multi-label | 3.47% (2,997) | 96.53% (83,392) |
* Majority Vote
* Plurality Rule
* David's Rule
### 4class Categories (**Secondary Emotion**)
* 4class = $angry, sad, neutral, happy$
| Method | Class | Task | Discard | Used |
| -------------- | ----- | ------------ | --------------- | -------------- |
| Majority Vote | 4 | Single-label | 31.38% (27,110) | 68.62% (59,279) |
| Plurality Rule | 4 | Single-label | 00000 | 000000 |
| David's Rule | 4 | Multi-label | 0.52% (451) | 99.48% (85,938) |
* Majority Vote
* Plurality Rule
* David's Rule
## MSP-PODCAST v1.7
### Categories (**Primary Emotion**)
### Categories (**Secondary Emotion**)
### 4class Categories (**Primary Emotion**)
* 4class = $angry, sad, neutral, happy$
| Method | Class | Task | Discard | Used |
| -------------- | ----- | ------------ | --------------- | --------------- |
| Majority Vote | 4 | Single-label | 25.23% (21,792) | 74.77% (64,597) |
| Plurality Rule | 4 | Single-label | 19.13% (16,525) | 80.87% (69,864) |
| David's Rule | 4 | Multi-label | 3.47% (2,997) | 96.53% (83,392) |
* Majority Vote
* Plurality Rule
* David's Rule
### 4class Categories (**Secondary Emotion**)
* 4class = $angry, sad, neutral, happy$
| Method | Class | Task | Discard | Used |
| -------------- | ----- | ------------ | --------------- | -------------- |
| Majority Vote | 4 | Single-label | 31.38% (27,110) | 68.62% (59,279) |
| Plurality Rule | 4 | Single-label | 00000 | 000000 |
| David's Rule | 4 | Multi-label | 0.52% (451) | 99.48% (85,938) |
* Majority Vote
* Plurality Rule
* David's Rule
--->
------------------------------------------
# Experimental Setup
## Features: SOTA Self-Supervised Speech Representations ([SUPERB Benchmark Leaderboard](https://superbbenchmark.org/leaderboard))
* [WavLM Large](https://arxiv.org/pdf/2110.13900)
* [Github](https://github.com/microsoft/unilm/tree/master/wavlm)
* [HuggingFace](https://huggingface.co/models?other=wavlm)
* [Speaker Verification Demo on HuggingFace](https://huggingface.co/spaces/microsoft/wavlm-speaker-verification)
* [HuBERT Large](https://arxiv.org/abs/2106.07447)
* [Github based on fairseq (Meta platform)](https://github.com/facebookresearch/fairseq/blob/main/examples/hubert/README.md)
* [HuggingFace](https://huggingface.co/docs/transformers/model_doc/hubert)
* [data2vec Large](https://arxiv.org/pdf/2202.03555.pdf)
* [Github](https://github.com/facebookresearch/fairseq/tree/main/examples/data2vec)
* [HuggingFace](https://huggingface.co/docs/transformers/model_doc/data2vec)
* [wav2vec 2.0 Large](https://arxiv.org/abs/2006.11477)
* [Github](https://github.com/facebookresearch/fairseq/blob/main/examples/wav2vec/README.md)
* [HuggingFace](https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self)
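
`train.py` selects the upstream through *model_type*. One way to map those four values onto public Hugging Face checkpoints is sketched below. The checkpoint ids are assumptions (the wav2vec2 entry follows the `Wav2vec2-large-robust` model named in the results, not the 960h link above), and loading uses `transformers.AutoModel`, imported lazily so the mapping can be inspected without the library installed.

```python
from typing import Dict

# Assumed checkpoints; swap in whichever weights the experiments actually use.
UPSTREAM_CHECKPOINTS: Dict[str, str] = {
    "wav2vec2": "facebook/wav2vec2-large-robust",
    "hubert": "facebook/hubert-large-ll60k",
    "data2vec": "facebook/data2vec-audio-large",
    "wavlm": "microsoft/wavlm-large",
}


def checkpoint_for(model_type: str) -> str:
    """Return the Hugging Face model id for a train.py-style model_type value."""
    try:
        return UPSTREAM_CHECKPOINTS[model_type]
    except KeyError:
        raise ValueError(
            f"unknown model_type {model_type!r}; expected one of {sorted(UPSTREAM_CHECKPOINTS)}"
        ) from None


def load_upstream(model_type: str):
    """Download (or load from the local cache) the pretrained SSL encoder."""
    from transformers import AutoModel  # lazy import: the mapping works without transformers

    return AutoModel.from_pretrained(checkpoint_for(model_type))
```

Calling `load_upstream("wavlm")(input_values).last_hidden_state` on a batch of 16 kHz waveforms of shape (batch, samples) yields frame-level features of shape (batch, frames, 1024) for these Large models.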
## Hyperparameters
| Hyperparameter | Value |
| -------------- | ------ |
| Batch size | 16 |
| Epochs | 10 |
## Validation
## Scripts
### Arguments you can change in `train.py`
* *model_type*: the upstream model to run: `wav2vec2`, `hubert`, `data2vec`, or `wavlm`
* *seed*: random seed; changes the initial model weights and the minibatch order during training
* *batch_size, epochs, lr*: batch size, total number of training epochs, and learning rate
  > The script saves the model with the best development-set performance within the maximum number of epochs.
* *hidden_dim, num_layers*: number of nodes per hidden layer and number of hidden layers in the classification head
* *model_path*: directory where the model is saved
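
The options above could be declared with `argparse` as below. This is only a sketch: the `--flag` spelling and every default except `batch_size=16` and `epochs=10` (taken from the Hyperparameters table) are hypothetical. `best_epoch` illustrates the keep-the-best-development-score logic from the note, assuming a higher score is better.

```python
import argparse
from typing import Sequence, Tuple


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="SER fine-tuning (sketch of train.py options)")
    p.add_argument("--model_type", choices=["wav2vec2", "hubert", "data2vec", "wavlm"],
                   default="wav2vec2", help="upstream SSL model")
    p.add_argument("--seed", type=int, default=0, help="initial weights and minibatch order")
    p.add_argument("--batch_size", type=int, default=16)
    p.add_argument("--epochs", type=int, default=10)
    p.add_argument("--lr", type=float, default=1e-5)        # hypothetical default
    p.add_argument("--hidden_dim", type=int, default=1024)  # hypothetical default
    p.add_argument("--num_layers", type=int, default=2)     # hypothetical default
    p.add_argument("--model_path", default="model/")        # hypothetical default
    return p


def best_epoch(dev_scores: Sequence[float]) -> Tuple[int, float]:
    """1-based epoch with the highest development score (the first one wins ties)."""
    idx = max(range(len(dev_scores)), key=dev_scores.__getitem__)
    return idx + 1, dev_scores[idx]


args = build_parser().parse_args(["--model_type", "wavlm", "--seed", "1"])
print(args.model_type, args.batch_size, args.epochs)  # wavlm 16 10
```
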
------------------------------------------
# Results
## Attributes
### MSP-PODCAST v1.7
| Upstream | Normalization | Arousal | Dominance | Valence |
| -------- | ------------- | ------- | --------- | ------- |
| wav2vec2 | | 0.7051 | 0.6345 | 0.574 |
| WavLM | | 0.7108 | 0.6160 | 0.455 |
| HuBERT | | 0.7140 | 0.6550 | 0.452 |

| Upstream | Chunking | Normalization | Arousal | Dominance | Valence |
| -------- |----------- | ------------- | ------- | --------- | ------- |
| Wav2vec2-large-robust |Average Pooling | T-norm | 0.7051 | 0.6345 | 0.574 |
| Wav2vec2-large-robust | LSTM-RNNAttenVec | T-norm | 0.6288 | 0.5384 | 0.029 |
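
The tables do not name the metric; arousal, dominance, and valence scores like these are usually reported as the concordance correlation coefficient (CCC), $\rho_c = \frac{2\sigma_{xy}}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}$. Assuming that is the metric here, a dependency-free sketch is:

```python
from statistics import fmean
from typing import Sequence


def ccc(preds: Sequence[float], labels: Sequence[float]) -> float:
    """Concordance correlation coefficient using population (1/N) moments."""
    if len(preds) != len(labels) or not preds:
        raise ValueError("preds and labels must be non-empty and the same length")
    mx, my = fmean(preds), fmean(labels)
    vx = fmean((x - mx) ** 2 for x in preds)
    vy = fmean((y - my) ** 2 for y in labels)
    cov = fmean((x - mx) * (y - my) for x, y in zip(preds, labels))
    denom = vx + vy + (mx - my) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both sequences are the same constant")
    return 2 * cov / denom


# Perfectly correlated but shifted by +1: Pearson's r would be 1.0, CCC penalizes the bias.
print(ccc([2.0, 3.0, 4.0], [1.0, 2.0, 3.0]))  # about 0.5714 (= 4/7)
```
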
### MSP-PODCAST v1.8
* **Clean**
| Upstream | Arousal | Dominance | Valence |
| -------- | ------- | --------- | ------- |
| wav2vec2 | | | |
| HuBERT | | | |
| Text | Text | Text | |
* **Noise (SNR: xxx)**
| Upstream | Arousal | Dominance | Valence |
| -------- | ------- | --------- | ------- |
| wav2vec2 | | | |
| HuBERT | | | |
| Text | Text | Text | |
* **Noise (SNR: xxx)**
| Upstream | Arousal | Dominance | Valence |
| -------- | ------- | --------- | ------- |
| wav2vec2 | | | |
| HuBERT | | | |
| Text | Text | Text | |
* **Noise (SNR: xxx)**
| Upstream | Arousal | Dominance | Valence |
| -------- | ------- | --------- | ------- |
| wav2vec2 | | | |
| HuBERT | | | |
| Text | Text | Text | |
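
The noisy conditions above are still placeholders (SNR: xxx). A common way to build such test sets is to scale a noise clip so that the speech-to-noise power ratio matches the target, $\text{SNR}_{dB} = 10\log_{10}(P_s / P_n)$. The sketch below assumes that additive-mixing recipe on plain lists of samples; it is not the lab's noise pipeline.

```python
import math
from typing import List, Sequence


def mix_at_snr(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Add `noise` to `speech`, scaled so that 10*log10(P_speech / P_noise) == snr_db."""
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Loop the noise if it is shorter than the speech, then trim to the same length.
    reps = -(-len(speech) // len(noise))  # ceiling division
    noise = (list(noise) * reps)[: len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]
```
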
### MSP-PODCAST v1.10
| Upstream | Normalization | Arousal | Dominance | Valence |
| -------- | ------------- | ------- | --------- | ------- |
| wav2vec2 | | 0.5846 | 0.4548 | 0.4586 |
| HuBERT | | 0.5448 | 0.4179 | 0.3306 |
| Text | | Text | Text | |
### USC-IEMOCAP
* Pretrained model: **wav2vec**
* Normalization: **T-norm**
| Partition | Train | Development | Test | Arousal | Dominance | Valence |
| --------- | ----------------- | ----------- | ----- | ------- | --------- | ------- |
| 01 | Ses01,Ses02,Ses03 | Ses04 | Ses05 | 0.7164 | 0.5435 | 0.5671 |
| 02 | Ses02,Ses03,Ses04 | Ses05 | Ses01 | 0.6829 | 0.5839 | 0.5755 |
| 03 | Ses03,Ses04,Ses05 | Ses01 | Ses02 | 0.6706 | 0.4357 | 0.6484 |
| 04 | Ses04,Ses05,Ses01 | Ses02 | Ses03 | 0.7130 | 0.4270 | 0.5726 |
| 05 | Ses05,Ses01,Ses02 | Ses03 | Ses04 | 0.7232 | 0.4396 | 0.6067 |
| Average | - | - | - | 0.70122 | 0.48594 | 0.59406 |
### MSP-IMPROV
* Pretrained model: **wav2vec**
* Normalization: **T-norm**
| Partition | Train | Development | Test | Arousal | Dominance | Valence |
| --------- | ----------------------- | ----------- | ----- | ------- | --------- | ------- |
| 01 | Ses01,Ses02,Ses03,Ses04 | Ses05 | Ses06 | 0.6386 | 0.4837 | 0.6414 |
| 02 | Ses06,Ses01,Ses02,Ses03 | Ses04 | Ses05 | 0.6642 | 0.4577 | 0.6571 |
| 03 | Ses05,Ses06,Ses01,Ses02 | Ses03 | Ses04 | 0.5734 | 0.3834 | 0.5632 |
| 04 | Ses04,Ses05,Ses06,Ses01 | Ses02 | Ses03 | 0.7122 | 0.5499 | 0.4837 |
| 05 | Ses03,Ses04,Ses05,Ses06 | Ses01 | Ses02 | 0.6594 | 0.4404 | 0.4640 |
| 06 | Ses02,Ses03,Ses04,Ses05 | Ses06 | Ses01 | 0.5448 | 0.4179 | 0.3306 |
| Average | - | - | - | 0.6321 | 0.4555 | 0.5233 |
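
Both partition tables follow a rotation: the development session is the one just before the test session (cyclically), and the remaining sessions are used for training. The helper below regenerates the folds under that reading of the tables; the function names and test-session orders are illustrative. It also makes it easy to check that no fold uses the same session for development and testing.

```python
from typing import List, Sequence, Tuple


def make_partition(test: int, n_sessions: int) -> Tuple[List[int], int, int]:
    """Return (train, dev, test) 1-based session ids for one fold."""
    dev = (test - 2) % n_sessions + 1  # session just before the test session, wrapping around
    # Training sessions start right after the test session and skip dev and test.
    train = [(test + k - 1) % n_sessions + 1 for k in range(1, n_sessions - 1)]
    return train, dev, test


def ses(ids: Sequence[int]) -> str:
    return ",".join(f"Ses{i:02d}" for i in ids)


# Test-session order of partitions 01..05 (USC-IEMOCAP) and 01..06 (MSP-IMPROV).
IEMOCAP_TEST_ORDER = [5, 1, 2, 3, 4]
IMPROV_TEST_ORDER = [6, 5, 4, 3, 2, 1]

for k, t in enumerate(IEMOCAP_TEST_ORDER, start=1):
    train, dev, test = make_partition(t, 5)
    print(f"{k:02d} | {ses(train)} | {ses([dev])} | {ses([test])}")
```

For USC-IEMOCAP partition 05 this prints `05 | Ses05,Ses01,Ses02 | Ses03 | Ses04`.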
### NTHU-NNIME
## Categories
### MSP-PODCAST v1.7
### MSP-PODCAST v1.8
### MSP-PODCAST v1.9
### MSP-PODCAST v1.10
### USC-IEMOCAP
* **Majority Vote (Hard label; Single-label task)**
| Partition | Train | Development | Test | macroF1 | microF1 | weightedF1 |
| --------- | ----------------- | ----------- | ----- | ------- | ------- | ---------- |
| 01 | Ses01,Ses02,Ses03 | Ses04 | Ses05 | | | |
| 02 | Ses02,Ses03,Ses04 | Ses05 | Ses01 | | | |
| 03 | Ses03,Ses04,Ses05 | Ses01 | Ses02 | | | |
| 04 | Ses04,Ses05,Ses01 | Ses02 | Ses03 | | | |
| 05 | Ses05,Ses01,Ses02 | Ses03 | Ses04 | | | |
| Average | - | - | - | | | |
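
The three F1 columns differ only in how per-class scores are pooled: macro-F1 averages the per-class F1 equally, weighted-F1 weights it by class support, and micro-F1 pools all decisions, which for single-label classification equals accuracy. A dependency-free sketch for the single-label tasks follows; the multi-label David's Rule task would instead need per-class binary decisions, which this sketch does not cover.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def f1_scores(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Return (macroF1, microF1, weightedF1) for single-label predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class gets a false positive
            fn[t] += 1  # the true class gets a false negative
    labels = sorted(set(y_true) | set(y_pred), key=str)
    per_class: Dict[Hashable, float] = {}
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class[c] = 2 * tp[c] / denom if denom else 0.0
    support = Counter(y_true)
    macro = sum(per_class.values()) / len(labels)
    micro = sum(tp.values()) / len(y_true)  # equals accuracy in the single-label case
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return macro, micro, weighted


print(f1_scores(["angry", "angry", "sad", "happy"], ["angry", "sad", "sad", "happy"]))
# roughly (0.778, 0.75, 0.75)
```
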
* **Majority Vote (Soft label; Single-label task)**
| Partition | Train | Development | Test | macroF1 | microF1 | weightedF1 |
| --------- | ----------------- | ----------- | ----- | ------- | ------- | ---------- |
| 01 | Ses01,Ses02,Ses03 | Ses04 | Ses05 | | | |
| 02 | Ses02,Ses03,Ses04 | Ses05 | Ses01 | | | |
| 03 | Ses03,Ses04,Ses05 | Ses01 | Ses02 | | | |
| 04 | Ses04,Ses05,Ses01 | Ses02 | Ses03 | | | |
| 05 | Ses05,Ses01,Ses02 | Ses03 | Ses04 | | | |
| Average | - | - | - | | | |
* **David's Rule (Soft label; Multi-label task)**
| Partition | Train | Development | Test | macroF1 | microF1 | weightedF1 |
| --------- | ----------------- | ----------- | ----- | ------- | ------- | ---------- |
| 01 | Ses01,Ses02,Ses03 | Ses04 | Ses05 | | | |
| 02 | Ses02,Ses03,Ses04 | Ses05 | Ses01 | | | |
| 03 | Ses03,Ses04,Ses05 | Ses01 | Ses02 | | | |
| 04 | Ses04,Ses05,Ses01 | Ses02 | Ses03 | | | |
| 05 | Ses05,Ses01,Ses02 | Ses03 | Ses04 | | | |
| Average | - | - | - | | | |
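
The hard-label, soft-label, and multi-label variants differ mainly in the training target and loss. A common setup, assumed here rather than taken from the repository: a hard label is a one-hot vector trained with softmax cross-entropy, a soft label is the distribution of in-set annotator votes trained with the same cross-entropy, and the multi-label task uses a multi-hot vector with per-class sigmoid binary cross-entropy. The sketch uses plain Python so the arithmetic is visible; in PyTorch the equivalents are `torch.nn.functional.cross_entropy` (which accepts class-probability targets in recent versions) and `torch.nn.functional.binary_cross_entropy_with_logits`.

```python
import math
from collections import Counter
from typing import List, Sequence

CLASSES = ("angry", "sad", "neutral", "happy")


def one_hot(label: str, classes: Sequence[str] = CLASSES) -> List[float]:
    return [1.0 if c == label else 0.0 for c in classes]


def vote_distribution(votes: Sequence[str], classes: Sequence[str] = CLASSES) -> List[float]:
    """Soft label: share of in-set votes per class (dropping out-of-set votes is an assumption)."""
    counts = Counter(v for v in votes if v in classes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no votes fall in the class set")
    return [counts[c] / total for c in classes]


def multi_hot(labels: Sequence[str], classes: Sequence[str] = CLASSES) -> List[float]:
    return [1.0 if c in labels else 0.0 for c in classes]


def softmax_cross_entropy(logits: Sequence[float], target: Sequence[float]) -> float:
    """-sum_k target_k * log softmax(logits)_k; works for one-hot and soft targets."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))  # stable log-sum-exp
    return -sum(t * (z - log_z) for t, z in zip(target, logits))


def softplus(x: float) -> float:
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid_bce(logits: Sequence[float], target: Sequence[float]) -> float:
    """Mean per-class binary cross-entropy with logits (multi-label task)."""
    # -log(sigmoid(z)) = softplus(-z) and -log(1 - sigmoid(z)) = softplus(z)
    return sum(t * softplus(-z) + (1 - t) * softplus(z) for z, t in zip(logits, target)) / len(logits)


logits = [2.0, 0.5, 0.0, -1.0]
votes = ["angry", "angry", "sad", "other"]
print(softmax_cross_entropy(logits, one_hot("angry")))
print(softmax_cross_entropy(logits, vote_distribution(votes)))  # target [2/3, 1/3, 0, 0]
print(sigmoid_bce(logits, multi_hot(["angry", "sad"])))
```
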
### MSP-IMPROV
* **Majority Vote (Hard label; Single-label task)**
| Partition | Train | Development | Test | macroF1 | microF1 | weightedF1 |
| --------- | ----------------------- | ----------- | ----- | ------- | ------- | ---------- |
| 01 | Ses01,Ses02,Ses03,Ses04 | Ses05 | Ses06 | | | |
| 02 | Ses06,Ses01,Ses02,Ses03 | Ses04 | Ses05 | | | |
| 03 | Ses05,Ses06,Ses01,Ses02 | Ses03 | Ses04 | | | |
| 04 | Ses04,Ses05,Ses06,Ses01 | Ses02 | Ses03 | | | |
| 05 | Ses03,Ses04,Ses05,Ses06 | Ses01 | Ses02 | | | |
| 06 | Ses02,Ses03,Ses04,Ses05 | Ses06 | Ses01 | | | |
| Average | - | - | - | | | |
* **Majority Vote (Soft label; Single-label task)**
| Partition | Train | Development | Test | macroF1 | microF1 | weightedF1 |
| --------- | ----------------------- | ----------- | ----- | ------- | ------- | ---------- |
| 01 | Ses01,Ses02,Ses03,Ses04 | Ses05 | Ses06 | | | |
| 02 | Ses06,Ses01,Ses02,Ses03 | Ses04 | Ses05 | | | |
| 03 | Ses05,Ses06,Ses01,Ses02 | Ses03 | Ses04 | | | |
| 04 | Ses04,Ses05,Ses06,Ses01 | Ses02 | Ses03 | | | |
| 05 | Ses03,Ses04,Ses05,Ses06 | Ses01 | Ses02 | | | |
| 06 | Ses02,Ses03,Ses04,Ses05 | Ses06 | Ses01 | | | |
| Average | - | - | - | | | |
* **Plurality Rule (Hard label; Single-label task)**
| Partition | Train | Development | Test | macroF1 | microF1 | weightedF1 |
| --------- | ----------------------- | ----------- | ----- | ------- | ------- | ---------- |
| 01 | Ses01,Ses02,Ses03,Ses04 | Ses05 | Ses06 | | | |
| 02 | Ses06,Ses01,Ses02,Ses03 | Ses04 | Ses05 | | | |
| 03 | Ses05,Ses06,Ses01,Ses02 | Ses03 | Ses04 | | | |
| 04 | Ses04,Ses05,Ses06,Ses01 | Ses02 | Ses03 | | | |
| 05 | Ses03,Ses04,Ses05,Ses06 | Ses01 | Ses02 | | | |
| 06 | Ses02,Ses03,Ses04,Ses05 | Ses06 | Ses01 | | | |
| Average | - | - | - | | | |
* **Plurality Rule (Soft label; Single-label task)**
| Partition | Train | Development | Test | macroF1 | microF1 | weightedF1 |
| --------- | ----------------------- | ----------- | ----- | ------- | ------- | ---------- |
| 01 | Ses01,Ses02,Ses03,Ses04 | Ses05 | Ses06 | | | |
| 02 | Ses06,Ses01,Ses02,Ses03 | Ses04 | Ses05 | | | |
| 03 | Ses05,Ses06,Ses01,Ses02 | Ses03 | Ses04 | | | |
| 04 | Ses04,Ses05,Ses06,Ses01 | Ses02 | Ses03 | | | |
| 05 | Ses03,Ses04,Ses05,Ses06 | Ses01 | Ses02 | | | |
| 06 | Ses02,Ses03,Ses04,Ses05 | Ses06 | Ses01 | | | |
| Average | - | - | - | | | |
* **David's Rule (Soft label; Multi-label task)**
| Partition | Train | Development | Test | macroF1 | microF1 | weightedF1 |
| --------- | ----------------------- | ----------- | ----- | ------- | ------- | ---------- |
| 01 | Ses01,Ses02,Ses03,Ses04 | Ses05 | Ses06 | | | |
| 02 | Ses06,Ses01,Ses02,Ses03 | Ses04 | Ses05 | | | |
| 03 | Ses05,Ses06,Ses01,Ses02 | Ses03 | Ses04 | | | |
| 04 | Ses04,Ses05,Ses06,Ses01 | Ses02 | Ses03 | | | |
| 05 | Ses03,Ses04,Ses05,Ses06 | Ses01 | Ses02 | | | |
| 06 | Ses02,Ses03,Ses04,Ses05 | Ses06 | Ses01 | | | |
| Average | - | - | - | | | |
### [CREMA-D](https://github.com/CheyneyComputerScience/CREMA-D)
### NTHU-NNIME
# Reference
## PyTorch Extension Tools
* [PyTorch Lightning (Trainer)](https://www.pytorchlightning.ai/)
* [NeptuneLogger in PyTorch Lightning](https://pytorch-lightning.readthedocs.io/en/stable/api/pytorch_lightning.loggers.neptune.html)
* [HuggingFace Transformers Installation](https://huggingface.co/docs/transformers/installation)
* [TorchMetrics](https://github.com/Lightning-AI/metrics)
* [Macro-F1](https://torchmetrics.readthedocs.io/en/stable/classification/f1_score.html)
## Development Environment
* [Docker Desktop for Linux user manual](https://docs.docker.com/desktop/linux/)
## Other Papers Worth Reading
* [M-SENA: An Integrated Platform for Multimodal Sentiment Analysis (ACL 2022 Demo Track)](https://aclanthology.org/2022.acl-demo.20.pdf)
## Paper
### Emotion Recognition
* [Multi-modal Emotion Estimation for in-the-wild Videos](https://arxiv.org/pdf/2203.13032)
    * Findings:
        * **Combining wav2vec and ComParE features gave the best results.**
* [Dawn of the transformer era in speech emotion recognition: closing the valence gap](https://arxiv.org/pdf/2203.07378.pdf)
    * Paper by the same authors:
        * [Probing Speech Emotion Recognition Transformers for Linguistic Knowledge (INTERSPEECH 2022)](https://arxiv.org/pdf/2204.00400.pdf)
    * Findings:
        * **Fine-tuning the transformer layers is necessary.**
        * **Starting from a pre-trained model reduces the number of epochs needed to converge and improves performance stability across training runs with different seeds.**
        * **A reduction of training samples without loss in performance is only possible for arousal and dominance. With respect to valence, there is no sweet point in our data.**
    * Code:
        * [w2v2-how-to (HuggingFace)](https://github.com/audeering/w2v2-how-to)
        * [Seong-Gyun Code](https://cometmail-my.sharepoint.com/:u:/g/personal/hdc210001_utdallas_edu/EdwIMZKJ3dpDgssPm24EHnYB_fhG1ymKG5S5XOJV44VqXA)
* Results:
* Within-corpus results (fine-tuned and tested on the same corpus)
U-Norm
| | Arousal |Dominance| Valence |
|-----------------|---------|---------|---------|
|Original model |**0.744**|**0.655**|**0.638**|
|MSP-Podcast v1.7 | 0.671 | 0.502 | 0.587 |
|MSP-Podcast v1.10| 0.577 | 0.432 | 0.445 |
|MSP-IMPROV (pt-1)| 0.655 | 0.467 | 0.613 |
|MSP-IMPROV (pt-2)| 0.670 | 0.508 | 0.634 |
* Cross-corpus results (fine-tuned on MSP-Podcast v1.10, tested on a different corpus)
| | Arousal |Dominance| Valence |
|-----------------|----------|---------|---------|
|MSP-Podcast v1.10| 0.577 | 0.432 | 0.445 |
|MSP-IMPROV | 0.536 | 0.423 | 0.339 |