---
title: 'Emotion Classification 2022'
disqus: hackmd
---

###### tags: `TEEP 2022`

## Emotion Classification

[TOC]

## Need for Study

**iTunes size argument**
https://en.wikipedia.org/wiki/ITunes_Store
Apple iTunes is a prominent digital music market that opened in 2003 and hosted approx. 20 million songs as of 2020.

**Lee 2010**
Challenges in MIR include users' lack of formal music education, the desire to find music from unfamiliar cultures, instrumental works that have no searchable lyrics, and the fact that even songs with lyrics are rarely comprehended perfectly.

## Literature Review
---
The sources summarized below are organized in chronological order and pertain to the task of music mood classification. The studies tend to focus on one of two broad topics: **1)** how to categorize mood as experienced by humans listening to music (henceforth *mood categorization*), and **2)** how to extract and model musical features in a way that agrees with a given categorization (henceforth *mood classification*).

### Mood Categorization:

**Hevner 1936**
![hevner1936](https://i.imgur.com/Z0vExIO.png =400x400)

**General Inquirer**
An early attempt to objectively generate psychological-state tags from a large natural-text corpus.

**Russell 1980**
![Russell1980](https://i.imgur.com/qcWJYd1.png =400x400)

**Affective Norms for English Words (ANEW)**
1,000+ English words with scores obtained from human questionnaires. Three dimensions: valence, arousal, and dominance.
![bradley1999](https://i.imgur.com/vzC1H4g.png =350x200)

**WordNet-Affect**
A semantically organized list of 4,787 affect words grouped into 2,874 synsets, built by using a manually compiled list of 1,903 affect words as cores in WordNet to retrieve synsets.
![strapparava2004](https://i.imgur.com/Cp5xZA0.png =600x200)

**Hu 2010 - "Lyrics":**
18 categories based on Russell and last.fm
![hu2010](https://i.imgur.com/JOGB1AU.png =600x400)

### Classification Approach:

**Laurier 2008:** bimodal approach (audio + lyrics)
audio: SVM, Random Forest, Logistic Regression
lyrics: k-nearest neighbors on a Bag of Words, Latent Semantic Analysis, Language Model Difference
Bimodal combination methods: a voting system or a mixed feature space (audio and lyric data in the same vector)

**Hu 2010 - "Improving":** SVM

### Musical Features:

**Hevner 1936:**
![hevner1936](https://i.imgur.com/wI8FF4a.png =400x400)

**Laurier 2008:** timbral (MFCC, spectral centroid); rhythmic (tempo, onset); tonal (Harmonic Pitch Class Profiles); 'temporal'

**Hu 2010 - "Improving"**
MARSYAS (a feature-extraction toolbox like Librosa) features: 63 spectral features, namely means and variances of Spectral Centroid, Rolloff, Flux, Mel-Frequency Cepstral Coefficients (MFCC), etc. (see the sketch below)

### Datasets:

**Laurier 2008:** Last.fm (mood tags)

**CAL500**
Human-labeled random set of 500 Western modern songs, each labeled for 18 emotions on a 1-3 scale.

**Million Song Dataset (MSD)**
Songs from 1922 to the present; metadata and Echo Nest signal-derived audio features, but no raw audio and no lyrics.

**musiXmatch**
The musiXmatch dataset includes lyrics for 237,662 tracks of the MSD.
http://labrosa.ee.columbia.edu/millionsong/musixmatch
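To make the MARSYAS/Librosa-style features listed under Musical Features concrete, here is a minimal sketch of computing per-song means and variances of frame-level spectral features (assuming `librosa` and `numpy` are installed; `song.wav` is a placeholder filename, and spectral flux is omitted because Librosa has no direct equivalent of the MARSYAS Flux feature):

```python
import librosa
import numpy as np

def spectral_feature_vector(path, n_mfcc=13):
    """Means and variances of frame-level spectral features,
    in the spirit of the MARSYAS set used by Hu 2010."""
    y, sr = librosa.load(path)  # decode audio (default 22050 Hz mono)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)   # (1, n_frames)
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)     # (1, n_frames)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)     # (n_mfcc, n_frames)
    frames = np.vstack([centroid, rolloff, mfcc])              # (2 + n_mfcc, n_frames)
    # Collapse the time axis into per-feature means and variances.
    return np.concatenate([frames.mean(axis=1), frames.var(axis=1)])

features = spectral_feature_vector("song.wav")  # shape (30,) for n_mfcc=13
```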
### Chronological BibTex references with annotations:
```gherkin=
@article{hevner1936experimental,
  title={Experimental studies of the elements of expression in music},
  author={Hevner, Kate},
  journal={The American Journal of Psychology},
  volume={48},
  number={2},
  pages={246--268},
  year={1936},
  publisher={JSTOR}
}
# American study from the 1930s, taking an old experimental-psychology approach.
# "There are many ways to enjoy music, and many types of music to enjoy."
# Musical training is not needed to distinguish mood.
# 8 mood categories: happy, graceful, serene, dreamy, sad, dignified, vigorous, exciting
# 4 musical features:
#   melodic direction (ascending vs. descending)
#   harmonic complexity (simple vs. complex)
#   mode (major vs. minor)
#   rhythm (firm vs. flowing)
# Study only considered classical music.

@article{stone1966general,
  title={The general inquirer: A computer approach to content analysis},
  author={Stone, Philip J and Dunphy, Dexter C and Smith, Marshall S},
  year={1966},
  publisher={MIT Press}
}
# Old paper; hard to follow. Essentially a mapping tool: it tags words in a
# text corpus with psychological categories.
# Modern explanation: https://era.library.ualberta.ca/items/0e790a8e-a263-4a99-9a77-418d91b700c0/view/1822ecf2-7f28-47d0-a08f-95ceaf8f92af/Beginnings_of_Content_Analysis.Revised.pdf
# Spreadsheet example: http://www.wjh.harvard.edu/~inquirer/spreadsheet_guide.htm

@article{russell1980circumplex,
  title={A circumplex model of affect},
  author={Russell, James A},
  journal={Journal of Personality and Social Psychology},
  volume={39},
  number={6},
  pages={1161},
  year={1980},
  publisher={American Psychological Association}
}
# Psychology study from the 1980s.
# Proposes a mood categorization scheme using the term "affect."
# 360-degree affect wheel with sleep-arousal on the Y axis and (dis)pleasure on the X axis.
# The study aims to simplify 6-axis and 12-axis affect models.
# Participants were given a list of 28 words and asked to place them on the circle.
# Used principal component analysis to validate the scheme.

@techreport{bradley1999affective,
  title={Affective norms for English words (ANEW): Instruction manual and affective ratings},
  author={Bradley, Margaret M and Lang, Peter J},
  year={1999},
  institution={Technical report C-1, the center for research in psychophysiology~…}
}
# Three dimensions: pleasure, arousal, and dominance.

@article{peretz2004singing,
  title={Singing in the brain: Insights from cognitive neuropsychology},
  author={Peretz, Isabelle and Gagnon, Lise and H{\'e}bert, Sylvie and Macoir, Jo{\"e}l},
  journal={Music Perception},
  volume={21},
  number={3},
  pages={373--390},
  year={2004},
  publisher={University of California Press}
}
# Foundational study in arguing for the bimodal approach.
# Case study of an "aphasia without amusia" patient who cannot speak but can still produce music.
# Music and speech are processed in different regions of the brain.
# Humans understand song mood from both lyrics and audio.
# Relevance: justifies the bimodal approach.
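# --- Illustrative sketch (an addition, not from the papers above): mapping
# --- Russell-style valence-arousal coordinates onto four quadrant mood labels,
# --- with ANEW-style word ratings averaged over a lyric in mind as input.
# --- The zero thresholds and the label names are assumptions for illustration.
def quadrant(valence, arousal):
    # Positive valence splits into high-arousal "happy" vs. low-arousal "relaxed";
    # negative valence splits into high-arousal "angry" vs. low-arousal "sad".
    if valence >= 0:
        return "happy" if arousal >= 0 else "relaxed"
    return "angry" if arousal >= 0 else "sad"

# e.g. quadrant(0.8, 0.6) -> "happy"; quadrant(-0.5, -0.7) -> "sad"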
@inproceedings{strapparava2004wordnet,
  title={WordNet-Affect: an affective extension of WordNet},
  author={Strapparava, Carlo and Valitutti, Alessandro and others},
  booktitle={LREC},
  volume={4},
  number={1083-1086},
  pages={40},
  year={2004},
  organization={Lisbon, Portugal}
}
# Created a subgroup of WordNet that deals with affect words.
# Name: WordNet-Affect, part of the larger WordNet Domains project.
# synset = synonym set
# Manually compiled list of 1,903 affect words.
# The affect words were used as synset cores.

@inproceedings{turnbull2007towards,
  title={Towards musical query-by-semantic-description using the CAL500 data set},
  author={Turnbull, Douglas and Barrington, Luke and Torres, David and Lanckriet, Gert},
  booktitle={Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval},
  pages={439--446},
  year={2007}
}
# Download the dataset from [cosmal.ucsd.edu/cal](http://calab1.ucsd.edu/~datasets/cal500/)
# Human-labeled random set of 500 Western modern songs.
# Labeled for 18 emotions on a 1-3 scale.

@inproceedings{laurier2008multimodal,
  title={Multimodal music mood classification using audio and lyrics},
  author={Laurier, Cyril and Grivolla, Jens and Herrera, Perfecto},
  booktitle={2008 seventh international conference on machine learning and applications},
  pages={688--693},
  year={2008},
  organization={IEEE}
}
# Bimodal approach: lyrics + audio features.
# Adapts Russell's model into binary categories:
#   low vs. high arousal (Russell's sleep-arousal)
#   positive vs. negative valence (Russell's pleasure-displeasure)
# Used last.fm tags to set up mood ground truths (plus 17 human validators).
# Features:
#   timbral (MFCC, spectral centroid)
#   rhythmic (tempo, onset rate)
#   tonal (e.g., Harmonic Pitch Class Profiles)
#   temporal (the authors are not specific about these features in this paper)
# The bimodal approach gave the best mood classification accuracy.

@inproceedings{laurier2009music,
  title={Music Mood Representations from Social Tags},
  author={Laurier, Cyril and Sordo, Mohamed and Serra, Joan and Herrera, Perfecto},
  booktitle={ISMIR},
  pages={381--386},
  year={2009}
}
# More work validating the arousal-valence model.
# Pipeline: identify mood tags on last.fm > LSA to cluster the tags > compare clusters with Hevner, MIREX, and Russell.
# Conclusion: experts and last.fm users agree on the arousal-valence model.

@inproceedings{hu2010improving,
  title={Improving mood classification in music digital libraries by combining lyrics and audio},
  author={Hu, Xiao and Downie, J Stephen},
  booktitle={Proceedings of the 10th annual joint conference on Digital libraries},
  pages={159--168},
  year={2010}
}
# Bimodal mood classification.
# Russell scheme.
# SVM model.

@inproceedings{hu2010lyrics,
  title={When Lyrics Outperform Audio for Music Mood Classification: A Feature Analysis},
  author={Hu, Xiao and Downie, J Stephen},
  booktitle={ISMIR},
  pages={619--624},
  year={2010},
  organization={Citeseer}
}
# Ground truths from last.fm tags.
# Lyrics outperform audio features.
# Quote: "one single song could belong to multiple mood categories. This is in
# fact more realistic than a single-label setting since a music piece may carry
# multiple moods such as 'happy and calm' or 'aggressive and depressed'."
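# --- Illustrative sketch of the "mixed feature space" fusion described in
# --- laurier2008 above, with the SVM classifier used by hu2010improving
# --- (scikit-learn assumed; the random arrays are placeholders, not the
# --- papers' actual features).
import numpy as np
from sklearn.svm import SVC

n_songs = 100
audio_feats = np.random.rand(n_songs, 63)   # e.g. MARSYAS-style spectral stats
lyric_feats = np.random.rand(n_songs, 500)  # e.g. bag-of-words counts
labels = np.random.randint(0, 4, n_songs)   # four arousal-valence quadrants

# Mixed feature space: audio and lyric data in the same vector.
X = np.hstack([audio_feats, lyric_feats])
clf = SVC(kernel="linear").fit(X, labels)
print(clf.predict(X[:5]))
# Laurier's alternative fusion is late: train one model per modality and
# combine their outputs with a voting scheme.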
@article{lee2010analysis,
  title={Analysis of user needs and information features in natural language queries seeking music information},
  author={Lee, Jin Ha},
  journal={Journal of the American Society for Information Science and Technology},
  volume={61},
  number={5},
  pages={1025--1045},
  year={2010},
  publisher={Wiley Online Library}
}
# General field of information seeking and retrieval:
#   Bates' (1989) Berrypicking model
#   Kuhlthau's (1991) Information Search Process model
#   Dervin's (1992) sense-making model
#   Savolainen's (1995) Everyday Life Information Seeking (ELIS) model
#   Ingwersen's (1992) cognitive model
#   Wilson's (1999) model of information seeking
# Challenges in MIR:
#   people who had no formal music education
#   people who seek music from different cultures
#   not all musical works are accompanied by lyrics
#   lyrics are often difficult to comprehend

@inproceedings{tsunoo2010music,
  title={Music mood classification by rhythm and bass-line unit pattern analysis},
  author={Tsunoo, Emiru and Akase, Taichi and Ono, Nobutaka and Sagayama, Shigeki},
  booktitle={2010 IEEE International Conference on Acoustics, Speech and Signal Processing},
  pages={265--268},
  year={2010},
  organization={IEEE}
}
# Japanese study with very rough English; apparently no proofreader.
# Along with Ono, the authors push their rhythm and bass-line approach.
# Uses the harmonic/percussive sound separation (HPSS) technique proposed by Ono.
# Used the Weka suite.
# SVM? Unclear.

@article{bertin2011million,
  title={The million song dataset},
  author={Bertin-Mahieux, Thierry and Ellis, Daniel PW and Whitman, Brian and Lamere, Paul},
  year={2011}
}
# Details on the MSD.

@inproceedings{schuller2011multi,
  title={Multi-modal non-prototypical music mood analysis in continuous space: reliability and performances},
  author={Schuller, Bj{\"o}rn and Weninger, Felix and Dorfner, Johannes},
  booktitle={Proc. 12th Intern. Society for Music Information Retrieval Conference (ISMIR) 2011, ISMIR, Miami, FL, USA},
  year={2011}
}
#
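# --- Illustrative sketch for the MSD/musiXmatch lyric data described above:
# --- turning one line of the musiXmatch bag-of-words file into a dense count
# --- vector. The file layout assumed here is the one documented on the
# --- dataset page ('%'-prefixed vocabulary line of the top 5,000 stemmed
# --- words, then per-track lines "TRACKID,MXMID,idx:cnt,..." with 1-based
# --- word indices); verify against the download before relying on it.
import numpy as np

def parse_mxm_line(line, vocab_size=5000):
    parts = line.strip().split(",")
    track_id = parts[0]                 # MSD track id; parts[1] is the mxm id
    counts = np.zeros(vocab_size)
    for pair in parts[2:]:
        idx, cnt = pair.split(":")
        counts[int(idx) - 1] = int(cnt)  # word indices are 1-based
    return track_id, counts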
@article{brinker2012expressed,
  title={Expressed music mood classification compared with valence and arousal ratings},
  author={Brinker, Bert den and Dinther, Ralph van and Skowronek, Janto},
  journal={EURASIP Journal on Audio, Speech, and Music Processing},
  volume={2012},
  number={1},
  pages={1--14},
  year={2012},
  publisher={SpringerOpen}
}
#

@inproceedings{weninger2014line,
  title={On-line continuous-time music mood regression with deep recurrent neural networks},
  author={Weninger, Felix and Eyben, Florian and Schuller, Bj{\"o}rn},
  booktitle={2014 IEEE international conference on acoustics, speech and signal processing (ICASSP)},
  pages={5412--5416},
  year={2014},
  organization={IEEE}
}
#

@article{saari2015genre,
  title={Genre-adaptive semantic computing and audio-based modelling for music mood annotation},
  author={Saari, Pasi and Fazekas, Gy{\"o}rgy and Eerola, Tuomas and Barthet, Mathieu and Lartillot, Olivier and Sandler, Mark},
  journal={IEEE Transactions on Affective Computing},
  volume={7},
  number={2},
  pages={122--135},
  year={2015},
  publisher={IEEE}
}
#

@article{moon2015mood,
  title={Mood lighting system reflecting music mood},
  author={Moon, Chang Bae and Kim, HyunSoo and Lee, Dong Won and Kim, Byeong Man},
  journal={Color Research \& Application},
  volume={40},
  number={2},
  pages={201--212},
  year={2015},
  publisher={Wiley Online Library}
}
#

@article{plewa2015music,
  title={Music mood visualization using self-organizing maps},
  author={Plewa, Magdalena and Kostek, Bozena},
  journal={Archives of Acoustics},
  volume={40},
  number={4},
  pages={513--525},
  year={2015}
}
#

@article{mo2017novel,
  title={A novel method based on OMPGW method for feature extraction in automatic music mood classification},
  author={Mo, Shasha and Niu, Jianwei},
  journal={IEEE Transactions on Affective Computing},
  volume={10},
  number={3},
  pages={313--324},
  year={2017},
  publisher={IEEE}
}
#

@inproceedings{ccano2017music,
  title={Music mood dataset creation based on last.fm tags},
  author={{\c{C}}ano, Erion and Morisio, Maurizio and others},
  booktitle={2017 International Conference on Artificial Intelligence and Applications, Vienna, Austria},
  pages={15--26},
  year={2017}
}
#
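# --- Illustrative sketch of the idea in weninger2014 above: a recurrent
# --- network regressing frame-level valence and arousal over time (PyTorch
# --- assumed; feature and hidden sizes are placeholders, not the paper's
# --- actual configuration).
import torch
import torch.nn as nn

class MoodRegressor(nn.Module):
    def __init__(self, n_features=63, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # valence and arousal per frame

    def forward(self, x):                 # x: (batch, time, n_features)
        out, _ = self.rnn(x)
        return self.head(out)             # (batch, time, 2)

# e.g. MoodRegressor()(torch.randn(8, 100, 63)).shape -> (8, 100, 2)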
@article{delbouys2018music,
  title={Music mood detection based on audio and lyrics with deep neural net},
  author={Delbouys, R{\'e}mi and Hennequin, Romain and Piccoli, Francesco and Royo-Letelier, Jimena and Moussallam, Manuel},
  journal={arXiv preprint arXiv:1809.07276},
  year={2018}
}
# Audio: CNN only.
# Lyrics: every model type except transformers.
# MSD dataset.
# 60:20:20 train/validation/test splits.
# R2 for valence and arousal as benchmarks.

@article{chaturvedi2021music,
  title={Music mood and human emotion recognition based on physiological signals: a systematic review},
  author={Chaturvedi, Vybhav and Kaur, Arman Beer and Varshney, Vedansh and Garg, Anupam and Chhabra, Gurpal Singh and Kumar, Munish},
  journal={Multimedia Systems},
  pages={1--24},
  year={2021},
  publisher={Springer}
}
#

@article{garg2022machine,
  title={Machine learning model for mapping of music mood and human emotion based on physiological signals},
  author={Garg, Anupam and Chaturvedi, Vybhav and Kaur, Arman Beer and Varshney, Vedansh and Parashar, Anshu},
  journal={Multimedia Tools and Applications},
  pages={1--41},
  year={2022},
  publisher={Springer}
}
# Messy, imprecise, poorly written study.
# Interesting idea: combine MIR and physiological-signal analysis to create an adaptive, mood-specific playlist.
# The PMEmo dataset contains emotion annotations for 794 songs plus simultaneous electrodermal activity (EDA) signals.
# "The MediaEval Database for Emotional Analysis of Music" contains arousal-valence values for 1,802 excerpts and full songs.
```

## Project Aim

There are multiple mood categorization schemes as well as multiple mood classification approaches available. Pairing a transformer-based neural network for lyric mood classification with a second ANN model of audio features, combined bimodally, is a novel approach that we aim to investigate here (a sketch of the idea appears at the end of this note).

## Choosing a Dataset

We will have to experiment with the available datasets to see which one works best for ANN training.

## Results coming later in June 2022!

###### tags: `MIR` `NLP` `ANN` `BERT` `GPT` `TEEP2022`
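As a first concrete step toward the project aim, here is a minimal sketch of the proposed bimodal architecture (assuming the HuggingFace `transformers` library and PyTorch; the `bert-base-uncased` checkpoint, the feature sizes, and fusion by concatenation are illustrative assumptions, not settled design decisions):

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BimodalMoodClassifier(nn.Module):
    def __init__(self, n_audio_features=63, n_moods=4):
        super().__init__()
        # Transformer encoder for lyrics.
        self.text_encoder = AutoModel.from_pretrained("bert-base-uncased")
        # Small ANN over precomputed audio features.
        self.audio_encoder = nn.Sequential(
            nn.Linear(n_audio_features, 128), nn.ReLU(), nn.Linear(128, 128)
        )
        hidden = self.text_encoder.config.hidden_size  # 768 for BERT-base
        self.classifier = nn.Linear(hidden + 128, n_moods)

    def forward(self, input_ids, attention_mask, audio_features):
        text = self.text_encoder(input_ids=input_ids,
                                 attention_mask=attention_mask)
        lyric_vec = text.last_hidden_state[:, 0]  # [CLS] token embedding
        audio_vec = self.audio_encoder(audio_features)
        # Early fusion by concatenation (cf. Laurier's mixed feature space).
        return self.classifier(torch.cat([lyric_vec, audio_vec], dim=-1))

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["I'm walking on sunshine"], return_tensors="pt",
                  truncation=True, padding=True)
model = BimodalMoodClassifier()
logits = model(batch["input_ids"], batch["attention_mask"], torch.randn(1, 63))
```

A late-fusion variant (separate lyric and audio classifiers combined by voting, as in Laurier 2008) would be the natural baseline to compare against.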