---
title: 'Emotion Classification 2022'
disqus: hackmd
---
###### tags: `TEEP 2022`
## Emotion Classification
[TOC]
## Need for Study
**iTunes size argument**
https://en.wikipedia.org/wiki/ITunes_Store
Apple's iTunes Store is a prominent digital music marketplace; it opened in 2003 and hosted approximately 20 million songs as of 2020.
**Lee 2010**
Challenges in MIR include users' lack of formal music education, the need to find music from unfamiliar cultures, instrumental works that lack searchable lyrics, and lyrics that are rarely comprehended perfectly even when present.
## Literature Review
---
The sources summarized below are organized in chronological order and pertain to the task of music mood classification.
The studies tend to focus on one of two broad topics: **1)** How to categorize mood as experienced by humans listening to music (henceforth *mood categorization*), and **2)** How to extract and model musical features in a way that agrees with a given categorization (henceforth *mood classification*).
### Mood Categorization:
**Hevner 1936**
Eight mood clusters (happy, graceful, serene, dreamy, sad, dignified, vigorous, exciting) identified from listener responses; the study only considered classical music.

**General Inquirer**
An early attempt to objectively generate psychological-state tags from a large natural-text corpus.
**Russell 1980**
Circumplex model of affect: a 360-degree wheel with sleep-arousal on the vertical axis and displeasure-pleasure on the horizontal axis.

**Affective Norms for English Words (ANEW)**
1,000+ English words with scores obtained from human questionnaires.
3 dimensions: valence, arousal, and dominance
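A minimal sketch of how ANEW-style scores could be applied to lyric text; the word scores below are illustrative placeholders, not the published ANEW ratings.
```python
# Illustrative ANEW-style lookup: word -> (valence, arousal, dominance), each on a 1-9 scale.
# The numbers are made up for illustration; they are NOT the published ANEW ratings.
anew_like = {
    "happy":  (8.2, 6.5, 6.6),
    "calm":   (6.9, 2.4, 6.0),
    "lonely": (2.2, 4.5, 2.9),
}

def mean_affect(tokens, lexicon):
    """Average valence/arousal/dominance over the tokens found in the lexicon."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return None
    return tuple(sum(dim) / len(hits) for dim in zip(*hits))

print(mean_affect("i feel happy and calm".split(), anew_like))
```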

**WordNet-Affect**
A list of 4,787 affect words organized into 2,874 synsets. Built by using a manually compiled list of 1,903 affect words as cores in WordNet to retrieve synsets.
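WordNet-Affect itself is not bundled with NLTK, but the seed-word-to-synset expansion step can be sketched with NLTK's plain WordNet (assumes `nltk` is installed and the `wordnet` corpus downloaded).
```python
# Expanding a core affect word into WordNet synsets and their lemmas,
# loosely mirroring how WordNet-Affect grows a word list from seed terms.
# Requires: pip install nltk; then nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

def expand_affect_word(word):
    expanded = set()
    for synset in wn.synsets(word):            # all senses of the seed word
        expanded.update(synset.lemma_names())  # synonyms grouped in that synset
    return sorted(expanded)

print(expand_affect_word("joy"))
```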

**Hu 2010 - "Lyrics":**
18 categories based on Russell and last.fm

### Classification Approach:
**Laurier 2008:**
Bimodal approach (audio + lyrics)
Audio: SVM, Random Forest, Logistic Regression
Lyrics: k-nearest neighbors on a Bag of Words, Latent Semantic Analysis, Language Model Difference
Bimodal combination methods: a voting system or a mixed feature space (audio and lyric data in the same vector)
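A minimal sketch of the mixed-feature-space idea (one classifier on concatenated audio + lyric vectors; an SVM also covers Hu 2010 - "Improving"), using scikit-learn and random placeholder features. The voting alternative would instead train one classifier per modality and combine their predictions.
```python
# Mixed-feature-space fusion: concatenate audio and lyric vectors per song,
# then train one classifier on the joint space (here an SVM via scikit-learn).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_songs = 200
audio_feats = rng.normal(size=(n_songs, 63))   # e.g. MARSYAS-style spectral stats
lyric_feats = rng.normal(size=(n_songs, 100))  # e.g. LSA-reduced bag-of-words
labels = rng.integers(0, 2, size=n_songs)      # e.g. positive vs. negative valence

joint = np.hstack([audio_feats, lyric_feats])  # the "mixed feature space"
clf = SVC(kernel="linear")
print(cross_val_score(clf, joint, labels, cv=5).mean())
```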
**Hu 2010 - "Improving":**
SVM
### Musical Features:
**Hevner 1936:**
Melodic direction (ascending vs. descending), harmony complexity (simple vs. complex), mode (major vs. minor), rhythm (firm vs. flowing)

**Laurier 2008:**
Timbral (MFCC, spectral centroid); rhythmic (tempo, onset rate); tonal (e.g., Harmonic Pitch Class Profiles); 'temporal' (features not specified in the paper)
**Hu 2010 - "Improving":**
MARSYAS (a feature-extraction toolbox similar to Librosa) features - 63 spectral features: means and variances of Spectral Centroid, Rolloff, Flux, Mel-Frequency Cepstral Coefficients (MFCC), etc.
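A rough Librosa approximation of the "means and variances of spectral features" idea (MARSYAS itself is a separate C++ toolbox); the filename is a placeholder and the flux computation is a simple stand-in.
```python
# Means and variances of frame-level spectral features (centroid, rolloff, flux, MFCCs),
# loosely mirroring the MARSYAS-style description, computed with Librosa instead.
import numpy as np
import librosa

y, sr = librosa.load("song.wav")                # placeholder path
S = np.abs(librosa.stft(y))                     # magnitude spectrogram

centroid = librosa.feature.spectral_centroid(S=S, sr=sr)
rolloff  = librosa.feature.spectral_rolloff(S=S, sr=sr)
flux     = np.sqrt(np.sum(np.diff(S, axis=1) ** 2, axis=0, keepdims=True))  # simple spectral flux
mfcc     = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

def stats(x):
    """Mean and variance over time frames, per feature row."""
    return np.concatenate([x.mean(axis=1), x.var(axis=1)])

feature_vector = np.concatenate([stats(centroid), stats(rolloff), stats(flux), stats(mfcc)])
print(feature_vector.shape)  # (32,) = 2 stats x (1 + 1 + 1 + 13) feature rows
```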
### Datasets:
**Laurier 2008:**
Last.fm (mood tags)
**CAL500**
A random set of 500 modern Western songs, human-labeled for 18 emotions on a 1-3 scale.
**Million Song Dataset (MSD)**
One million songs from 1922 onward; provides audio-derived signal features and metadata, but no raw audio and no lyrics
Uses Echo Nest features
**musiXmatch**
The musiXmatch dataset provides lyrics, in bag-of-words form, for 237,662 tracks of the MSD
http://labrosa.ee.columbia.edu/millionsong/musixmatch
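A minimal parser sketch, assuming the bag-of-words text layout described on the musiXmatch page ('#' comment lines, a '%'-prefixed comma-separated vocabulary line, then one track per line as MSD track ID, musiXmatch ID, and 1-based "index:count" pairs); verify against the actual download before relying on it.
```python
# Minimal reader for a musiXmatch bag-of-words text file, under the layout
# assumptions stated above. Returns the vocabulary and per-track word counts.
def read_mxm(path):
    vocab, tracks = [], {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue                      # skip blank/comment lines
            if line.startswith("%"):
                vocab = line[1:].split(",")   # top-5,000 stemmed words
                continue
            parts = line.split(",")
            track_id, counts = parts[0], {}
            for pair in parts[2:]:            # "idx:cnt" pairs, idx is 1-based
                idx, cnt = pair.split(":")
                counts[vocab[int(idx) - 1]] = int(cnt)
            tracks[track_id] = counts
    return vocab, tracks
```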
### Chronological BibTeX references with annotations:
```bibtex=
@article{hevner1936experimental,
title={Experimental studies of the elements of expression in music},
author={Hevner, Kate},
journal={The American Journal of Psychology},
volume={48},
number={2},
pages={246--268},
year={1936},
publisher={JSTOR}
}
# American study from the 1930s, taking an early experimental-psychology approach.
# "There are many ways to enjoy music, and many types of music to enjoy."
# Musical training is not needed to distinguish mood.
# 8 mood categories: happy, graceful, serene, dreamy, sad, dignified, vigorous, exciting
# 4 music features:
# melodic direction (ascending vs. descending)
# harmony complexity (simple vs. complex)
# Mode (major vs. minor)
# Rhythm (firm vs. flowing)
# Study only considered classical music.
@article{stone1966general,
title={The general inquirer: A computer approach to content analysis.},
author={Stone, Philip J and Dunphy, Dexter C and Smith, Marshall S},
year={1966},
publisher={MIT press}
}
# Older paper; hard to follow. Essentially a dictionary-based content-analysis tool that maps words in a text corpus onto psychological tag categories.
# Modern explanation: https://era.library.ualberta.ca/items/0e790a8e-a263-4a99-9a77-418d91b700c0/view/1822ecf2-7f28-47d0-a08f-95ceaf8f92af/Beginnings_of_Content_Analysis.Revised.pdf
# Spreadsheet example: http://www.wjh.harvard.edu/~inquirer/spreadsheet_guide.htm
@article{russell1980circumplex,
title={A circumplex model of affect.},
author={Russell, James A},
journal={Journal of personality and social psychology},
volume={39},
number={6},
pages={1161},
year={1980},
publisher={American Psychological Association}
}
# Psychology study from the 1980s
# Proposes a mood categorization scheme using the term "affect."
# 360 degree Affect wheel with sleep-arousal on the Y axis and (dis)pleasure on the X axis.
# study aims to simplify 6-axis and 12-axis affect models
# Participants given a list of 28 words and asked to classify them on the circle.
# Used principal component analysis to validate his scheme.
@techreport{bradley1999affective,
title={Affective norms for English words (ANEW): Instruction manual and affective ratings},
author={Bradley, Margaret M and Lang, Peter J},
year={1999},
institution={Technical report C-1, the center for research in psychophysiology~…}
}
# three dimensions of pleasure, arousal, and dominance
@article{peretz2004singing,
title={Singing in the brain: Insights from cognitive neuropsychology},
author={Peretz, Isabelle and Gagnon, Lise and H{\'e}bert, Sylvie and Macoir, Jo{\"e}l},
journal={Music Perception},
volume={21},
number={3},
pages={373--390},
year={2004},
publisher={University of California Press}
}
# Foundational study in arguing for bimodal approach
# Case study of "aphasia without amusia" patient who cannot speak but can produce music.
# Music and speech are processed in different regions of the brain.
# Humans understand song mood from both lyrics and audio
# Relevance: justifies use of a bimodal approach
@inproceedings{strapparava2004wordnet,
title={Wordnet affect: an affective extension of wordnet.},
author={Strapparava, Carlo and Valitutti, Alessandro and others},
booktitle={Lrec},
volume={4},
pages={1083--1086},
year={2004},
organization={Lisbon, Portugal}
}
# Created a subset of WordNet that deals with affect words
# Name: WordNet-Affect, part of the larger WordNet Domains project
# synset = synonym set
# manually compiled list of 1903 affect words
# affect words used as synset cores
@inproceedings{turnbull2007towards,
title={Towards musical query-by-semantic-description using the cal500 data set},
author={Turnbull, Douglas and Barrington, Luke and Torres, David and Lanckriet, Gert},
booktitle={Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval},
pages={439--446},
year={2007}
}
# Download dataset from [cosmal.ucsd.edu/cal](http://calab1.ucsd.edu/~datasets/cal500/)
# human-labeled random set of 500 Western modern songs
# Labeled for 18 emotions on 1-3 scale
@inproceedings{laurier2008multimodal,
title={Multimodal music mood classification using audio and lyrics},
author={Laurier, Cyril and Grivolla, Jens and Herrera, Perfecto},
booktitle={2008 seventh international conference on machine learning and applications},
pages={688--693},
year={2008},
organization={IEEE}
}
# Bimodal approach: lyrics + music features.
# Adapts Russell's binary categories:
# low-high arousal (Russell's sleep-arousal)
# positive-negative valence (Russell's pleasure-displeasure)
# Used last.fm tags to set up mood ground truths (plus 17 human validators)
# Features:
# timbral (MFCC, spectral centroid)
# rhythmic (tempo, onset rate)
# tonal (like Harmonic Pitch Class Profiles)
# temporal (authors not specific about features in this paper)
# Bimodal approach gave best mood classification accuracy.
@inproceedings{laurier2009music,
title={Music Mood Representations from Social Tags.},
author={Laurier, Cyril and Sordo, Mohamed and Serra, Joan and Herrera, Perfecto},
booktitle={ISMIR},
pages={381--386},
year={2009}
}
# More work validating the arousal-valence model.
# Pipeline: identify mood tags in last.fm > LSA to cluster tags > compare clusters with Hevner, MIREX, and Russell
# Conclusion: experts and last.fm users agree on the arousal-valence model.
@inproceedings{hu2010improving,
title={Improving mood classification in music digital libraries by combining lyrics and audio},
author={Hu, Xiao and Downie, J Stephen},
booktitle={Proceedings of the 10th annual joint conference on Digital libraries},
pages={159--168},
year={2010}
}
# Bimodal mood classification
# Russell scheme
# SVM model
@inproceedings{hu2010lyrics,
title={When Lyrics Outperform Audio for Music Mood Classification: A Feature Analysis.},
author={Hu, Xiao and Downie, J Stephen},
booktitle={ISMIR},
pages={619--624},
year={2010},
organization={Citeseer}
}
# ground truths from last.fm tags
# Lyrics outperform audio features
# Quote: "one single song could belong to multiple mood categories. This is in fact more realistic than a single-label setting since a music piece may carry multiple moods such as “happy and calm” or “aggressive and depressed”."
@article{lee2010analysis,
title={Analysis of user needs and information features in natural language queries seeking music information},
author={Lee, Jin Ha},
journal={Journal of the American Society for Information Science and Technology},
volume={61},
number={5},
pages={1025--1045},
year={2010},
publisher={Wiley Online Library}
}
# General field of information seeking and retrieval:
# Bates’ (1989) Berrypicking model
# Kuhlthau’s (1991) Information Search Process model
# Dervin’s (1992) sense-making model
# Savolainen’s (1995) Everyday Life Information Seeking (ELIS) model
# Ingwersen’s (1992) cognitive model
# Wilson’s (1999) model of information seeking
# Challenges in MIR:
# people who had no formal music education
# people who seek music from different cultures
# not all musical works are accompanied by lyrics
# often difficult to comprehend the lyrics
@inproceedings{tsunoo2010music,
title={Music mood classification by rhythm and bass-line unit pattern analysis},
author={Tsunoo, Emiru and Akase, Taichi and Ono, Nobutaka and Sagayama, Shigeki},
booktitle={2010 IEEE International Conference on Acoustics, Speech and Signal Processing},
pages={265--268},
year={2010},
organization={IEEE}
}
# Japanese study, terrible English. No proofreader?
# Along with Ono, they push their rhythm and bass-line approach
# harmonic/percussive sound separation (HPSS) technique proposed by Ono
# Used Weka suite
# SVM? unclear
@article{bertin2011million,
title={The million song dataset},
author={Bertin-Mahieux, Thierry and Ellis, Daniel PW and Whitman, Brian and Lamere, Paul},
year={2011}
}
# Details on the MSD
@inproceedings{schuller2011multi,
title={Multi-modal non-prototypical music mood analysis in continuous space: reliability and performances},
author={Schuller, Bj{\"o}rn and Weninger, Felix and Dorfner, Johannes},
booktitle={Proc. 12th Intern. Society for Music Information Retrieval Conference (ISMIR) 2011, ISMIR, Miami, FL, USA},
year={2011}
}
#
@article{brinker2012expressed,
title={Expressed music mood classification compared with valence and arousal ratings},
author={Brinker, Bert den and Dinther, Ralph van and Skowronek, Janto},
journal={EURASIP Journal on Audio, Speech, and Music Processing},
volume={2012},
number={1},
pages={1--14},
year={2012},
publisher={SpringerOpen}
}
#
@inproceedings{weninger2014line,
title={On-line continuous-time music mood regression with deep recurrent neural networks},
author={Weninger, Felix and Eyben, Florian and Schuller, Bj{\"o}rn},
booktitle={2014 IEEE international conference on acoustics, speech and signal processing (ICASSP)},
pages={5412--5416},
year={2014},
organization={IEEE}
}
#
@article{saari2015genre,
title={Genre-adaptive semantic computing and audio-based modelling for music mood annotation},
author={Saari, Pasi and Fazekas, Gy{\"o}rgy and Eerola, Tuomas and Barthet, Mathieu and Lartillot, Olivier and Sandler, Mark},
journal={IEEE Transactions on Affective Computing},
volume={7},
number={2},
pages={122--135},
year={2015},
publisher={IEEE}
}
#
@article{moon2015mood,
title={Mood lighting system reflecting music mood},
author={Moon, Chang Bae and Kim, HyunSoo and Lee, Dong Won and Kim, Byeong Man},
journal={Color Research \& Application},
volume={40},
number={2},
pages={201--212},
year={2015},
publisher={Wiley Online Library}
}
#
@article{plewa2015music,
title={Music mood visualization using self-organizing maps},
author={Plewa, Magdalena and Kostek, Bozena},
journal={Archives of Acoustics},
volume={40},
number={4},
pages={513--525},
year={2015}
}
#
@article{mo2017novel,
title={A novel method based on OMPGW method for feature extraction in automatic music mood classification},
author={Mo, Shasha and Niu, Jianwei},
journal={IEEE Transactions on Affective Computing},
volume={10},
number={3},
pages={313--324},
year={2017},
publisher={IEEE}
}
#
@inproceedings{ccano2017music,
title={Music mood dataset creation based on last.fm tags},
author={{\c{C}}ano, Erion and Morisio, Maurizio and others},
booktitle={2017 International Conference on Artificial Intelligence and Applications, Vienna, Austria},
pages={15--26},
year={2017}
}
#
@article{delbouys2018music,
title={Music mood detection based on audio and lyrics with deep neural net},
author={Delbouys, R{\'e}mi and Hennequin, Romain and Piccoli, Francesco and Royo-Letelier, Jimena and Moussallam, Manuel},
journal={arXiv preprint arXiv:1809.07276},
year={2018}
}
# Audio-only CNN
# Lyrics: every model type except transformers
# MSD dataset
# 60:20:20 splits
# R2 for valence and arousal as benchmarks
@article{chaturvedi2021music,
title={Music mood and human emotion recognition based on physiological signals: a systematic review},
author={Chaturvedi, Vybhav and Kaur, Arman Beer and Varshney, Vedansh and Garg, Anupam and Chhabra, Gurpal Singh and Kumar, Munish},
journal={Multimedia Systems},
pages={1--24},
year={2021},
publisher={Springer}
}
#
@article{garg2022machine,
title={Machine learning model for mapping of music mood and human emotion based on physiological signals},
author={Garg, Anupam and Chaturvedi, Vybhav and Kaur, Arman Beer and Varshney, Vedansh and Parashar, Anshu},
journal={Multimedia Tools and Applications},
pages={1--41},
year={2022},
publisher={Springer}
}
# Messy, imprecise, poorly-written study
# Interesting idea: combine MIR and physiological-signal analysis to create adaptive, mood-specific playlists
# PMEmo dataset contains emotion annotations of 794 songs plus simultaneous electrodermal activity (EDA) signals
# “The MediaEval Database for Emotional Analysis of Music” contains arousal-valence values for 1,802 excerpts and full songs
```
## Project Aim
Multiple mood categorization schemes and multiple mood classification approaches are available.
We aim to investigate a novel bimodal approach: a transformer-based neural network for lyric mood classification combined with a separate ANN model of audio features.
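A minimal sketch of the bimodal architecture we have in mind, assuming PyTorch and HuggingFace `transformers`; `bert-base-uncased`, the 63-dimensional audio vector, and the 4 mood classes are placeholder choices, not final design decisions.
```python
# Sketch of a bimodal classifier: BERT encodes the lyrics, a small MLP encodes
# precomputed audio features, and the two embeddings are concatenated before
# the final mood classification layer.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BimodalMoodClassifier(nn.Module):
    def __init__(self, audio_dim=63, n_moods=4, bert_name="bert-base-uncased"):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)
        self.audio_mlp = nn.Sequential(nn.Linear(audio_dim, 128), nn.ReLU())
        self.head = nn.Linear(self.bert.config.hidden_size + 128, n_moods)

    def forward(self, input_ids, attention_mask, audio_feats):
        lyric_emb = self.bert(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state[:, 0]  # [CLS] embedding
        audio_emb = self.audio_mlp(audio_feats)
        return self.head(torch.cat([lyric_emb, audio_emb], dim=-1))

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["I feel fine today"], return_tensors="pt", padding=True, truncation=True)
model = BimodalMoodClassifier()
logits = model(batch["input_ids"], batch["attention_mask"], torch.randn(1, 63))
print(logits.shape)  # torch.Size([1, 4])
```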
## Choosing a Dataset
We will need to experiment with the available datasets (e.g., MSD/musiXmatch, CAL500) to determine which one best supports ANN training.
## Results
coming later in June 2022!
###### tags: `MIR` `NLP` `ANN` `BERT` `GPT` `TEEP2022`