# [SOTA] Music classification

## Objectives
- Automatic labelling that is interpretable on the business side:
  - Tempo
  - Style
  - Emotion
- Data = known, stock, compositions...
- Short durations?

## Literature
Mood classification and genre classification are well-established research fields.

**Mood classification**
- More subjective labels.
- Strongly tied to rhythm (refs. in [1]).
- Near-linear relationship with the musical cues, ranked mode > tempo > register > dynamics > articulation > timbre [2].
- ~0.7 AUC (typical reported performance).

**Genre classification**
- The most widely studied area.
- ~0.9 accuracy (typical reported performance).

## Methods

### Feature engineering
The historical approach. Feature families: timbre, temporal, rhythm, pitch, harmony [1]. Use librosa.
Acoustic features vs. visual features (computed on spectrograms). Visual > acoustic? [4]
SVMs give the best classification results.
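As an illustration, here is a minimal librosa sketch of these feature families. The audio file name, excerpt length, and the descriptors picked for each family (MFCCs and spectral centroid for timbre, tempo and zero-crossing rate for temporal/rhythm, chroma for pitch/harmony) are assumptions for the example, not the exact configuration of the cited papers. It also computes the log-mel-spectrogram used as the "visual" input by the CNN approaches in the next section.

```python
import numpy as np
import librosa

# Placeholder file: any short excerpt works (3 s / 6 s clips are typical, cf. [5]).
y, sr = librosa.load("track.wav", sr=22050, mono=True, duration=6.0)

# Timbre: MFCCs and spectral centroid.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)         # (13, n_frames)
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)   # (1, n_frames)

# Temporal / rhythm: global tempo estimate and zero-crossing rate.
tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
zcr = librosa.feature.zero_crossing_rate(y)                 # (1, n_frames)

# Pitch / harmony: chromagram.
chroma = librosa.feature.chroma_stft(y=y, sr=sr)            # (12, n_frames)

# Aggregate frame-level descriptors into one fixed-size vector per track
# (mean and std over time), ready for an SVM or another classical classifier.
frames = np.vstack([mfcc, centroid, zcr, chroma])
feature_vector = np.concatenate(
    [frames.mean(axis=1), frames.std(axis=1), np.atleast_1d(tempo)]
)

# "Visual" representation: log-mel-spectrogram, the usual CNN input.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
log_mel = librosa.power_to_db(mel, ref=np.max)              # (128, n_frames)

print(feature_vector.shape, log_mel.shape)
```

The fixed-size vector is the kind of input the FE+ML baselines [7][8][9] expect; the log-mel matrix is the 2D input discussed in the deep-learning section below.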
### Deep learning
These approaches take mel-spectrograms as input.

![](https://i.imgur.com/WcFzPua.png)

The exact choice of input representation is not that crucial.

#### 1D vs 2D
- Music = invariance in the time domain matters more than invariance in the frequency domain.
- 2D architectures are much heavier (parameters and compute) [5].
- 2D slightly outperforms 1D when enough data and hardware are available [5].

#### Domain: time-frequency or raw time
On the raw waveform: SampleCNN [6]. Not compelling in terms of performance.

#### Advanced techniques / tricks
- Audio can be downsampled to 8 kHz [6]; 3 s / 6 s excerpts are typical [5].
- Transfer learning is possible [6] (+ refs. in [5]).
- "Squeeze-and-excitation"? (refs. in [5])
- The "attention mechanism" worked well for mood (refs. in [5]).
- Residual connections.
- Data augmentation => muda.readthedocs.io
- Hand-crafted features + learned features? [4]

## Databases
- MTAT => 26k clips; genre, mood, instrument, etc. The musical style is apparently not very representative. A problem for transfer learning?
- MSD => 1M tracks; "song-level tags", genre + Last.fm tag annotations. 200k tags (mood?).
- FMA => 100k tracks. Mainly genre, plus tags. No mood?
- https://artlist.io/ => mood, video theme, genre, instrument. Scraping? Manual annotation?

## Selected approaches
1) Feature engineering + classical ML [7][8][9]
2) CNN [10]

## References
[1] Fu, Z., Lu, G., Ting, K. M., & Zhang, D. (2010). A survey of audio-based music classification and annotation. IEEE Transactions on Multimedia, 13(2), 303-319.
[2] Eerola, T., Friberg, A., & Bresin, R. (2013). Emotional expression in music: Contribution, linearity, and additivity of primary musical cues. Frontiers in Psychology, 4, 487.
[3] Huang, Y. S., Chou, S. Y., & Yang, Y. H. (2017). Music thumbnailing via neural attention modeling of music emotion. In 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) (pp. 347-350). IEEE.
[4] Costa, Y. M., Oliveira, L. S., & Silla Jr., C. N. (2017). An evaluation of convolutional neural networks for music classification using spectrograms. Applied Soft Computing, 52, 28-38.
[5] Nam, J., Choi, K., Lee, J., Chou, S. Y., & Yang, Y. H. (2018). Deep learning for audio-based music classification and tagging: Teaching computers to distinguish rock from Bach. IEEE Signal Processing Magazine, 36(1), 41-51.
[6] Lee, J., Park, J., Kim, K. L., & Nam, J. (2018). SampleCNN: End-to-end deep convolutional neural networks using very small filters for music classification. Applied Sciences, 8(1), 150.
[7] Sharma, S., Fulzele, P., & Sreedevi, I. (2018). Novel hybrid model for music genre classification based on support vector machine. In 2018 IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE) (pp. 395-400). IEEE.
[8] Bahuleyan, H. (2018). Music genre classification using machine learning techniques. arXiv preprint arXiv:1804.01149.
[9] Elbir, A., Çam, H. B., Iyican, M. E., Öztürk, B., & Aydin, N. (2018). Music genre classification and recommendation by using machine learning techniques. In 2018 Innovations in Intelligent Systems and Applications Conference (ASYU) (pp. 1-5). IEEE.
[10] Choi, K., Fazekas, G., & Sandler, M. (2016). Automatic tagging using deep convolutional neural networks. arXiv preprint arXiv:1606.00298.