###### tags: `draft` `thesis` `jptw`

# Chapter 2 Related Work

:::info
Keywords | `audio retrieval` `acoustic feature` `semantic feature` `sound event detection`
:::

### Historical Method

Acoustic analysis and content-based audio retrieval in multimedia databases have been highlighted topics for researchers for decades. The traditional and most straightforward way to process audio at the signal level is the cross-correlation technique. Adrián-Martínez et al. `[2]adrian2014acoustic` proposed an acoustic detection method based on cross-correlation and validated it under real-life conditions with noisy and reverberant environments. In addition to retrieval in the time domain, some research groups extracted features from the spectrogram. Wold et al. `[3]wold1996content` built an audio retrieval system based on perceptual and acoustical features. Wan et al. `[4]` reduced the computation and search time with a kNN clustering method that classifies features in the time, frequency, and coefficient domains. These works indicated that audio carries musical and acoustic content, and even semantic features.

### Music Information Retrieval

Although `[3][4]` mentioned musical features such as pitch and melody, their systems could only process short, single-gestalt, or single-ensemble sounds. Therefore, some researchers dedicated themselves to music information retrieval (MIR). Borjian `[5]borjian2017survey` briefly reviewed several QBE-based MIR systems. Tsai et al. `[6]tsai2005query` presented a method to retrieve cover songs with similar vocal melodies. They retained the vocal portion and compared the similarity between notes of the main melody extracted from converted MIDI files. However, due to the information lost during conversion, this method yielded good performance only on query-by-singing examples rather than complete songs.
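To make the cross-correlation approach concrete, the following is a minimal sketch of the general technique for locating a known template in a longer recording; the function name and parameters are illustrative, and this is not the detector of `[2]`:

```python
import numpy as np

def detect_by_cross_correlation(signal, template):
    """Return (lag, score): the offset where `template` best matches
    `signal`, scored by normalized cross-correlation in [-1, 1].
    Illustrative sketch only, not any cited system's implementation."""
    # Zero-mean both sequences so DC offsets do not bias the score.
    s = signal - np.mean(signal)
    t = template - np.mean(template)
    # Raw cross-correlation at every valid lag.
    corr = np.correlate(s, t, mode="valid")
    # Normalize by the template energy and by the energy of each
    # sliding window of the signal.
    window_energy = np.sqrt(np.convolve(s ** 2, np.ones(len(t)), mode="valid"))
    ncc = corr / (np.linalg.norm(t) * window_energy + 1e-12)
    lag = int(np.argmax(ncc))
    return lag, float(ncc[lag])
```

A practical detector would additionally threshold the score and apply band-pass filtering to suppress out-of-band noise, which is what makes such methods viable in the reverberant conditions mentioned above.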
To address the information loss introduced by MIDI conversion, Helén and Virtanen `[7]helen2007similarity` proposed a method that improves the comparison of parameters. They used GMM and HMM models to code and compress audio signals, and searched the database with different similarity measures. Other solutions in the spectral domain were also proposed. Shazam `[8]wang2003industrial` and Musiwave `[9]`, both well-known query-by-example systems, extract features from the spectrogram to generate a sparse feature set, so-called audio fingerprinting, for their search mechanisms. In addition, they provide an Application Programming Interface (API) so that users can access the service from mobile devices, and they demonstrated robustness to noise during retrieval.

### Semantic-based Audio Retrieval

For semantic features, Slaney `[4]slaney2002semantic` provided a method to query sounds with a linked model that clusters features between the acoustic and semantic spaces. Barrington et al. `[5]barrington2007audio` distinguished two variants of QBE, query-by-acoustic-example (QBAE) and query-by-semantic-example (QBSE), and showed better performance with semantic similarities. However, the results also revealed the difficulty QBE has with strongly similar sounds from different categories, such as livestock, dogs, and horses. Hence, Mesaros et al. `[6]mesaros2013query` proposed a single similarity measurement combining acoustic and semantic similarity, and improved the measurement between synonym labels using natural language processing and WordNet `[7]pedersen2004wordnet`.

### ==Deep Learning Method==