###### tags: `thesis` `concept` `jptw`

# Related Work

> Template: [HYPERION relative work summary](https://docs.google.com/document/d/1-E8e9RXjYNWJDk5nxT71ztduPAkMWjXmCueqi4vI89M/)

## Notes

### ==Overview==

- [ ] **01**
    - **Title**: Indexing and Retrieval of Audio: A Survey
    - **Authors**: Guojun Lu
    - **Goals**:
    - **Note**:
    - **Weakness**:

### Historical Method

> Processing applied in the time domain or on the spectrogram

- [x] **02**
    - **Title**: Acoustic signal detection through the cross-correlation method in experiments with different signal to noise ratio and reverberation conditions
    - **Authors**: Adrián-Martínez et al.
    - **Goals**: Apply signal detection using the cross-correlation technique in noisy and reverberant environments.
    - **Note**: The technique is more favorable for broadband signals because of their narrower correlation peaks.
    - **Weakness**: The technique performed well with transient acoustic signals at a high signal-to-noise ratio.
    - :::danger
      noise robustness
      :::

---

- [x] **03**
    - **Title**: Content-Based Classification, Search, and Retrieval of Audio
    - **Authors**: Wold et al.
    - **Goals**: Establish an audio analysis and retrieval engine based on perceptual and acoustical features.
    - **Note**: Extracts features of musical and acoustic content such as pitch, amplitude, and spectral changes.
    - **Weakness**: The current system supports only short or single-gestalt sounds and single-ensemble sounds.
    - :::danger
      single sound
      :::

---

- [x] **04**
    - **Title**: Content-Based Sound Retrieval for Web Application
    - **Authors**: Wan, C. et al.
    - **Goals**: Compared with Soundfisher, use fewer features and shorten the search time.
    - **Note**: Computes the Euclidean distance between the sound files and the query example, and uses KNN to classify audio into different subspaces.
    - **Weakness**: Conceptual semantic features should be accessible in the audio retrieval system.
    - :::danger
      semantic feature
      :::

### Semantic Audio Retrieval

- [x] **05**
    - **Title**: Semantic-Audio Retrieval
    - **Authors**: Slaney
    - **Goals**: Retrieve sounds with words by using linked models between the acoustic and semantic spaces.
    - **Note**: This paper only demonstrates the concept of linking the two spaces with hierarchical clustering, without actual verification; query precision depends on the interpolation of nodes between the two spaces.
    - **Weakness**: Both acoustic and semantic features have to be taken into account in audio retrieval.
    - :::danger
      acoustic-semantic feature
      :::

---

- [x] **06**
    - **Title**: Audio Information Retrieval Using Semantic Similarity
    - **Authors**: Barrington et al.
    - **Goals**: Improve the retrieval system by querying from the semantic features of audio examples.
    - **Note**: The models are learned from a database composed of audio tracks with associated text captions.
    - **Weakness**: Query-by-example has difficulty when different categories are strongly similar, such as livestock, dogs, and horses, and the results are bound to be poor without reliable ground truth.
    - :::danger
      semantic similarity; reliable ground truth
      :::

---

- [ ] **07**
    - **Title**: Event-Based Video Retrieval Using Audio
    - **Authors**: Jin et al.
    - **Goals**: Use only audio data for multimedia event detection for video retrieval.
    - **Note**: Several systems based on Bag-of-Words training, such as GSV-SVM, MFCC K-means, and ASR, are presented in the paper.
    - **Weakness**: Other systems performed worse than data-driven systems such as GSV-SVM and MFCC K-means, and their primary limiting factor is the availability of labeled training data.
    - :::danger
      labeled data
      :::

---

- [x] **08**
    - **Title**: Query-by-example Retrieval of Sound Events using an Integrated Similarity Measure of Content and Label
    - **Authors**: Mesaros et al.
    - **Goals**: This paper proposes a single similarity measure that combines acoustic and semantic similarity.
    - **Note**: Improves the similarity measure between synonymous labels using natural language processing and WordNet.
    - **Weakness**:

### ==Music Retrieval==

> Proposed in order to address the noise problem

- [ ] **09**
    - **Title**: A Survey on Query-by-Example based Music Information Retrieval
    - **Authors**: Borjian
    - **Goals**: Give a brief overview of several QBE-based MIR systems.
    - **Note**: Extracts time, spectral, and even musical features from audio.
    - **Weakness**: Open challenges: comprehensive evaluation of commercial applications, the difficulty of comparing academic systems because no unique definition exists, the accuracy-time trade-off, and robustness against noise and other distortions.

---

- [ ] **10**
    - **Title**: A Review of Audio Fingerprinting
    - **Authors**: Cano et al.
    - **Goals**:
    - **Note**:
    - **Weakness**:

---

- [x] **11**
    - **Title**: A Query-by-Example Technique for Retrieving Cover Versions of Popular Songs with Similar Melodies
    - **Authors**: Tsai et al.
    - **Goals**: This paper presents a method to retrieve cover versions of songs in different languages and genres that share similar vocal melodies.
    - **Note**: Removes the non-vocal parts and compares the similarity between the main-melody notes in the converted MIDI files.
    - **Weakness**: Because information is lost when converting to MIDI, this solution can only serve as a baseline for retrieving cover songs.

---

- [ ] **12**
    - **Title**: Audio Query by Example Using Similarity Measures between Probability Density Functions of Features
    - **Authors**: Helén and Virtanen
    - **Goals**:
    - **Note**:
    - **Weakness**:

---

- [x] **13**
    - **Title**: A Similarity Measure for Audio Query by Example Based on Perceptual Coding and Compression
    - **Authors**: Helén and Virtanen
    - **Goals**: This paper proposes a compression-based method for retrieving audio.
    - **Note**: The methods were the Euclidean distance between the GMM densities.
    - **Weakness**:

---

- [ ] **14**
    - **Title**: A Review of Audio Fingerprinting
    - **Authors**: Cano et al.
    - **Goals**:
    - **Note**:
    - **Weakness**:

---

- [ ] **15**
    - **Title**: Survey and evaluation of audio fingerprinting schemes for mobile query-by-example applications
    - **Authors**: Chandrasekhar et al.
    - **Goals**:
    - **Note**:
    - **Weakness**:

**Audio Fingerprinting**

- [x] **16**
    - **Title**: An Industrial-Strength Audio Search Algorithm
    - **Authors**: Wang
    - **Goals**:
    - **Note**: Shazam Fingerprinting
    - **Weakness**:

---

- [x] **17**
    - **Title**: A Highly Robust Audio Fingerprinting System
    - **Authors**: Haitsma and Kalker
    - **Goals**:
    - **Note**: Philips Fingerprinting
    - **Weakness**:

### ~~Speech Recognition~~

> The human voice is handled differently from music in processing

- - **Title**: Speech Recognition by Machine: A Review
  - **Authors**: Reddy
  - **Goals**: This paper reviews speech recognition techniques of the 1970s at the acoustic, phonetic, syntactic, and semantic levels.
  - **Note**: Gives an overview of WRS, CSR, and SUS systems; signal processing in terms of parametric analysis, end-point detection, and noise normalization; and the tasks of phonemic labeling, phonological rules, prosodics, word hypothesis, and word verification. Note that phonemic labeling consists of feature extraction and segmentation.
  - **Weakness**: This is a review paper, so the weaknesses of each system still need to be organized.

---

- - **Title**: Speech Recognition Systems: A Comparative Review
  - **Authors**: Matarneh et al.
  - **Goals**: Compare the performance of existing speech recognition systems.
  - **Note**:
  - **Weakness**:

---

- - **Title**: A Survey: Speech Recognition Approaches and Techniques
  - **Authors**: Singh et al.
  - **Goals**: Survey the techniques and approaches of voice recognition and communication.
  - **Note**: Points out the classification based on utterance type and the acoustic-phonetic, pattern recognition, and artificial intelligence approaches.
  - **Weakness**: Efficient representation, storage, and retrieval of the knowledge required for natural conversation are needed for better speech recognition systems.

### ==Sound Event Detection==

> DCASE, fully labeled or weakly labeled

- [ ] **18**
    - **Title**: TUT Database for Acoustic Scene Classification and Sound Event Detection
    - **Authors**: Mesaros et al.
    - **Goals**: This paper presents the recording and annotation procedure of the TUT database.
    - **Note**: High-quality binaural audio recording and polyphonic annotation.
    - **Weakness**: This is an introduction to the database.

---

- [x] **19**
    - **Title**: Sound event detection using spatial features and convolutional recurrent neural network
    - **Authors**: Adavanne et al.
    - **Goals**: This paper presents an SED method using CRNN models with low-level spatial features.
    - **Note**: The TUT-SED 2016 dataset provides the training data and ground truth for evaluation.
    - **Weakness**: The model is trained with fully supervised techniques on strongly labeled data.

---

- [x] **20**
    - **Title**: Audio Event Detection using Weakly Labeled Data
    - **Authors**: Kumar and Raj
    - **Goals**: The authors propose a framework for learning detectors using only weakly labeled data.
    - **Note**:
    - **Weakness**:

:::success
Bridging the gap between research and practice
:::
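
As a working aid for entry **02**, the cross-correlation detection idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the chirp template, the noise level, the signal lengths, and the helper names `cross_correlate` and `detect` are all assumptions chosen here for illustration. A broadband chirp is used because the entry notes that broadband signals give narrower correlation peaks.

```python
import math
import random


def cross_correlate(signal, template):
    """Sliding dot product of `template` over `signal` (fully overlapping lags only)."""
    n, m = len(signal), len(template)
    return [
        sum(signal[lag + i] * template[i] for i in range(m))
        for lag in range(n - m + 1)
    ]


def detect(signal, template):
    """Return (lag, score) for the lag with the largest correlation value."""
    corr = cross_correlate(signal, template)
    best_lag = max(range(len(corr)), key=corr.__getitem__)
    return best_lag, corr[best_lag]


# Hypothetical test data: a short linear chirp buried in Gaussian noise.
random.seed(0)
template = [math.sin(2 * math.pi * (0.02 + 0.004 * i) * i) for i in range(64)]
true_lag = 200  # assumed position of the hidden chirp
signal = [random.gauss(0.0, 0.3) for _ in range(512)]
for i, value in enumerate(template):
    signal[true_lag + i] += value

lag, score = detect(signal, template)
print(lag, round(score, 2))
```

Lowering the signal-to-noise ratio (for example, raising the assumed noise standard deviation of `0.3`) makes spurious peaks more likely, which is the noise-robustness concern flagged for this entry.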