# INTRODUCTION

1. First, they are capable of capturing energy modulation patterns across time and frequency when applied to spectrogram-like inputs.
2. Second, by using convolutional kernels (filters) with a small receptive field, the network should, in principle, be able to successfully learn and later identify spectro-temporal patterns that are representative of different sound classes, even if part of the sound is masked (in time/frequency) by other sources (noise).
3. The overfitting problem cannot be solved with data augmentation (a limitation of spectrogram-like inputs).

![](https://hackmd.io/_uploads/Hkfv5O4eT.png)

# METHOD

Architecture

1. three convolutional layers,
2. interleaved with two pooling operations,
3. followed by two fully connected (dense) layers.

Audio processing

1. [log-scaled mel-spectrograms](https://blog.csdn.net/bo17244504/article/details/124707265) | [code](https://towardsdatascience.com/getting-to-know-the-mel-spectrogram-31bca3e2d9d0) (in practice, whether to use mel-spectrograms or plain spectrograms can be decided based on the results)
2.