# INTRODUCTION
1. First, they (convolutional networks) are capable of capturing energy modulation patterns across time and frequency when applied to spectrogram-like inputs.
2. Second, by using convolutional kernels (filters) with a small receptive field, the network should, in principle, be able to learn and later identify spectro-temporal patterns that are representative of different sound classes, even if part of the sound is masked (in time/frequency) by other sources (noise).
3. The overfitting problem cannot be solved by data augmentation (a limitation of spectrogram-like inputs).

# METHOD
Architecture
1. three convolutional layers
2. interleaved with two pooling operations,
3. followed by two fully connected (dense) layers.
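
The layer ordering above can be checked with a small shape trace. This is only a sketch: the 128x128 input, 5x5 kernels, (4, 2) pooling, filter counts, and dense sizes are illustrative assumptions, since these notes do not state any hyperparameters, and `trace_shapes` is a hypothetical helper name.

```python
def conv_out(size, kernel):
    """Output length of a 'valid' convolution (no padding, stride 1)."""
    return size - kernel + 1


def pool_out(size, pool):
    """Output length of non-overlapping pooling (stride equals pool size)."""
    return size // pool


def trace_shapes(height=128, width=128, kernel=5, pool=(4, 2),
                 filters=(24, 48, 48), dense=(64, 10)):
    """List (layer name, output shape) for conv -> pool -> conv -> pool -> conv -> dense -> dense.

    All default values are assumptions chosen for illustration.
    """
    shapes = []
    h, w = height, width
    for i, n_filters in enumerate(filters):
        h, w = conv_out(h, kernel), conv_out(w, kernel)
        shapes.append((f"conv{i + 1}", (n_filters, h, w)))
        # Pooling sits between the convolutions, so there is none after the last one.
        if i < len(filters) - 1:
            h, w = pool_out(h, pool[0]), pool_out(w, pool[1])
            shapes.append((f"pool{i + 1}", (n_filters, h, w)))
    shapes.append(("flatten", (filters[-1] * h * w,)))
    for j, units in enumerate(dense):
        shapes.append((f"dense{j + 1}", (units,)))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:8s} {shape}")
```

Tracing shapes like this before building the model makes it easy to see how quickly the pooling steps shrink the time-frequency patch, and how large the first dense layer's input becomes.
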
Audio processing
1. [log-scaled mel-spectrograms](https://blog.csdn.net/bo17244504/article/details/124707265) | [code](https://towardsdatascience.com/getting-to-know-the-mel-spectrogram-31bca3e2d9d0)
(In practice, whether to use mel-spectrograms or plain spectrograms can be decided based on the results.)
2.
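
As a rough illustration of what a log-scaled mel-spectrogram involves, here is a NumPy-only sketch. The frame size, hop size, number of mel bands, and the HTK-style mel formula are all assumptions made for this example, and the function names are hypothetical. Libraries such as librosa offer ready-made versions (`librosa.feature.melspectrogram` followed by `librosa.power_to_db`), which may differ in normalization details.

```python
import numpy as np


def hz_to_mel(hz):
    # HTK-style mel formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_points[m:m + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_spectrogram(signal, sr, n_fft=1024, hop=512, n_mels=64):
    """Return a (n_mels, n_frames) array in dB; parameter defaults are illustrative."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2          # (n_frames, n_fft // 2 + 1)
    mel_power = power @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)
    # Floor the power before the log to avoid log(0) on silent frames.
    return 10.0 * np.log10(np.maximum(mel_power, 1e-10)).T


# Usage: one second of a 440 Hz tone at an assumed 22050 Hz sample rate.
sr = 22050
tone = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
spec = log_mel_spectrogram(tone, sr)
print(spec.shape)
```

Dropping the `mel_filterbank` multiplication gives a plain log-power spectrogram, so the same code can be used to compare both input types and pick whichever gives better results.
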