STFT
===

librosa: https://librosa.org/doc/0.8.0/generated/librosa.stft.html

Basic:
---
- **sample rate**: The number of sample points per second.
  - If sr=1000, the data can represent frequencies up to 500 Hz (the Nyquist frequency, sr/2).

STFT Steps
---
1. Take n points as a **Frame**.
   - ex. Each Frame has 512 points
   ```
   |  Frame 1  |
   |-----------|
   ^512 points^
   ```
2. The next Frame starts at an offset (the hop length) from the previous one, so consecutive Frames overlap when the offset is smaller than the Frame length.
   - ex.
   ```
   |  Frame 1  | ---------
   ---|  Frame 2  | ------
   ------|  Frame 3  | ---
   ```
3. Apply the FFT to each Frame.
4. Result: each Frame becomes one column of magnitudes over frequency, and the columns are ordered in time.
   ```
   Frequency
   |     |     |
   |  F  |  F  |
   |  r  |  r  |
   |  a  |  a  |
   |  m  |  m  |
   |  e  |  e  |
   |     |     |
   |  1  |  2  |
   |     |     |
   |_____|_____|____[Magnitude][M][M][M] -> represent time
   ```

Figures:
---
- Each Frame (the FFT of a single Frame: magnitude against frequency):
  ```
  Magnitude
  |
  |
  |   /\
  |  /  \
  | /    \
  |_/______\_______Frequency
  ```
- each_Frame.Transpose combine (each Frame's spectrum is turned into a column and the columns are placed side by side):
  ```
  Frequency
  |     |     |
  |  F  |  F  |
  |  r  |  r  |
  |  a  |  a  |
  |  m  |  m  |
  |  e  |  e  |
  |     |     |
  |  1  |  2  |
  |     |     |
  |_____|_____|____[Magnitude][M][M][M] -> represent time
  ```

google colab
---
reference:
- https://github.com/weichian0920/MFA_DAE/blob/main/src/utils/signalprocess.py
- https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9167389

google colab: https://colab.research.google.com/drive/1wsap4JQarLkTajfuQ2g4v7T-oA4_t-9x?usp=sharing
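
As a rough illustration of the STFT Steps section, here is a minimal NumPy sketch. It is not how librosa implements `librosa.stft`: the function name `simple_stft`, the hop length of 128, the Hann window, and the 100 Hz test tone are all assumptions chosen for the example, and no padding is applied.

```python
import numpy as np

def simple_stft(signal, frame_length=512, hop_length=128):
    """Magnitude STFT sketch: frame, offset, FFT, stack (hypothetical helper)."""
    # Number of full frames that fit in the signal (no padding; an assumption)
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    window = np.hanning(frame_length)  # Hann window to reduce spectral leakage
    columns = []
    for i in range(n_frames):
        start = i * hop_length                       # step 2: offset per frame
        frame = signal[start:start + frame_length]   # step 1: take n points
        spectrum = np.fft.rfft(frame * window)       # step 3: FFT of the frame
        columns.append(np.abs(spectrum))             # keep the magnitude only
    # step 4: one column per frame -> shape (frequency bins, frames)
    return np.stack(columns, axis=1)

sr = 1000                                  # sample rate from the example above
t = np.arange(sr) / sr                     # one second of sample times
signal = np.sin(2 * np.pi * 100 * t)       # 100 Hz test tone (an assumption)

spec = simple_stft(signal)
freqs = np.fft.rfftfreq(512, d=1 / sr)     # bin frequencies, 0 Hz up to sr/2
peak_hz = freqs[spec[:, 0].argmax()]       # strongest frequency in Frame 1
print(spec.shape, peak_hz)
```

Rows of `spec` are frequency bins from 0 Hz to sr/2 = 500 Hz and columns are Frames in time order, matching the figure. By default `librosa.stft` also pads the signal so that frames are centered, so its output has a different number of columns than this sketch.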