STFT
===
librosa reference: https://librosa.org/doc/0.8.0/generated/librosa.stft.html

Basics
---
- **Sample rate** (`sr`): the number of samples recorded per second.
- Example: with `sr = 1000`, the data can represent frequencies only up to 500 Hz (the Nyquist frequency, `sr / 2`).
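
The 500 Hz limit for `sr = 1000` is the Nyquist frequency, `sr / 2`. A minimal NumPy sketch of that relationship follows. The variable names and the 700 Hz test tone are illustrative assumptions, not taken from the linked notebook.

```python
import numpy as np

sr = 1000                 # sample rate in Hz (samples per second)
nyquist = sr / 2          # highest frequency the samples can represent

# Frequencies of the FFT bins for a 1-second real-valued signal.
freqs = np.fft.rfftfreq(sr, d=1 / sr)
print(nyquist, freqs.max())

# A 700 Hz tone is above the limit, so it shows up as a 300 Hz alias.
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 700 * t)
alias_hz = freqs[np.argmax(np.abs(np.fft.rfft(tone)))]
print(alias_hz)
```

Any content above `sr / 2` folds back into the representable range like this, which is why audio is usually low-pass filtered before it is sampled.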
STFT Steps
---
1. Take n consecutive samples as a **frame**.
- e.g. each frame has 512 samples
```
| Frame 1 |
|-----------|
^512 points^
```
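2. Each following frame starts at a fixed offset from the previous one, so neighbouring frames overlap when the offset is smaller than the frame length.
- e.g.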
```
| Frame 1 | ---------
---| Frame 2 | ------
------| Frame 3 | ---
```
3. Apply an FFT to each frame.
4. Result: the per-frame spectra placed side by side, with time on the horizontal axis:
```
Frequency
| | |
| F | F |
| r | r |
| a | a |
| m | m |
| e | e |
| | |
| 1 | 2 |
| | |
|_____|_____|____[Magnitude][M][M][M] -> represent time
```
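
The four steps above can be sketched in plain NumPy as follows. This is a rough illustration, not the implementation used by librosa or by the linked repository. The function name `simple_stft`, the offset of 128 samples (called `hop_length` here), and the Hann window are assumptions added for the example, and the signal is assumed to be at least one frame long.

```python
import numpy as np

def simple_stft(signal, frame_length=512, hop_length=128):
    """Frame the signal, window each frame, and FFT it (no padding)."""
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    window = np.hanning(frame_length)          # taper to reduce spectral leakage
    columns = []
    for i in range(n_frames):
        start = i * hop_length                        # step 2: offset of this frame
        frame = signal[start:start + frame_length]    # step 1: n samples
        columns.append(np.fft.rfft(frame * window))   # step 3: FFT of the frame
    # step 4: one column per frame -> shape (frequency bins, frames)
    return np.stack(columns, axis=1)

sr = 8000
t = np.arange(sr) / sr                         # 1 second of samples
signal = np.sin(2 * np.pi * 440 * t)           # 440 Hz test tone
spec = simple_stft(signal)
magnitude = np.abs(spec)                       # complex values -> magnitudes
print(magnitude.shape)
```

With these settings each column holds `frame_length // 2 + 1` frequency bins. `librosa.stft` pads the signal by default (`center=True`), so its frame count will differ from this unpadded sketch.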
Figures:
---
- Each Frame:
```
Magnitude
|
|
| /\
| / \
| / \
|_/______\_______Frequency
```
- Transpose each frame's spectrum into a column and combine the columns:
```
Frequency
| | |
| F | F |
| r | r |
| a | a |
| m | m |
| e | e |
| | |
| 1 | 2 |
| | |
|_____|_____|____[Magnitude][M][M][M] -> represent time
```
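
To relate the two figures above to actual numbers, the sketch below builds a small magnitude matrix and maps its rows to frequencies in Hz and its columns to times in seconds. The settings (`sr`, `n_fft`, `hop`) and the 1000 Hz test tone are hypothetical values chosen so that the tone falls exactly on one frequency bin.

```python
import numpy as np

sr, n_fft, hop = 8000, 512, 128                # hypothetical settings
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 1000 * t)          # 1000 Hz test tone

# Build the magnitude matrix: rows = frequency bins, columns = frames.
starts = range(0, len(signal) - n_fft + 1, hop)
window = np.hanning(n_fft)
magnitude = np.abs(np.stack(
    [np.fft.rfft(signal[s:s + n_fft] * window) for s in starts], axis=1))

freqs = np.fft.rfftfreq(n_fft, d=1 / sr)       # Hz for each row
times = np.array(starts) / sr                  # start time (s) of each column

peak_rows = magnitude.argmax(axis=0)           # strongest bin in every frame
print(freqs[peak_rows[0]], times[:3])
```

Each column of `magnitude` is one "Each Frame" plot stood on its side, and reading along a row shows how the energy at that frequency changes over time.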
Google Colab
---
References:
- https://github.com/weichian0920/MFA_DAE/blob/main/src/utils/signalprocess.py
- https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9167389

Google Colab notebook:
- https://colab.research.google.com/drive/1wsap4JQarLkTajfuQ2g4v7T-oA4_t-9x?usp=sharing