###### tags:`MIR`,`NTHU`,`music-information-retrieval`,`key-detection`,`Krumhansl-Schmuckler`
MIR HW1
==
[toc]
## Environment
- Ubuntu 18.04.1 #5.3.0-46-generic
- Python 3.6.9 (using NeoVim v0.3.8)
- Extra modules: see the `Prerequisite` section below
## Prerequisite
### Libraries
- [librosa](https://librosa.github.io/librosa/)
- [pretty-midi](https://craffel.github.io/pretty-midi/)
- [mir_eval](https://craffel.github.io/mir_eval/)
- Numpy
- Scipy
### Data Set
- GTZAN
- [Data Set](https://drive.google.com/drive/folders/1Xy1AIWa4FifDF6voKVutvmghGOGEsFdZ)
- [Annotation](https://github.com/alexanderlerch/gtzan_key)
- [BPS-FH](https://drive.google.com/drive/folders/1gEV87HsdM_4K1EuaOEDjs2yL5-G37UvJ?usp=drive_open)
- [GiantSteps](https://github.com/GiantSteps/giantsteps-key-dataset)
- [A-Maps](https://drive.google.com/drive/folders/1IKMUAqsLTy8sBmbCaaDiK9yWowlHLI8z)
## Question 1
- Tasks
1. Find the tonic note.
2. Find major/minor mode.
- Discuss Results:
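- High scorers: performance is better on the `pop` and `rock` genres, probably because these styles more often rely on the ordinary heptatonic scale and the conventions of major/minor tonality.
- Low scorers: `blues` and `hiphop` use particular `Hexatonic` and `Heptatonic` scales, which makes the model prone to perfect-fifth errors on these two genres. Inconsistencies in the implementation are another cause of the poor predictions.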
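
The two tasks above can be sketched as a small template-matching routine. This is a minimal sketch, not the code behind the numbers below: it assumes the chroma has already been summed over time into a 12-element vector (index 0 = C), takes the strongest pitch class as the tonic, and picks the mode by Pearson correlation with a binary scale template. The names `estimate_key`, `rotate`, and `pearson`, and the example vectors, are hypothetical.

```python
from math import sqrt

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Binary scale templates for a tonic at pitch class 0: 1 marks a scale degree.
MAJOR = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
MINOR = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def rotate(template, tonic):
    """Shift a template so that its first entry lands on pitch class `tonic`."""
    return template[-tonic:] + template[:-tonic]


def estimate_key(chroma):
    """Two-step estimate: strongest pitch class as tonic, then best-fitting mode."""
    tonic = max(range(12), key=lambda pc: chroma[pc])
    r_major = pearson(chroma, rotate(MAJOR, tonic))
    r_minor = pearson(chroma, rotate(MINOR, tonic))
    mode = "major" if r_major >= r_minor else "minor"
    return NOTES[tonic], mode


# Hypothetical time-summed chroma vectors, made up for illustration.
c_major_like = [5, 0, 2, 0, 3, 2, 0, 4, 0, 2, 0, 1]
a_minor_like = [2, 0, 1, 0, 3, 1, 0, 1, 0, 5, 0, 2]
print(estimate_key(c_major_like))
print(estimate_key(a_minor_like))
```

Because the tonic is fixed before the mode is tested, a piece whose fifth is louder than its tonic is labelled a fifth too high, which is one way the perfect-fifth errors mentioned above can arise.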
### GTZAN
```shell
***** Q1-GTZAN *****
Genre accuracy
metal 24.73%
blues 7.14%
country 32.32%
hiphop 13.58%
rock 34.69%
reggae 32.99%
disco 31.63%
jazz 16.46%
pop 41.49%
----------
Overall accuracy: 26.52%
```

### GiantSteps
```shell
***** Q1-GiantSteps *****
----------
Overall accuracy: 16.39%
```

## Question 2
- Task
1. Repeat Q1 with the logarithmic compression factor `γ`, for `γ` = 1, 10, 100, 1000.
- Discuss Results:
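- Adding the compression factor `γ` to the original model lowers the overall accuracy as `γ` increases. As a hyperparameter, `γ` has to be tuned through repeated experiments and also chosen according to the type of music. For example, `pop`, which performed best in Q1, drops noticeably once `γ` is added.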
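
As a rough illustration of what `γ` does, the sketch below applies the usual logarithmic compression `log(1 + γ·|x|)` to a tiny magnitude spectrogram. It assumes this is the compression formula used in the experiments; the function name `log_compress` and the sample values are hypothetical.

```python
from math import log1p


def log_compress(spectrogram, gamma):
    """Apply log(1 + gamma * |x|) to every bin of a list-of-frames spectrogram."""
    return [[log1p(gamma * abs(value)) for value in frame] for frame in spectrogram]


# One made-up frame with a loud bin (10.0) and a quiet bin (0.1).
frame = [[10.0, 0.1]]
for gamma in (1, 10, 100, 1000):
    loud, quiet = log_compress(frame, gamma)[0]
    # The loud/quiet ratio shrinks as gamma grows: weak components gain weight.
    print(gamma, round(loud / quiet, 2))
```

A larger `γ` flattens the dynamic range, so weak partials and noise contribute more to the chroma. That is one plausible reason the accuracy drops as `γ` grows in the runs below.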
### GTZAN
```shell
gamma 1
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [01:30<00:00, 11.01it/s]
***** Q2 *****
Genre accuracy
metal 21.51%
blues 7.14%
country 34.34%
hiphop 13.58%
rock 33.67%
reggae 31.96%
disco 32.65%
jazz 16.46%
pop 40.43%
----------
Overall accuracy: 26.16%
gamma 10
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [01:27<00:00, 11.40it/s]
***** Q2 *****
Genre accuracy
metal 20.43%
blues 5.61%
country 32.83%
hiphop 14.20%
rock 32.14%
reggae 30.41%
disco 30.61%
jazz 13.92%
pop 39.89%
----------
Overall accuracy: 24.85%
gamma 100
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [01:27<00:00, 11.41it/s]
***** Q2 *****
Genre accuracy
metal 20.07%
blues 5.10%
country 32.32%
hiphop 12.76%
rock 30.61%
reggae 28.52%
disco 29.93%
jazz 13.92%
pop 37.94%
----------
Overall accuracy: 23.86%
gamma 1000
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [01:27<00:00, 11.46it/s]
***** Q2 *****
Genre accuracy
metal 19.89%
blues 5.10%
country 32.07%
hiphop 11.73%
rock 30.10%
reggae 27.84%
disco 29.34%
jazz 13.92%
pop 36.70%
----------
Overall accuracy: 23.36%
```

### GiantSteps
- Task
1. Repeat Q1 with factor of logarithmic compression `γ`, with `γ` = 1, 10, 100, 1000.
```shell
gamma 1
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 604/604 [06:06<00:00, 1.65it/s]
***** Q1-GiantSteps *****
----------
Overall accuracy: 16.06%
gamma 10
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 604/604 [06:08<00:00, 1.64it/s]
***** Q1-GiantSteps *****
----------
Overall accuracy: 15.56%
gamma 100
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 604/604 [06:09<00:00, 1.64it/s]
***** Q1-GiantSteps *****
----------
Overall accuracy: 15.34%
gamma 1000
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 604/604 [06:14<00:00, 1.61it/s]
***** Q1-GiantSteps *****
----------
Overall accuracy: 15.23%
```

## Question 3
- Problem
- Some of the error detection results behave similarly, for example:

- Task
1. Use the scoring rule of MIREX key detection to fix the perfect-key error issue.

- Discuss Results:
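- Because the scoring assigns partial weights to relative major/minor and parallel major/minor keys, the results improve over Q1 and Q2. This avoids treating a perfect-key error as a complete miss, which effectively raises the accuracy.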
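
The weighting can be sketched as follows, using the standard MIREX weights (same key 1.0, perfect fifth 0.5, relative major/minor 0.3, parallel major/minor 0.2, otherwise 0). This is an illustrative re-implementation, not the code used for the numbers below. Keys are represented as hypothetical `(pitch_class, mode)` tuples, and the fifth is counted only when the estimate lies a fifth above the reference, which is an assumption. The `mir_eval` library listed in the prerequisites offers `mir_eval.key.weighted_score` for the same purpose.

```python
def mirex_score(reference, estimate):
    """MIREX-style key score for (pitch_class, mode) tuples, mode in {'major', 'minor'}."""
    ref_tonic, ref_mode = reference
    est_tonic, est_mode = estimate
    interval = (est_tonic - ref_tonic) % 12
    if reference == estimate:
        return 1.0
    if ref_mode == est_mode:
        # Same mode: only an estimate a perfect fifth above the reference scores.
        return 0.5 if interval == 7 else 0.0
    # Different modes from here on.
    if ref_mode == "major" and interval == 9:
        return 0.3  # relative minor of a major reference (e.g. C major -> A minor)
    if ref_mode == "minor" and interval == 3:
        return 0.3  # relative major of a minor reference (e.g. A minor -> C major)
    if interval == 0:
        return 0.2  # parallel major/minor (same tonic)
    return 0.0


def weighted_accuracy(references, estimates):
    """Mean MIREX score over paired lists of reference and estimated keys."""
    scores = [mirex_score(r, e) for r, e in zip(references, estimates)]
    return sum(scores) / len(scores)


refs = [(0, "major")] * 5
ests = [(0, "major"), (7, "major"), (9, "minor"), (0, "minor"), (2, "major")]
print(weighted_accuracy(refs, ests))
```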
### GTZAN
```shell
***** Q3 *****
Genre accuracy
metal 35.38%
blues 19.59%
country 54.85%
hiphop 20.25%
rock 49.29%
reggae 47.84%
disco 49.49%
classical 0.00%
jazz 31.90%
pop 56.60%
----------
Overall accuracy: 41.15%
```

### GiantSteps
```shell
***** Q3 *****
----------
Overall accuracy: 35.17%
```

## Question 4
- Task:
- Use the Krumhansl-Schmuckler method instead of the binary template and repeat Q1-Q3.
- Discuss Results:
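- The KS template reflects the distribution of notes in major and minor keys more faithfully. A likely reason is that the original binary template consists only of 0s and 1s and can therefore express limited information, while the KS profile captures a broader and finer-grained picture.
- The results shift as the gamma parameter changes, which shows that this nonlinear correction has a real effect on prediction accuracy. In the runs below, overall accuracy falls as gamma increases.
- From the dataset perspective, in the seven-note major scale the tonic and the fifth form the most consonant combination, so these two notes should account for a larger share than other combinations.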
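
The difference from the binary template can be sketched with the Krumhansl-Kessler probe-tone profiles, which weight the tonic and the fifth most heavily. This is a minimal sketch that correlates a 12-element chroma vector with all 24 rotated profiles and keeps the best match. The joint 24-key search and the helper names (`ks_estimate`, `rotate`, `pearson`) are assumptions, not necessarily what produced the numbers below.

```python
from math import sqrt

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Krumhansl-Kessler key profiles for a tonic at pitch class 0.
KS_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KS_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def rotate(profile, tonic):
    """Shift a profile so that its first entry lands on pitch class `tonic`."""
    return profile[-tonic:] + profile[:-tonic]


def ks_estimate(chroma):
    """Return (note, mode, correlation) of the best of the 24 rotated profiles."""
    best = None
    for tonic in range(12):
        for mode, profile in (("major", KS_MAJOR), ("minor", KS_MINOR)):
            r = pearson(chroma, rotate(profile, tonic))
            if best is None or r > best[2]:
                best = (NOTES[tonic], mode, r)
    return best


# Sanity check: a chroma shaped exactly like the G major profile.
note, mode, r = ks_estimate(rotate(KS_MAJOR, 7))
print(note, mode, round(r, 3))
```

Unlike the 0/1 template, the graded weights let the correlation distinguish a strong tonic or fifth from a passing scale tone.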
### GTZAN
```shell
***** Q4 *****
Genre accuracy
metal 22.58%
blues 14.29%
country 49.49%
hiphop 16.05%
rock 39.80%
reggae 46.39%
disco 32.65%
jazz 27.85%
pop 54.26%
----------
Overall accuracy: 34.17%
```

### GiantSteps
```shell
***** Q4_1-GiantSteps *****
----------
Overall accuracy: 22.68%
```

### Q4-2
```shell
gamma 1
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [01:30<00:00, 11.10it/s]
***** Q4-2 *****
Genre accuracy
metal 22.58%
blues 13.27%
country 47.47%
hiphop 14.81%
rock 38.78%
reggae 44.33%
disco 32.65%
jazz 25.32%
pop 51.06%
----------
Overall accuracy: 32.74%
gamma 10
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [01:31<00:00, 10.89it/s]
***** Q4-2 *****
Genre accuracy
metal 21.51%
blues 13.27%
country 45.96%
hiphop 16.67%
rock 38.27%
reggae 41.75%
disco 30.61%
jazz 24.68%
pop 50.00%
----------
Overall accuracy: 31.84%
gamma 100
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [01:35<00:00, 10.49it/s]
***** Q4-2 *****
Genre accuracy
metal 21.15%
blues 13.61%
country 44.78%
hiphop 16.87%
rock 37.41%
reggae 38.14%
disco 28.91%
jazz 22.36%
pop 49.65%
----------
Overall accuracy: 30.74%
gamma 1000
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [01:32<00:00, 10.85it/s]
***** Q4-2 *****
Genre accuracy
metal 20.97%
blues 13.52%
country 43.69%
hiphop 16.98%
rock 36.73%
reggae 36.34%
disco 27.81%
jazz 21.20%
pop 49.47%
----------
Overall accuracy: 30.05%
```

### Q4-3
```shell
gamma 1
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [01:33<00:00, 10.74it/s]
***** Q4-3 *****
Genre accuracy
metal 35.48%
blues 23.78%
country 66.26%
hiphop 23.21%
rock 52.55%
reggae 55.77%
disco 48.37%
jazz 36.46%
pop 61.70%
----------
Overall accuracy: 45.46%
gamma 10
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [01:32<00:00, 10.78it/s]
***** Q4-3 *****
Genre accuracy
metal 33.60%
blues 23.72%
country 64.49%
hiphop 23.89%
rock 52.14%
reggae 53.61%
disco 46.17%
jazz 35.89%
pop 60.85%
----------
Overall accuracy: 44.40%
gamma 100
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [01:32<00:00, 10.82it/s]
***** Q4-3 *****
Genre accuracy
metal 32.72%
blues 23.71%
country 63.47%
hiphop 23.70%
rock 51.02%
reggae 50.41%
disco 44.90%
jazz 33.80%
pop 60.00%
----------
Overall accuracy: 43.21%
gamma 1000
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [01:31<00:00, 10.90it/s]
***** Q4-3 *****
Genre accuracy
metal 32.15%
blues 23.34%
country 62.45%
hiphop 23.61%
rock 50.20%
reggae 48.81%
disco 43.95%
jazz 32.82%
pop 59.63%
----------
Overall accuracy: 42.45%
```

### Bonus
- Discussions:
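- The results show acc_STFT > acc_CQT > acc_CENS. STFT gives the highest accuracy, and its physical meaning is also clear: it converts a time signal into a time-frequency representation, where the FFT window length and the chosen window function determine the time-frequency resolution trade-off for the frequency content of each time segment.
- Constant-Q Transform (CQT): frequency analysis with the FFT uses linearly spaced frequencies, but human hearing does not perceive frequency linearly. The ear is more sensitive to some bands than to others: perceptual sensitivity roughly follows a log-normal distribution, with higher sensitivity (or discrimination) at lower frequencies and lower sensitivity at higher frequencies. The CQT follows this principle by applying a logarithmic frequency scale on top of the FFT, so that the result is closer to human perception.
- CENS features (chroma CENS) are a variant in which additional post-processing steps are applied to the constant-Q chromagram to obtain features that are invariant to dynamics and timbre. The main idea of CENS is to take statistics over large windows, smoothing out local deviations in tempo, articulation, and musical ornaments (such as trills and arpeggiated chords). CENS features are therefore very useful for applications such as audio matching and retrieval.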
- Q4-1-1-cqt

- Q4-1-1-cens

- Q4-1-2-cqt

- Q4-1-2-cens

- Q4-2-1-cqt


- Q4-2-1-cens


- Q4-2-2-cqt

- Q4-2-2-cens

- Q4-3-1-cqt

- Q4-3-1-cens

- Q4-3-2-cqt

- Q4-3-2-cens
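
To make the linear-versus-logarithmic frequency spacing from the discussion above concrete, the sketch below lists FFT bin frequencies next to CQT center frequencies and folds CQT bins into 12 pitch classes. It is a minimal sketch: the sample rate, FFT size, and minimum frequency (32.70 Hz, close to C1) are illustrative assumptions, and the function names are hypothetical rather than taken from `librosa`.

```python
def fft_frequencies(sample_rate, n_fft):
    """Center frequencies of FFT bins: evenly spaced by sample_rate / n_fft."""
    return [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]


def cqt_frequencies(f_min, n_bins, bins_per_octave=12):
    """Center frequencies of CQT bins: each bin is 2**(1/bins_per_octave) above the last."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def fold_to_chroma(cqt_magnitudes):
    """Sum semitone-spaced CQT bins (12 per octave) into 12 pitch classes."""
    chroma = [0.0] * 12
    for k, magnitude in enumerate(cqt_magnitudes):
        chroma[k % 12] += magnitude
    return chroma


fft = fft_frequencies(sample_rate=22050, n_fft=2048)
cqt = cqt_frequencies(f_min=32.70, n_bins=36)
print("FFT step (Hz):", fft[1] - fft[0], fft[2] - fft[1])   # constant difference
print("CQT ratio:", cqt[1] / cqt[0], cqt[13] / cqt[12])      # constant ratio
print("octave:", cqt[0], cqt[12], cqt[24])                   # doubles every 12 bins
```

With about 10.8 Hz per FFT bin in this setting, the lowest semitones (roughly 2 Hz apart near C1) share a bin, while the CQT keeps one bin per semitone in every octave.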

## Question 5
- Tasks
- Detect the local key over the time stamps of a music piece instead of finding a single global key, and output the local key every second.
- Work on two datasets:
1. [BPS-FH](https://drive.google.com/drive/folders/1gEV87HsdM_4K1EuaOEDjs2yL5-G37UvJ?usp=drive_open)
2. [A-Maps](https://drive.google.com/drive/folders/1IKMUAqsLTy8sBmbCaaDiK9yWowlHLI8z)
- Discussions:
- Using the KS profile plus average filtering (window size ≈ 64 s), the final overall accuracy reaches roughly 60%.
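
The average filtering mentioned above can be sketched as a centered moving average over per-second chroma vectors, followed by a per-second comparison with the annotated keys. This is a sketch under assumptions: the source only states a window of about 64 s, so the centered window, the truncation at the edges, and the names `smooth_chroma` and `frame_accuracy` are hypothetical.

```python
def smooth_chroma(frames, window):
    """Centered moving average over a list of per-second chroma vectors.

    The window is truncated at the start and end of the piece, and an even
    `window` effectively covers window + 1 frames around the center.
    """
    half = window // 2
    n = len(frames)
    smoothed = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        span = frames[lo:hi]
        smoothed.append([sum(column) / len(span) for column in zip(*span)])
    return smoothed


def frame_accuracy(reference_keys, estimated_keys):
    """Fraction of seconds whose estimated key label equals the reference label."""
    hits = sum(1 for r, e in zip(reference_keys, estimated_keys) if r == e)
    return hits / len(reference_keys)


# Toy input: three one-second frames with two bins each, window of 3 seconds.
print(smooth_chroma([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]], window=3))
print(frame_accuracy(["C", "C", "G", "G"], ["C", "G", "G", "G"]))
```

Each smoothed frame can then be passed to a key estimator to obtain one label per second. A long window suppresses brief chromatic detours but also blurs the exact moment of a modulation.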
```shell
3.wav 35.761% 57.714%
6.wav 29.332% 49.047%
11.wav 65.760% 72.298%
12.wav 26.140% 55.684%
13.wav 36.923% 48.462%
14.wav 66.667% 73.152%
16.wav 27.011% 42.874%
18.wav 75.710% 79.794%
19.wav 67.483% 74.580%
21.wav 47.120% 55.909%
22.wav 66.953% 74.464%
24.wav 56.242% 62.257%
25.wav 43.088% 61.176%
28.wav 55.556% 65.131%
31.wav 65.896% 69.595%
32.wav 62.679% 71.615%
***** Q5-BPS-FH *****
OverallAcc:
acc1: 50.148%
acc2: 62.230%
```