###### tags:`MIR`,`NTHU`,`music-information-retrieval`,`key-detection`,`Krumhansl-Schmuckler`

MIR HW1
==

[toc]

## Environment
- Ubuntu 18.04.1 #5.3.0-46-generic
- Python 3.6.9 (using NeoVim v0.3.8)
- Extra modules: see the `Prerequisite` section below

## Prerequisite
### Libraries
- [librosa](https://librosa.github.io/librosa/)
- [pretty-midi](https://craffel.github.io/pretty-midi/)
- [mir_eval](https://craffel.github.io/mir_eval/)
- NumPy
- SciPy

### Data Set
- GTZAN
    - [Data Set](https://drive.google.com/drive/folders/1Xy1AIWa4FifDF6voKVutvmghGOGEsFdZ)
    - [Annotation](https://github.com/alexanderlerch/gtzan_key)
- [BPS-FH](https://drive.google.com/drive/folders/1gEV87HsdM_4K1EuaOEDjs2yL5-G37UvJ?usp=drive_open)
- [GiantSteps](https://github.com/GiantSteps/giantsteps-key-dataset)
- [A-Maps](https://drive.google.com/drive/folders/1IKMUAqsLTy8sBmbCaaDiK9yWowlHLI8z)

## Question 1
- Tasks
    1. Find the tonic note.
    2. Find major/minor mode.
- Discuss Results:
    - High scorers: the method performs better on `pop` and `rock`, possibly because these genres are more often composed with the ordinary seven-note (heptatonic) scale and conventional major/minor harmony.
    - Low scorers: `blues` and `hiphop` use particular `Hexatonic` and `Heptatonic` scales, which makes the model prone to perfect-fifth errors on these two genres. Inconsistencies in the implementation are another reason for the poor predictions.

### GTZAN
```shell
***** Q1-GTZAN *****
Genre       accuracy
metal       24.73%
blues       7.14%
country     32.32%
hiphop      13.58%
rock        34.69%
reggae      32.99%
disco       31.63%
jazz        16.46%
pop         41.49%
----------
Overall accuracy:   26.52%
```
![](https://i.imgur.com/t83go3h.png)

### GiantSteps
```shell
***** Q1-GiantSteps *****
----------
Overall accuracy:   16.39%
```
![](https://i.imgur.com/ivdqypj.png)

## Question 2
- Task
    1. Repeat Q1 with factor of logarithmic compression `γ`, with `γ` = 1, 10, 100, 1000.
- Discuss Results:
    - Adding `γ` to the original model applies a logarithmic (nonlinear) compression to the features. The overall accuracy decreases as `γ` increases. As a hyperparameter, `γ` has to be tuned through repeated experiments and also depends on the type of music: for example, `pop`, which performed well in Q1, drops noticeably once `γ` is added.

### GTZAN
```shell
gamma 1
100%|██████████| 1000/1000 [01:30<00:00, 11.01it/s]
***** Q2 *****
Genre       accuracy
metal       21.51%
blues       7.14%
country     34.34%
hiphop      13.58%
rock        33.67%
reggae      31.96%
disco       32.65%
jazz        16.46%
pop         40.43%
----------
Overall accuracy:   26.16%

gamma 10
100%|██████████| 1000/1000 [01:27<00:00, 11.40it/s]
***** Q2 *****
Genre       accuracy
metal       20.43%
blues       5.61%
country     32.83%
hiphop      14.20%
rock        32.14%
reggae      30.41%
disco       30.61%
jazz        13.92%
pop         39.89%
----------
Overall accuracy:   24.85%

gamma 100
100%|██████████| 1000/1000 [01:27<00:00, 11.41it/s]
***** Q2 *****
Genre       accuracy
metal       20.07%
blues       5.10%
country     32.32%
hiphop      12.76%
rock        30.61%
reggae      28.52%
disco       29.93%
jazz        13.92%
pop         37.94%
----------
Overall accuracy:   23.86%

gamma 1000
100%|██████████| 1000/1000 [01:27<00:00, 11.46it/s]
***** Q2 *****
Genre       accuracy
metal       19.89%
blues       5.10%
country     32.07%
hiphop      11.73%
rock        30.10%
reggae      27.84%
disco       29.34%
jazz        13.92%
pop         36.70%
----------
Overall accuracy:   23.36%
```
![](https://i.imgur.com/7NZb5nj.png)

### GiantSteps
- Task
    1. Repeat Q1 with factor of logarithmic compression `γ`, with `γ` = 1, 10, 100, 1000.
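
The compression step in this task can be sketched as follows. This is a minimal illustration, assuming the compression has the common form log(1 + γ·|x|) and is applied to non-negative chroma or spectrogram magnitudes; the report does not state the exact formula. The function name `log_compress` and the chroma values are hypothetical, and plain Python is used so the snippet runs without librosa.

```python
import math


def log_compress(values, gamma):
    """Apply log(1 + gamma * |x|) element-wise to a list of magnitudes."""
    return [math.log1p(gamma * abs(v)) for v in values]


# Hypothetical 12-bin chroma magnitudes (C, C#, ..., B); not data from the report.
chroma = [4.0, 0.1, 0.5, 0.1, 2.0, 0.3, 0.1, 3.0, 0.1, 0.4, 0.1, 0.2]

for gamma in (1, 10, 100, 1000):
    compressed = log_compress(chroma, gamma)
    # Ratio between the strongest and the weakest bin after compression.
    ratio = max(compressed) / min(compressed)
    print(f"gamma={gamma:<5} strongest/weakest bin ratio = {ratio:.2f}")
```

On this toy vector, a larger `γ` shrinks the gap between strong and weak bins, so the compressed chroma becomes flatter.
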
```shell
gamma 1
100%|██████████| 604/604 [06:06<00:00, 1.65it/s]
***** Q1-GiantSteps *****
----------
Overall accuracy:   16.06%

gamma 10
100%|██████████| 604/604 [06:08<00:00, 1.64it/s]
***** Q1-GiantSteps *****
----------
Overall accuracy:   15.56%

gamma 100
100%|██████████| 604/604 [06:09<00:00, 1.64it/s]
***** Q1-GiantSteps *****
----------
Overall accuracy:   15.34%

gamma 1000
100%|██████████| 604/604 [06:14<00:00, 1.61it/s]
***** Q1-GiantSteps *****
----------
Overall accuracy:   15.23%
```
![](https://i.imgur.com/AbKGMiH.png)

## Question 3
- Problem
    - Some of the detection errors behave similarly, for example:
    ![](https://i.imgur.com/LYo6N0B.png)
- Task
    1. Use the scoring rule of MIREX key detection to fix the perfect-key error issue.
    ![](https://i.imgur.com/Q50zHWZ.png)
- Discuss Results:
    - Because the scoring rule gives weighted partial credit for relative major/minor and parallel major/minor keys, the results improve over Q1 and Q2. The weighting softens the penalty for perfect-key errors and so raises the reported accuracy.

### GTZAN
```shell
***** Q3 *****
Genre       accuracy
metal       35.38%
blues       19.59%
country     54.85%
hiphop      20.25%
rock        49.29%
reggae      47.84%
disco       49.49%
classical   0.00%
jazz        31.90%
pop         56.60%
----------
Overall accuracy:   41.15%
```
![](https://i.imgur.com/aFom1qN.png)

### GiantSteps
```shell
***** Q3 *****
----------
Overall accuracy:   35.17%
```
![](https://i.imgur.com/UcbnzQD.png)

## Question 4
- Task:
    - Use the Krumhansl-Schmuckler method instead of the binary template and repeat the steps of Q1-Q3.
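
A minimal sketch of the template swap described in this task, assuming the song-level feature is a 12-bin chroma vector ordered C, C#, ..., B and that the key is the one whose rotated template has the highest Pearson correlation with it. The KS weights are the commonly published Krumhansl-Kessler profile values. The function names (`pearson`, `rotate`, `estimate_key`) and the natural-minor binary template are assumptions for illustration, not taken from the report's code.

```python
import math

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Krumhansl-Kessler key profiles, index 0 = tonic.
KS_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KS_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

# Binary scale templates (assumption: natural minor for the minor mode).
BIN_MAJOR = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
BIN_MINOR = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def rotate(template, tonic):
    """Shift a C-based template so that pitch class `tonic` carries the tonic weight."""
    return template[-tonic:] + template[:-tonic]


def estimate_key(chroma, major=KS_MAJOR, minor=KS_MINOR):
    """Return the best-correlated key among the 24 rotated major/minor templates."""
    best_score, best_key = None, None
    for tonic in range(12):
        for mode, template in (("major", major), ("minor", minor)):
            score = pearson(chroma, rotate(template, tonic))
            if best_score is None or score > best_score:
                best_score, best_key = score, f"{NOTES[tonic]} {mode}"
    return best_key


# Hypothetical input: a chroma vector shaped exactly like the G major profile.
print(estimate_key(rotate(KS_MAJOR, 7)))
# The binary C major and A minor templates are the same vector.
print(rotate(BIN_MAJOR, 0) == rotate(BIN_MINOR, 9))
```

With these binary templates, C major and A natural minor cover the same seven pitch classes, so correlation alone cannot separate a key from its relative; the graded KS weights do not have this tie.
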
- Discuss Results:
    - The KS template represents the distribution of notes in major and minor keys more faithfully. A likely reason is that the binary template consists only of 0s and 1s and so carries limited information, whereas the KS profile can express a wider and finer range of weights.
    - The results shift with the `gamma` setting, which shows that the nonlinear (logarithmic) correction has a real effect on prediction accuracy. In the Q4-2 runs below, however, overall accuracy falls from 32.74% at `gamma` = 1 to 30.05% at `gamma` = 1000 rather than improving.
    - From the data side, in the seven-note major scale the combination of the tonic and the fifth is the most consonant, so these two notes should account for a larger share than the other combinations.

### GTZAN
```shell
***** Q4 *****
Genre       accuracy
metal       22.58%
blues       14.29%
country     49.49%
hiphop      16.05%
rock        39.80%
reggae      46.39%
disco       32.65%
jazz        27.85%
pop         54.26%
----------
Overall accuracy:   34.17%
```
![](https://i.imgur.com/YyWO3G4.png)

### GiantSteps
```shell
***** Q4_1-GiantSteps *****
----------
Overall accuracy:   22.68%
```
![](https://i.imgur.com/ge6VMbE.png)

### Q4-2
```shell
gamma 1
100%|██████████| 1000/1000 [01:30<00:00, 11.10it/s]
***** Q4-2 *****
Genre       accuracy
metal       22.58%
blues       13.27%
country     47.47%
hiphop      14.81%
rock        38.78%
reggae      44.33%
disco       32.65%
jazz        25.32%
pop         51.06%
----------
Overall accuracy:   32.74%

gamma 10
100%|██████████| 1000/1000 [01:31<00:00, 10.89it/s]
***** Q4-2 *****
Genre       accuracy
metal       21.51%
blues       13.27%
country     45.96%
hiphop      16.67%
rock        38.27%
reggae      41.75%
disco       30.61%
jazz        24.68%
pop         50.00%
----------
Overall accuracy:   31.84%

gamma 100
100%|██████████| 1000/1000 [01:35<00:00, 10.49it/s]
***** Q4-2 *****
Genre       accuracy
metal       21.15%
blues       13.61%
country     44.78%
hiphop      16.87%
rock        37.41%
reggae      38.14%
disco       28.91%
jazz        22.36%
pop         49.65%
----------
Overall accuracy:   30.74%

gamma 1000
100%|██████████| 1000/1000 [01:32<00:00, 10.85it/s]
***** Q4-2 *****
Genre       accuracy
metal       20.97%
blues       13.52%
country     43.69%
hiphop      16.98%
rock        36.73%
reggae      36.34%
disco       27.81%
jazz        21.20%
pop         49.47%
----------
Overall accuracy:   30.05%
```
![](https://i.imgur.com/pxRjfC0.png)

### Q4-3
```shell
gamma 1
100%|██████████| 1000/1000 [01:33<00:00, 10.74it/s]
***** Q4-3 *****
Genre       accuracy
metal       35.48%
blues       23.78%
country     66.26%
hiphop      23.21%
rock        52.55%
reggae      55.77%
disco       48.37%
jazz        36.46%
pop         61.70%
----------
Overall accuracy:   45.46%

gamma 10
100%|██████████| 1000/1000 [01:32<00:00, 10.78it/s]
***** Q4-3 *****
Genre       accuracy
metal       33.60%
blues       23.72%
country     64.49%
hiphop      23.89%
rock        52.14%
reggae      53.61%
disco       46.17%
jazz        35.89%
pop         60.85%
----------
Overall accuracy:   44.40%

gamma 100
100%|██████████| 1000/1000 [01:32<00:00, 10.82it/s]
***** Q4-3 *****
Genre       accuracy
metal       32.72%
blues       23.71%
country     63.47%
hiphop      23.70%
rock        51.02%
reggae      50.41%
disco       44.90%
jazz        33.80%
pop         60.00%
----------
Overall accuracy:   43.21%

gamma 1000
100%|██████████| 1000/1000 [01:31<00:00, 10.90it/s]
***** Q4-3 *****
Genre       accuracy
metal       32.15%
blues       23.34%
country     62.45%
hiphop      23.61%
rock        50.20%
reggae      48.81%
disco       43.95%
jazz        32.82%
pop         59.63%
----------
Overall accuracy:   42.45%
```
![](https://i.imgur.com/dDkANl3.png)

### Bonus
- Discussions:
    - The results show acc_STFT > acc_CQT > acc_CENS. STFT gives the highest accuracy, and its physical meaning is clear: it converts a time-domain signal into a time-frequency representation that gives the frequency content of each time segment, with the time-frequency resolution trade-off set by the FFT window length and the chosen window function.
    - Constant-Q Transform (CQT): frequency analysis with the FFT uses linearly spaced frequencies, but human perception of frequency is not linear, so the ear is more sensitive in some bands than in others. Perceptual sensitivity to frequency roughly follows a log-normal distribution: sensitivity (or discrimination) is higher at lower frequencies and falls as frequency rises. CQT follows this principle and applies a logarithmic compression of the frequency axis on top of the FFT, which brings the result closer to human perception.
    - CENS features (chroma CENS): in this variant, additional post-processing steps are applied to the constant-Q chromagram to obtain features that are invariant to dynamics and timbre. The main idea of CENS is to take statistics over large windows, which smooths local deviations in tempo, articulation, and musical ornaments (such as trills and arpeggiated chords). CENS features are therefore well suited to applications such as audio matching and retrieval.
- Q4-1-1-cqt
    ![](https://i.imgur.com/Fm0awBC.png)
- Q4-1-1-cens
    ![](https://i.imgur.com/EqV9Vux.png)
- Q4-1-2-cqt
    ![](https://i.imgur.com/8z3wlHc.png)
- Q4-1-2-cens
    ![](https://i.imgur.com/XGPfVqm.png)
- Q4-2-1-cqt
    ![](https://i.imgur.com/R7hInBg.png)
    ![](https://i.imgur.com/UNC0U1m.png)
- Q4-2-1-cens
    ![](https://i.imgur.com/HpNaz4b.png)
    ![](https://i.imgur.com/oiaXBBI.png)
- Q4-2-2-cqt
    ![](https://i.imgur.com/nZEvulU.png)
- Q4-2-2-cens
    ![](https://i.imgur.com/U60KIhU.png)
- Q4-3-1-cqt
    ![](https://i.imgur.com/4Hohs91.png)
- Q4-3-1-cens
    ![](https://i.imgur.com/LGQf2NH.png)
- Q4-3-2-cqt
    ![](https://i.imgur.com/LJgN2F5.png)
- Q4-3-2-cens
    ![](https://i.imgur.com/da6kdNH.png)

## Question 5
- Tasks
    - Detect the local keys over the time stamps of a music piece instead of finding a single global key, and output the local key every second.
    - Work on two datasets:
        1. [BPS-FH](https://drive.google.com/drive/folders/1gEV87HsdM_4K1EuaOEDjs2yL5-G37UvJ?usp=drive_open)
        2. [A-Maps](https://drive.google.com/drive/folders/1IKMUAqsLTy8sBmbCaaDiK9yWowlHLI8z)
- Discussions:
    - Using the KS profile together with average filtering (window size ≈ 64 s), the method reaches about 60% overall accuracy.

```shell
3.wav   35.761% 57.714%
6.wav   29.332% 49.047%
11.wav  65.760% 72.298%
12.wav  26.140% 55.684%
13.wav  36.923% 48.462%
14.wav  66.667% 73.152%
16.wav  27.011% 42.874%
18.wav  75.710% 79.794%
19.wav  67.483% 74.580%
21.wav  47.120% 55.909%
22.wav  66.953% 74.464%
24.wav  56.242% 62.257%
25.wav  43.088% 61.176%
28.wav  55.556% 65.131%
31.wav  65.896% 69.595%
32.wav  62.679% 71.615%
***** Q5-BPS-FH *****
OverallAcc:
acc1: 50.148%
acc2: 62.230%
```
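
The average filtering mentioned in the Q5 discussion can be sketched roughly as below, assuming one 12-bin chroma frame per second and a centered moving average over neighbouring frames before a per-frame key estimate. The names `smooth_chroma`, `local_keys`, and `strongest_pitch_class` are hypothetical, and the estimator is only a stand-in for a real KS-profile matcher. The toy data uses a 3-second window; the report uses roughly 64 s.

```python
def smooth_chroma(frames, window):
    """Centered moving average over a list of 12-bin chroma frames."""
    half = window // 2
    smoothed = []
    for t in range(len(frames)):
        # Clip the window at the start and end of the piece.
        lo, hi = max(0, t - half), min(len(frames), t + half + 1)
        span = frames[lo:hi]
        smoothed.append([sum(f[k] for f in span) / len(span) for k in range(12)])
    return smoothed


def local_keys(frames, window, estimate):
    """Estimate one label per frame from the smoothed chroma."""
    return [estimate(frame) for frame in smooth_chroma(frames, window)]


def strongest_pitch_class(frame):
    """Stand-in estimator: index of the largest chroma bin (not a key detector)."""
    return max(range(12), key=frame.__getitem__)


def one_hot(k):
    """Toy chroma frame with all energy in pitch class k."""
    return [1.0 if i == k else 0.0 for i in range(12)]


# Hypothetical piece: 5 s centred on C, then 5 s centred on G, with one outlier second.
frames = [one_hot(0), one_hot(0), one_hot(4), one_hot(0), one_hot(0)] + [one_hot(7)] * 5

print(local_keys(frames, 1, strongest_pitch_class))  # window of 1 = no smoothing
print(local_keys(frames, 3, strongest_pitch_class))  # 3-second average filter
```

A longer window suppresses short-lived outliers at the cost of reacting more slowly to real modulations, so the window size is a trade-off to tune per dataset.
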
