AI chord recognition
===
###### tags: `new ML`
## Directory and how it works
https://hackmd.io/_nqLUMkSSrKPhaE0bz3mvQ
## download dataset
1. know how to download from YouTube
reference: https://hackmd.io/XxCRtjdCRpequXQ5A1xiiw
2. read json (youtube list)
3. download from youtube url list
4. add labels
5. dataset: https://hackmd.io/7EIIhKsiR1m2diiRx7ifQw?view
## load json file as dictionary
```python=
import json

# chord labels for each song
with open('/home/hsiny/01_ML_project/MIR-CE500_20210422/MIR-CE500_corrected.json') as json_file:
    labels = json.load(json_file)

# YouTube links for each song
with open('/home/hsiny/01_ML_project/MIR-CE500_20210422/MIR-CE500_link.json') as json_file:
    data = json.load(json_file)
```
## Final solution (load video)
This code downloads every song in the link file and skips the videos that are unavailable (private, removed, age-restricted, and so on).
```python=
import json
import os

from pytube import YouTube, extract, request

# Playability messages that mean the video cannot be downloaded.
NOPASS = ["This is a private video. Please sign in to verify that you may see it.","Video unavailable","Sign in to confirm your age",'This live stream recording is not available.','Join this channel to get access to members-only content ']

def get_youtube_songs(json_file_path, target_path):
    """Download the audio of every song in the link file and return the skipped IDs."""
    with open(json_file_path) as json_file:
        data = json.load(json_file)  # maps song ID -> YouTube URL
    os.makedirs(target_path, exist_ok=True)
    failed = []
    # each has 500
    for song_id, url in data.items():
        status, messages = extract.playability_status(request.get(url=url))
        if messages and messages[0] in NOPASS:
            failed.append(song_id)
            continue
        yt = YouTube(url=url)
        audio = yt.streams.filter(only_audio=True).first()
        out_file = audio.download(output_path=target_path)
        # Note: this only renames the file; the audio is not re-encoded to MP3.
        new_file = os.path.join(target_path, song_id + '.mp3')
        os.rename(out_file, new_file)
        print("target path = " + new_file)
    print(f'these songs were not downloaded: {failed}')
    return failed

PATH = "./MIR-CE500_link.json"
target_path = "./train"  # this will make a train folder for you
get_youtube_songs(PATH, target_path)
```
## the list of videos that couldn't be downloaded
['129', '162', '163', '165', '167', '189', '261', '283', '297', '301', '302', '303', '304', '316', '319', '352', '368', '499']
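
One way to use this list is to drop the failed IDs before building the dataset. This is only a minimal sketch: it assumes the JSON files map song IDs (strings) to values, as the download code does for the link file. `remove_failed` is a hypothetical helper and the sample dictionary is made up for illustration.

```python
# IDs that could not be downloaded (copied from the list above).
FAILED_IDS = ['129', '162', '163', '165', '167', '189', '261', '283', '297',
              '301', '302', '303', '304', '316', '319', '352', '368', '499']

def remove_failed(entries, failed_ids=FAILED_IDS):
    """Return a copy of `entries` without the songs that failed to download."""
    failed = set(failed_ids)
    return {song_id: value for song_id, value in entries.items()
            if song_id not in failed}

# Made-up example data; the real dictionary would come from json.load.
sample = {'128': 'url-a', '129': 'url-b', '130': 'url-c'}
print(remove_failed(sample))  # {'128': 'url-a', '130': 'url-c'}
```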
# Notes from 2022/Nov/13
- librosa
- PyTorch: turn the data into a dataset
- python spot
- glob to list the files in a folder
- learn how to feed the data to an RNN
- padding to solve the length problem
- maybe we don't have to use one row per song
- maybe we can determine the genre first to increase the accuracy
## how to use python glob
website:https://ithelp.ithome.com.tw/articles/10262521
```
import os
import glob

# Create a directory with three empty files to search
os.makedirs("Folder_1", exist_ok=True)
for name in ("File_1.txt", "File_2.csv", "File_3.txt"):
    open(os.path.join("Folder_1", name), "w").close()
```
```
# Find pathnames under the specified directory
# Get the paths of everything directly inside Folder_1
print(glob.glob(os.path.join("Folder_1", "*")))
# Get the paths inside Folder_1 that end in .txt
print(glob.glob(os.path.join("Folder_1", "*.txt")))
# Get the paths inside Folder_1 that end in .csv
print(glob.glob(os.path.join("Folder_1", "*.csv")))
# Get the paths inside Folder_1 whose names contain 1 or 2
print(glob.glob(os.path.join("Folder_1", "*[1-2]*")))
# Note
# os.path.join is used here to join the strings into a path, because the separator can differ between systems
# You can check the separator with os.sep and also use it directly when building paths
```
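
Applied to this project, the same pattern could collect the downloaded songs. A minimal sketch, assuming the files are named `<id>.mp3` as in the download script; `list_songs` is a hypothetical helper, not part of the project code.

```python
import glob
import os

def list_songs(folder):
    """Return the .mp3 paths in `folder`, sorted by their numeric song ID."""
    paths = glob.glob(os.path.join(folder, "*.mp3"))
    # Sort by the number in the file name so that 2.mp3 comes before 10.mp3.
    # Assumption: every .mp3 file name is a plain integer ID.
    return sorted(paths, key=lambda p: int(os.path.splitext(os.path.basename(p))[0]))

songs = list_songs("./train")  # empty list if the folder does not exist
print(f"found {len(songs)} songs")
```

A plain `sorted(paths)` would order the names as strings (`1`, `10`, `100`, `2`, ...), which is why the sketch sorts by the integer ID instead.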
# Making a dataset
## with padding (every song gets the same number of samples)
## without padding (a linear dataset with name and time)
# NEW 2022/11/20 update
github:https://github.com/railohail/AI_chord_recognition
- if a song is 10 sec long, cut it into two 5 sec pieces
- once the dataset is built, keep the songs in a variable (in RAM) so loading is faster
- clean the song data more thoroughly
- try to use json to write your dataset
- padding to make every item in a batch the same length
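
The cutting and padding ideas above can be sketched with NumPy alone. This is an assumption about how it might be done, not the project's implementation: `split_into_segments` is a hypothetical helper and the signal is a synthetic stand-in for a loaded song.

```python
import numpy as np

def split_into_segments(y, sr, seconds=5.0):
    """Cut a 1-D signal into equal segments, zero-padding the last one."""
    seg_len = int(sr * seconds)
    n_segments = max(1, int(np.ceil(len(y) / seg_len)))
    padded = np.zeros(n_segments * seg_len, dtype=y.dtype)
    padded[:len(y)] = y
    return padded.reshape(n_segments, seg_len)

sr = 22050                               # librosa's default sample rate
y = np.ones(12 * sr, dtype=np.float32)   # stand-in for a 12 second song
segments = split_into_segments(y, sr)
print(segments.shape)                    # (3, 110250)
```

The 12 second signal becomes three 5 second rows, with the last 3 seconds of the final row filled with zeros, so every row has the same length and can be stacked into a batch.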
# BPM
maybe we can use the BPM to determine better batches
https://librosa.org/doc/main/generated/librosa.beat.tempo.html

this one is for padding:
https://librosa.org/doc/main/generated/librosa.util.fix_length.html
# another way to do it
maybe I can take the full range of each song and calculate its length to shorten it
# new dataset nov 30
website I found about linked lists in Python:
https://lovedrinkcafe.com/python-single-linked-list/
# librosa setup
```python=
import librosa
import torch
y, sr= librosa.load('./train/1.mp3',offset=1,duration=0.5) # sr is sample rate
print(f'the sample rate is {sr}')
print(y)
print(type(y))
x = torch.from_numpy(y)
print(x)
print(type(x))
```
`torch.from_numpy(y)` creates a tensor from a NumPy array.
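
One detail worth knowing: the tensor returned by `torch.from_numpy` shares memory with the array, so changing one changes the other. A small sketch with a made-up array:

```python
import numpy as np
import torch

y = np.zeros(4, dtype=np.float32)
x = torch.from_numpy(y)  # no copy: x and y share the same memory

y[0] = 1.0               # change the NumPy array...
print(x)                 # ...and the tensor changes too: tensor([1., 0., 0., 0.])

x_copy = torch.from_numpy(y).clone()  # clone() gives an independent tensor
y[1] = 2.0
print(x_copy)            # still tensor([1., 0., 0., 0.])
```

If the audio array is going to be modified later, `clone()` avoids surprises.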
# chord recognition
https://www.audiolabs-erlangen.de/resources/MIR/FMP/C5/C5S2_ChordRec_Templates.html
how chord recognition with templates works (FMP notebook)
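
The template idea can be sketched in a few lines of NumPy. This is a simplified illustration with binary major and minor triad templates, not the code from the linked page; `chord_templates` and `recognize` are hypothetical names and the chroma frames are synthetic.

```python
import numpy as np

PITCHES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def chord_templates():
    """Return 24 binary triad templates (12 major, 12 minor) and their labels."""
    templates, labels = [], []
    for suffix, third in (('', 4), ('m', 3)):  # major third = 4 semitones, minor = 3
        for root in range(12):
            t = np.zeros(12)
            t[[root, (root + third) % 12, (root + 7) % 12]] = 1.0
            templates.append(t)
            labels.append(PITCHES[root] + suffix)
    return np.array(templates), labels

def recognize(chroma):
    """Label each chroma frame (shape 12 x frames) with the best-matching triad."""
    templates, labels = chord_templates()
    # Normalise both sides so the inner product is a cosine similarity.
    chroma = chroma / (np.linalg.norm(chroma, axis=0, keepdims=True) + 1e-9)
    templates = templates / np.linalg.norm(templates, axis=1, keepdims=True)
    similarity = templates @ chroma  # shape (24, frames)
    return [labels[i] for i in similarity.argmax(axis=0)]

# Two synthetic frames: C major (C, E, G) and A minor (A, C, E).
chroma = np.zeros((12, 2))
chroma[[0, 4, 7], 0] = 1.0
chroma[[9, 0, 4], 1] = 1.0
print(recognize(chroma))  # ['C', 'Am']
```

For a real song, the chroma matrix could come from `librosa.feature.chroma_stft(y=y, sr=sr)`, which returns an array with 12 rows and one column per frame.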