## Env for kaldi

```bash
git clone https://github.com/kaldi-asr/kaldi.git kaldi --origin upstream
```

`kaldi/tools/INSTALL`

```bash
To check the prerequisites for Kaldi, first run

  extras/check_dependencies.sh

and see if there are any system-level installations you need to do. Check the
output carefully. There are some things that will make your life a lot easier
if you fix them at this stage. If your system default C++ compiler is not
supported, you can do the check with another compiler by setting the CXX
environment variable, e.g.

  CXX=g++-4.8 extras/check_dependencies.sh

Then run

  make

which by default will install ATLAS headers, OpenFst, SCTK and sph2pipe.
OpenFst requires a relatively recent C++ compiler with C++11 support, e.g.
g++ >= 4.7, Apple clang >= 5.0 or LLVM clang >= 3.3. If your system default
compiler does not have adequate support for C++11, you can specify a C++11
compliant compiler as a command argument, e.g.

  make CXX=g++-4.8

If you have multiple CPUs and want to speed things up, you can do a parallel
build by supplying the "-j" option to make, e.g. to use 4 CPUs

  make -j 4

In extras/, there are also various scripts to install extra bits and pieces
that are used by individual example scripts. If an example script needs you
to run one of those scripts, it will tell you what to do.
```

`kaldi/src/INSTALL`

```bash
These instructions are valid for UNIX-like systems (these steps have been
run on various Linux distributions; Darwin; Cygwin). For native Windows
compilation, see ../windows/INSTALL.

You must first have completed the installation steps in ../tools/INSTALL
(compiling OpenFst; getting ATLAS and CLAPACK headers).

The installation instructions are

  ./configure --shared
  make depend -j 8
  make -j 8

Note that we added the "-j 8" to run in parallel because "make" takes a long
time. 8 jobs might be too many for a laptop or small desktop machine with not
many cores.

For more information, see documentation at http://kaldi-asr.org/doc/
and click on "The build process (how Kaldi is compiled)".
```
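Putting the two INSTALL files together, the whole build boils down to the sequence below. This is a minimal sketch assuming the defaults quoted above; the job counts (`-j 4`, `-j 8`) are only examples, so adjust them to your machine.

```bash
# Build the tools first, then the Kaldi source tree.
cd kaldi/tools
extras/check_dependencies.sh    # fix anything it reports before continuing
make -j 4                       # installs ATLAS headers, OpenFst, SCTK, sph2pipe
cd ../src
./configure --shared
make depend -j 8
make -j 8
```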
## Pre-processing

:::info
Note: convert the audio files with sox first, to 16 kHz sampling, signed-integer, 16 bits.
:::

To save the converted files into new folders named `new_train` and `new_test`, you can use the following commands:

```bash
mkdir new_train
mkdir new_test

for file in train/*.wav; do
    new_file="new_train/$(basename "$file")"
    sox "$file" -r 16000 -e signed-integer -b 16 "$new_file"
done

for file in test/*.wav; do
    new_file="new_test/$(basename "$file")"
    sox "$file" -r 16000 -e signed-integer -b 16 "$new_file"
done
```

These commands first create the `new_train` and `new_test` folders, then write the converted audio into them. The directory layout matches the original folders; only the format is changed.

## Build some files

![](https://hackmd.io/_uploads/r1RseKFb6.png)

## `csv_to_txt.py`

```python
import csv

# Convert each CSV row into a space-separated line of text.
with open('train-toneless.csv', 'r', errors='ignore') as infile, \
     open('text.txt', 'w') as outfile:
    for row in csv.reader(infile):
        outfile.write(" ".join(row) + "\n")
```

## `txt_to_csv.py`

```python
import csv

# Read the text file
with open('your_text_file.txt', 'r') as file:
    lines = file.readlines()

# Parse each "id word word ..." line into a dictionary
data = []
for line in lines:
    parts = line.strip().split()
    utt_id, *words = parts  # avoid shadowing the built-in id()
    data.append({'id': utt_id, 'text': ' '.join(words)})

# Sort the rows by ID (assumes the first field is an integer)
sorted_data = sorted(data, key=lambda x: int(x['id']))

# Write the sorted data to a CSV file
with open('output.csv', 'w', newline='') as csvfile:
    fieldnames = ['id', 'text']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in sorted_data:
        writer.writerow(row)

print("Data has been transferred to output.csv and sorted by ID.")
```

### GPU version

Remember that the number of GPU jobs is always 1.

---

## Build ESPnet

[TA notes](https://aged-foe-6b2.notion.site/ESPnet-egs2-b40c366ac70646628c4ecf54633780ac)

### Run yesno

In `espnet/egs2/yesno/asr1`, run `bash run.sh`.

### `data`

**`spk2utt`** — one speaker ID per line, followed by all of that speaker's utterance IDs:

```
S0002 BAC009S0002W0122 BAC009S0002W0123 BAC009S0002W0124 BAC009S0002W0125 BAC009S0002W0126 BAC009S0002W0127 BAC009S0002W0128 BAC009S0002W0129
```

**`text`** — one utterance ID per line, followed by its transcript.

**`utt2spk`** — one utterance ID per line, followed by its speaker ID:

```
BAC009S0002W0122 S0002
```

**`wav.scp`** — one utterance ID per line, followed by the path to its audio file:

```
BAC009S0002W0122 /home/mllab/Desktop/espnet/egs2/aishell/asr1/downloads/data_aishell/wav/train/S0002/BAC009S0002W0122.wav
```

A sketch of generating these files for a new corpus is included at the end of this note.

---

## HW5

```bash=
AttributeError: 'HParams' object has no attribute 'duration_discriminator_type'
```

[Github Issue](https://github.com/p0p4k/vits2_pytorch/issues/64)

Add `"duration_discriminator_type": "dur_disc_2",` to the `model` section of `configs/vits2_ljs_nosdp.json`.

In `configs/vits2_ljs_nosdp.json` you can also lower `batch_size`; the default is 64.
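The two HW5 config changes above can be applied from the shell. This is a minimal sketch assuming `jq` is installed and that `configs/vits2_ljs_nosdp.json` follows the usual VITS layout with top-level `model` and `train` sections; the batch size of 16 is just an example value.

```bash
# Back up the config, then add the missing key and lower the batch size.
# Assumes top-level "model" and "train" sections, as in the upstream VITS configs.
cp configs/vits2_ljs_nosdp.json configs/vits2_ljs_nosdp.json.bak
jq '.model.duration_discriminator_type = "dur_disc_2" | .train.batch_size = 16' \
    configs/vits2_ljs_nosdp.json.bak > configs/vits2_ljs_nosdp.json
```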
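For reference, here is the sketch mentioned in the `data` section: generating `wav.scp`, `utt2spk`, and `spk2utt` for a new corpus. It assumes an AISHELL-style layout (`<wav_root>/<speaker>/<utt_id>.wav`); the paths are hypothetical, and `utils/utt2spk_to_spk2utt.pl` is the standard Kaldi utility symlinked into each ESPnet recipe directory.

```bash
# Hypothetical AISHELL-style layout: $wav_root/<speaker>/<utt_id>.wav
wav_root=downloads/data_aishell/wav/train
data_dir=data/train
mkdir -p "$data_dir"

for f in "$wav_root"/*/*.wav; do
    utt_id=$(basename "$f" .wav)            # e.g. BAC009S0002W0122
    spk_id=$(basename "$(dirname "$f")")    # e.g. S0002
    echo "$utt_id $f"      >> "$data_dir/wav.scp"
    echo "$utt_id $spk_id" >> "$data_dir/utt2spk"
done

# Kaldi-style data dirs must be sorted by utterance ID.
sort -o "$data_dir/wav.scp" "$data_dir/wav.scp"
sort -o "$data_dir/utt2spk" "$data_dir/utt2spk"

# spk2utt is derived from utt2spk with the standard utility script.
utils/utt2spk_to_spk2utt.pl "$data_dir/utt2spk" > "$data_dir/spk2utt"
```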