ESPnet tutorial

0. **Preparation**
    ```
    $ ssh <user name>@login.clsp.jhu.edu
    $ ssh bXX
    $ mkdir -p /export/<aYY or bYY>/<user name>/<your working directory>
    $ cd /export/<aYY or bYY>/<user name>/<your working directory>
    ```
    - log in to a compute node bXX (do not run experiments on the login nodes)
    - move to the experiment directory on aYY or bYY (do not run experiments in your home directory)
1. **download ESPnet**
    ```
    $ git clone https://github.com/espnet/espnet.git
    ```
2. **set the environment**
    - To use CUDA (and cuDNN), make sure to set the paths in your `.bashrc` or `.bash_profile` appropriately.
      ```
      CUDAROOT=/path/to/cuda
      export PATH=$CUDAROOT/bin:$PATH
      export LD_LIBRARY_PATH=$CUDAROOT/lib64:$LD_LIBRARY_PATH
      export CUDA_HOME=$CUDAROOT
      export CUDA_PATH=$CUDAROOT
      ```
    - (Optional) if you want to use multiple GPUs, install [NCCL](https://developer.nvidia.com/nccl) and set the paths in your `.bashrc` or `.bash_profile` appropriately, for example:
      ```
      CUDAROOT=/path/to/cuda
      NCCL_ROOT=/path/to/nccl
      export CPATH=$NCCL_ROOT/include:$CPATH
      export LD_LIBRARY_PATH=$NCCL_ROOT/lib/:$CUDAROOT/lib64:$LD_LIBRARY_PATH
      export LIBRARY_PATH=$NCCL_ROOT/lib/:$LIBRARY_PATH
      export CUDA_HOME=$CUDAROOT
      export CUDA_PATH=$CUDAROOT
      ```
    - **The easiest way is to use my setup**
      ```
      CUDAROOT=/home/shinji/tools/cuda
      NCCL_ROOT=/home/shinji/tools/ncll/nccl_2.1.15-1+cuda8.0_x86_64/
      export CPATH=$NCCL_ROOT/include:$CPATH
      export LD_LIBRARY_PATH=$NCCL_ROOT/lib/:$CUDAROOT/lib64:$LD_LIBRARY_PATH
      export LIBRARY_PATH=$NCCL_ROOT/lib/:$LIBRARY_PATH
      export CUDA_HOME=$CUDAROOT
      export CUDA_PATH=$CUDAROOT
      ```
    - [x] **checkpoint 1)**: check whether the CUDA (and NCCL) paths are correctly set
      ```
      $ echo $CUDA_PATH
      /path/to/cuda
      $ echo $CUDA_HOME
      /path/to/cuda
      $ echo $LD_LIBRARY_PATH
      /path/to/nccl/lib/:/path/to/cuda/lib64:/path/to/nccl/lib/:/path/to/cuda/lib64:
      $ nvcc -V
      nvcc: NVIDIA (R) Cuda compiler driver
      Copyright (c) 2005-2017 NVIDIA Corporation
      Built on Fri_Nov__3_21:07:56_CDT_2017
      Cuda compilation tools, release 9.1, V9.1.85
      ```
3. **Installation**
    - move to the `espnet/tools` directory and run `make`, specifying your Kaldi directory
      ```
      $ cd espnet/tools
      $ make KALDI=/path/to/kaldi
      ```
    - **The easiest way is to use a precompiled copy**
      ```
      cp -r /export/a08/shinji/201707e2e/espnet_tutorial espnet
      ```
    - [x] **checkpoint 2)**: check whether pytorch, chainer, and warpctc are correctly installed
      ```
      $ cd espnet/tools
      $ . venv/bin/activate
      $ python
      Python 2.7.13 (default, Nov 24 2017, 17:33:09)
      [GCC 6.3.0 20170516] on linux2
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import torch
      >>> torch.cuda.is_available()
      True
      >>> import warpctc_pytorch
      >>> import chainer
      >>> exit()
      $ deactivate
      ```
4. **Run the CMU Census Database (AN4) recipe**
    - move to the recipe directory
      ```
      cd espnet/egs/an4/asr1
      ```
    - change `cmd.sh` to use the CLSP grid engine by uncommenting the JHU setup
      ```
      # JHU setup
      #export train_cmd="queue.pl --mem 2G"
      #export cuda_cmd="queue.pl --mem 2G --gpu 1 --config conf/gpu.conf"
      #export decode_cmd="queue.pl --mem 4G"
      ```
      ->
      ```
      # JHU setup
      export train_cmd="queue.pl --mem 2G"
      export cuda_cmd="queue.pl --mem 2G --gpu 1 --config conf/gpu.conf"
      export decode_cmd="queue.pl --mem 4G"
      ```
    - [x] **checkpoint 3)**: check whether `cmd.sh` is correctly set
      ```
      $ . ./cmd.sh
      $ echo $train_cmd
      queue.pl --mem 2G
      $ echo $cuda_cmd
      queue.pl --mem 2G --gpu 1 --config conf/gpu.conf
      ```
    - [x] **checkpoint 4)**: check whether Kaldi is correctly set
      ```
      $ . ./path.sh
      (venv) $ copy-feats
      copy-feats
      Copy features [and possibly change format]
      Usage: copy-feats [options] <feature-rspecifier> <feature-wspecifier>
      or: copy-feats [options] <feats-rxfilename> <feats-wxfilename>
      :
      :
      ```
    - **After you complete checkpoints 1-4) successfully,** execute the main experiment script.
      ```
      ./run.sh
      ```
5. **Explanation about each stage**
    - stage -1: data download
        - The downloaded data is stored in `downloads`
          ```
          if [ ${stage} -le -1 ]; then
              echo "stage -1: Data Download"
              mkdir -p ${datadir}
              local/download_and_untar.sh ${datadir} ${data_url}
          fi
          ```
    - stage 0: data preparation
      ```
      echo "stage 0: Data preparation"
      mkdir -p data/{train,test} exp

      if [ ! -f ${an4_root}/README ]; then
          echo Cannot find an4 root! Exiting...
          exit 1
      fi

      python local/data_prep.py ${an4_root} ${KALDI_ROOT}/tools/sph2pipe_v2.5/sph2pipe

      for x in test train; do
          for f in text wav.scp utt2spk; do
              sort data/${x}/${f} -o data/${x}/${f}
          done
          utils/utt2spk_to_spk2utt.pl data/${x}/utt2spk > data/${x}/spk2utt
      done
      ```
        - The `data` structure is exactly the same as the one used in Kaldi, e.g.,
          ```
          $ ls data/train_nodev/
          cmvn.ark feats.scp spk2utt text utt2spk wav.scp
          ```
    - stage 1: feature extraction
      ```
      if [ ${stage} -le 1 ]; then
          ### Task dependent. You have to design training and dev sets by yourself.
          ### But you can utilize Kaldi recipes in most cases
          echo "stage 1: Feature Generation"
          fbankdir=fbank
          # Generate the fbank features; by default 80-dimensional fbanks with pitch on each frame
          for x in test train; do
              steps/make_fbank_pitch.sh --cmd "$train_cmd" --nj 8 data/${x} exp/make_fbank/${x} ${fbankdir}
          done

          # make a dev set
          utils/subset_data_dir.sh --first data/train 100 data/${train_dev}
          n=$[`cat data/train/text | wc -l` - 100]
          utils/subset_data_dir.sh --last data/train ${n} data/${train_set}

          # compute global CMVN
          compute-cmvn-stats scp:data/${train_set}/feats.scp data/${train_set}/cmvn.ark

          # dump features
          dump.sh --cmd "$train_cmd" --nj 8 --do_delta $do_delta \
              data/${train_set}/feats.scp data/${train_set}/cmvn.ark exp/dump_feats/train ${feat_tr_dir}
          dump.sh --cmd "$train_cmd" --nj 8 --do_delta $do_delta \
              data/${train_dev}/feats.scp data/${train_set}/cmvn.ark exp/dump_feats/dev ${feat_dt_dir}
          for rtask in ${recog_set}; do
              feat_recog_dir=${dumpdir}/${rtask}/delta${do_delta}; mkdir -p ${feat_recog_dir}
              dump.sh --cmd "$train_cmd" --nj 8 --do_delta $do_delta \
                  data/${rtask}/feats.scp data/${train_set}/cmvn.ark exp/dump_feats/recog/${rtask} \
                  ${feat_recog_dir}
          done
      fi
      ```
        - use Kaldi feature extraction
    - stage 2: dictionary and json data preparation
      ```
      if [ ${stage} -le 2 ]; then
          ### Task dependent. You have to check non-linguistic symbols used in the corpus.
          echo "stage 2: Dictionary and Json Data Preparation"
          mkdir -p data/lang_1char/
          echo "<unk> 1" > ${dict} # <unk> must be 1, 0 will be used for "blank" in CTC
          text2token.py -s 1 -n 1 data/${train_set}/text | cut -f 2- -d" " | tr " " "\n" \
          | sort | uniq | grep -v -e '^\s*$' | awk '{print $0 " " NR+1}' >> ${dict}
          wc -l ${dict}

          # make json labels
          data2json.sh --feat ${feat_tr_dir}/feats.scp \
              data/${train_set} ${dict} > ${feat_tr_dir}/data.json
          data2json.sh --feat ${feat_dt_dir}/feats.scp \
              data/${train_dev} ${dict} > ${feat_dt_dir}/data.json
      fi
      ```
        - create the grapheme dictionary
        - [x] **checkpoint 5)**: check the dictionary, which is composed of the alphabet and special symbols (space and unk symbols)
          ```
          $ cat data/lang_1char/train_nodev_units.txt
          <unk> 1
          <space> 2
          A 3
          B 4
          C 5
          D 6
          E 7
          F 8
          G 9
          H 10
          I 11
          J 12
          K 13
          L 14
          M 15
          N 16
          O 17
          P 18
          Q 19
          R 20
          S 21
          T 22
          U 23
          V 24
          W 25
          X 26
          Y 27
          Z 28
          ```
        - create json files, which contain all annotation information (transcriptions, feature paths, speaker IDs, etc.)
        - [x] **checkpoint 6)**: check the json file
          ```
          $ less dump/train_nodev/deltafalse/data.json
          {
              "utts": {
                  "mtxj-an377-b": {  # utterance ID
                      "utt2spk": "mtxj",  # speaker ID
                      "input": [
                          {
                              "shape": [
                                  368,
                                  83
                              ],
                              "feat": "/export/a08/shinji/201707e2e/espnet_tutorial/egs/an4/asr1/dump/train_nodev/deltafalse/feats.8.ark:1923825",
                              "name": "input1"
                          }
                      ],
                      "output": [
                          {
                              "text": "RUBOUT J B X R Z NINE TWENTY",
                              "shape": [
                                  28,
                                  30
                              ],
                              "name": "target1",
                              "token": "R U B O U T <space> J <space> B <space> X <space> R <space> Z <space> N I N E <space> T W E N T Y",
                              "tokenid": "20 23 4 17 23 22 2 12 2 4 2 26 2 20 2 28 2 16 11 16 7 2 22 25 7 16 22 27"
                          }
                      ]
                  },
          ```
    - stage 3: network training
      ```
      if [ ${stage} -le 3 ]; then
          echo "stage 3: Network Training"
          ${cuda_cmd} --gpu ${ngpu} ${expdir}/train.log \
              asr_train.py \
              --ngpu ${ngpu} \
              --backend ${backend} \
              --outdir ${expdir}/results \
              --debugmode ${debugmode} \
              --dict ${dict} \
              --debugdir ${expdir} \
              --minibatches ${N} \
              --verbose ${verbose} \
              --resume ${resume} \
              --train-json ${feat_tr_dir}/data.json \
              --valid-json ${feat_dt_dir}/data.json \
              --etype ${etype} \
              --elayers ${elayers} \
              --eunits ${eunits} \
              --eprojs ${eprojs} \
              --subsample ${subsample} \
              --dlayers ${dlayers} \
              --dunits ${dunits} \
              --atype ${atype} \
              --aconv-chans ${aconv_chans} \
              --aconv-filts ${aconv_filts} \
              --mtlalpha ${mtlalpha} \
              --batch-size ${batchsize} \
              --maxlen-in ${maxlen_in} \
              --maxlen-out ${maxlen_out} \
              --opt ${opt} \
              --epochs ${epochs}
      fi
      ```
        - main training of end-to-end ASR
        - several important options
            - `--backend ${backend}`: DNN backend, chainer or pytorch
            - `--elayers ${elayers}`: number of encoder BLSTM layers
            - `--eunits ${eunits}`: number of encoder BLSTM units
            - `--eprojs ${eprojs}`: number of linear layer outputs after every BLSTM layer
            - `--subsample ${subsample}`: subsampling of the encoder BLSTM layers (1_2_2_1_1: the inputs of the 2nd and 3rd layers are downsampled by a factor of 2)
            - `--dlayers ${dlayers}`: number of decoder LSTM layers
            - `--dunits ${dunits}`: number of decoder LSTM units
            - `--atype ${atype}`: attention type (location)
            - `--mtlalpha ${mtlalpha}`: weight of the CTC loss in the multitask (CTC + attention) training loss
            - `--batch-size ${batchsize}`: batch size
            - `--opt ${opt}`: optimizer type
        - [x] **checkpoint 7)**: monitor training. Training takes approximately 15 min. per epoch.
          ```
          $ tail -f exp/train_nodev_blstmp_e4_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/train.log
          ```
            - initial training:
              ```
              2018-06-15 09:32:34,381 (e2e_asr_attctc_th:467) INFO: CTC input lengths: Variable containing: 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62[torch.IntTensor of size 30]
              2018-06-15 09:32:34,381 (e2e_asr_attctc_th:468) INFO: CTC output lengths: Variable containing: 25 11 9 12 23 13 35 20 13 9 14 39 21 29 9 40 20 7 11 5 11 9 15 11 17 24 32 31 32 29[torch.IntTensor of size 30]
              2018-06-15 09:32:34,409 (e2e_asr_attctc_th:474) INFO: ctc loss:163.163162231
              2018-06-15 09:32:34,418 (e2e_asr_attctc_th:1674) INFO: Decoder input lengths: [62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62]
              2018-06-15 09:32:34,418 (e2e_asr_attctc_th:1675) INFO: Decoder output lengths: [26, 12, 10, 13, 24, 14, 36, 21, 14, 10, 15, 40, 22, 30, 10, 41, 21, 8, 12, 6, 12, 10, 16, 12, 18, 25, 33, 32, 33, 30]
              2018-06-15 09:32:34,608 (e2e_asr_attctc_th:1709) INFO: att loss: 66.6240[torch.cuda.FloatTensor of size 1 (GPU 0)]
              2018-06-15 09:32:34,609 (e2e_asr_attctc_th:1724) INFO: groundtruth[0]: RUBOUT<space>N<space>Z<space>X<space>L<space>THIRTY<space>ONE<eos>
              2018-06-15 09:32:34,609 (e2e_asr_attctc_th:1725) INFO: prediction [0]: XNRNRR<eos>JJJJJJJJJJ<eos>JJNJJJJQ
              2018-06-15 09:32:34,609 (e2e_asr_attctc_th:1724) INFO: groundtruth[1]: N<space>E<space>W<space>E<space>L<space>L<eos>
              2018-06-15 09:32:34,609 (e2e_asr_attctc_th:1725) INFO: prediction [1]: XMJ<unk>M<blank><blank>B<blank>J<blank>J
              2018-06-15 09:32:34,609 (e2e_asr_attctc_th:1724) INFO: groundtruth[2]: C<space>H<space>R<space>I<space>S<eos>
              2018-06-15 09:32:34,610 (e2e_asr_attctc_th:1725) INFO: prediction [2]: XOMMMNMJMJ
              2018-06-15 09:32:34,610 (e2e_asr_attctc_th:1724) INFO: groundtruth[3]: A<space>A<space>I<space>L<space>ZERO<eos>
              2018-06-15 09:32:34,610 (e2e_asr_attctc_th:1725) INFO: prediction [3]: XKJJJJMJJJUNN
              2018-06-15 09:32:34,610 (e2e_asr_attctc_th:1724) INFO: groundtruth[4]: ENTER<space>SEVEN<space>TWO<space>ONE<space>SIX<eos>
              2018-06-15 09:32:34,610 (e2e_asr_attctc_th:1725) INFO: prediction [4]: XXX<eos>XNQPXXXQJ<eos><eos>Q<blank>BQMMMQJ
              2018-06-15 09:32:34,611 (e2e_asr_attctc_th:96) INFO: mtl loss:114.893569946
              2018-06-15 09:32:34,759 (asr_pytorch:129) INFO: grad norm=46.892091356
              ```
            - final training:
              ```
              2018-06-15 09:35:16,868 (e2e_asr_attctc_th:467) INFO: CTC input lengths: Variable containing: 47 45 45 45 45 45 45 45 45 45 45 45 45 45 45 45 45 45 45 45 45 45 45 45 45 42 42 42 42 42[torch.IntTensor of size 30]
              2018-06-15 09:35:16,868 (e2e_asr_attctc_th:468) INFO: CTC output lengths: Variable containing: 24 21 10 21 20 20 22 9 7 22 22 17 22 22 22 22 20 11 22 22 16 13 15 10 22 15 18 22 22 19[torch.IntTensor of size 30]
              2018-06-15 09:35:16,870 (e2e_asr_attctc_th:474) INFO: ctc loss:26.512140274
              2018-06-15 09:35:16,878 (e2e_asr_attctc_th:1674) INFO: Decoder input lengths: [47, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 42, 42, 42, 42, 42]
              2018-06-15 09:35:16,878 (e2e_asr_attctc_th:1675) INFO: Decoder output lengths: [25, 22, 11, 22, 21, 21, 23, 10, 8, 23, 23, 18, 23, 23, 23, 23, 21, 12, 23, 23, 17, 14, 16, 11, 23, 16, 19, 23, 23, 20]
              2018-06-15 09:35:16,909 (e2e_asr_attctc_th:1709) INFO: att loss: 13.7800[torch.cuda.FloatTensor of size 1 (GPU 0)]
              2018-06-15 09:35:16,910 (e2e_asr_attctc_th:1724) INFO: groundtruth[0]: ENTER<space>SEVEN<space>THIRTY<space>EIGHT<eos>
              2018-06-15 09:35:16,910 (e2e_asr_attctc_th:1725) INFO: prediction [0]: ENTER<space>SEVEN<space>TWRRTY<space>SIGHT<space>
              2018-06-15 09:35:16,910 (e2e_asr_attctc_th:1724) INFO: groundtruth[1]: THREE<space>FOUR<space>EIGHT<space>ZERO<eos>
              2018-06-15 09:35:16,910 (e2e_asr_attctc_th:1725) INFO: prediction [1]: EWREE<space>TOUR<space>TIGHT<space>TERO<eos>
              2018-06-15 09:35:16,910 (e2e_asr_attctc_th:1724) INFO: groundtruth[2]: ENTER<space>NINE<eos>
              2018-06-15 09:35:16,910 (e2e_asr_attctc_th:1725) INFO: prediction [2]: ONTER<space>OINE<space>
              2018-06-15 09:35:16,911 (e2e_asr_attctc_th:1724) INFO: groundtruth[3]: FIFTY<space>THREE<space>FORTY<space>TWO<eos>
              2018-06-15 09:35:16,911 (e2e_asr_attctc_th:1725) INFO: prediction [3]: EOVTY<space>TWREE<space>TOUTY<space>SWO<space>
              2018-06-15 09:35:16,911 (e2e_asr_attctc_th:1724) INFO: groundtruth[4]: ONE<space>FIVE<space>TWO<space>TWO<space>ONE<eos>
              2018-06-15 09:35:16,911 (e2e_asr_attctc_th:1725) INFO: prediction [4]: ONE<space>FIVE<space>TWO<space>OWO<space>ONE<space>
              2018-06-15 09:35:16,912 (e2e_asr_attctc_th:96) INFO: mtl loss:20.146074295
              2018-06-15 09:35:17,008 (asr_pytorch:129) INFO: grad norm=28.6609776055
              ```
            - you can see that the predictions are getting better
            - approximately 15 min. per epoch (20 epochs x 15 min. = 5 hours, so training will not finish during the lab)
        - [ ] **checkpoint 8)**: check costs and accuracies per epoch
          ```
          exp/train_nodev_blstmp_e4_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/results/log
          ```
          and/or
          ```
          exp/train_nodev_blstmp_e4_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/results/{acc,loss}.png
          ```
        - [ ] **checkpoint 9)**: check attention weights (**but unfortunately the AN4 data is too small to fully train the attention model, and we cannot find clear attention patterns from this data**)
          ```
          exp/train_nodev_blstmp_e4_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/results/att_ws/*.png
          ```
    - stage 4: decoding
      ```
      if [ ${stage} -le 4 ]; then
          echo "stage 4: Decoding"
          nj=8

          for rtask in ${recog_set}; do
          (
              decode_dir=decode_${rtask}_beam${beam_size}_e${recog_model}_p${penalty}_len${minlenratio}-${maxlenratio}_ctcw${ctc_weight}
              feat_recog_dir=${dumpdir}/${rtask}/delta${do_delta}

              # split data
              data=data/${rtask}
              split_data.sh --per-utt ${data} ${nj};
              sdata=${data}/split${nj}utt;

              # make json labels for recognition
              for j in `seq 1 ${nj}`; do
                  data2json.sh --feat ${feat_recog_dir}/feats.scp \
                      ${sdata}/${j} ${dict} > ${sdata}/${j}/data.json
              done

              #### use CPU for decoding
              ngpu=0

              ${decode_cmd} JOB=1:${nj} ${expdir}/${decode_dir}/log/decode.JOB.log \
                  asr_recog.py \
                  --ngpu ${ngpu} \
                  --backend ${backend} \
                  --debugmode ${debugmode} \
                  --verbose ${verbose} \
                  --recog-json ${sdata}/JOB/data.json \
                  --result-label ${expdir}/${decode_dir}/data.JOB.json \
                  --model ${expdir}/results/model.${recog_model} \
                  --model-conf ${expdir}/results/model.conf \
                  --beam-size ${beam_size} \
                  --penalty ${penalty} \
                  --maxlenratio ${maxlenratio} \
                  --minlenratio ${minlenratio} \
                  --ctc-weight ${ctc_weight} \
                  &
              wait

              score_sclite.sh ${expdir}/${decode_dir} ${dict}

          ) &
          done
          wait
          echo "Finished"
      fi
      ```
        - decoding is performed with 8 parallel CPU jobs on the development (`train_dev`) and evaluation (`test`) sets.
        - several important options
            - `--beam-size ${beam_size}`: beam size
            - `--ctc-weight ${ctc_weight}`: CTC score weight during beam search
        - [ ] **checkpoint 10)**: check the final results. They appear in the terminal when all experiments finish, e.g.,
          ```
          2018-06-20 12:45:07,057 (concatjson:28) INFO: new json has 100 utterances
          2018-06-20 12:45:07,425 (json2trn:22) INFO: reading exp/train_nodev_blstmp_e4_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_train_dev_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.5/data.json
          2018-06-20 12:45:07,427 (json2trn:26) INFO: reading data/lang_1char/train_nodev_units.txt
          2018-06-20 12:45:07,428 (json2trn:34) INFO: writing hyp trn to exp/train_nodev_blstmp_e4_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_train_dev_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.5/hyp.trn
          2018-06-20 12:45:07,428 (json2trn:35) INFO: writing ref trn to exp/train_nodev_blstmp_e4_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_train_dev_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.5/ref.trn
          write a CER (or TER) result in exp/train_nodev_blstmp_e4_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_train_dev_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.5/result.txt
          | SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
          | Sum/Avg | 100 1915 | 77.1 7.5 15.4 0.4 23.3 79.0 |
          2018-06-20 12:45:12,744 (concatjson:28) INFO: new json has 130 utterances
          2018-06-20 12:45:13,547 (json2trn:22) INFO: reading exp/train_nodev_blstmp_e4_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_test_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.5/data.json
          2018-06-20 12:45:13,554 (json2trn:26) INFO: reading data/lang_1char/train_nodev_units.txt
          2018-06-20 12:45:13,555 (json2trn:34) INFO: writing hyp trn to exp/train_nodev_blstmp_e4_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_test_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.5/hyp.trn
          2018-06-20 12:45:13,555 (json2trn:35) INFO: writing ref trn to exp/train_nodev_blstmp_e4_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_test_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.5/ref.trn
          write a CER (or TER) result in exp/train_nodev_blstmp_e4_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_test_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.5/result.txt
          | SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
          | Sum/Avg | 130 2565 | 84.0 6.0 10.0 1.1 17.0 68.5 |
          ```
6. **Modification of configurations**
    - try to optimize some configurations by tuning the following hyperparameters (you do not have to try everything)
    - use other features
        - modify the Kaldi feature extraction at stage 1
        - run an experiment from stage 1
          ```
          ./run.sh --stage 1
          ```
    - change model topologies
        - Number of encoder layers
          ```
          ./run.sh --stage 3 --elayers XXX
          ```
        - Number of encoder units
          ```
          ./run.sh --stage 3 --eunits XXX
          ```
        - Number of decoder layers
          ```
          ./run.sh --stage 3 --dlayers XXX
          ```
        - Number of decoder units
          ```
          ./run.sh --stage 3 --dunits XXX
          ```
        - change attention type (choices=['noatt', 'dot', 'add', 'location', 'coverage', 'coverage_location', 'location2d', 'location_recurrent', 'multi_head_dot', 'multi_head_add', 'multi_head_loc', 'multi_head_multi_res_loc']. Some of them may not work correctly.)
          ```
          ./run.sh --stage 3 --atype XXX
          ```
    - optimization (Adadelta -> Adam)
      ```
      ./run.sh --stage 3 --opt adam
      ```
    - tune the CTC-attention weights
      ```
      # change CTC attention weights
      $ ./run.sh --stage 3 --mtlalpha XXX --ctc_weight XXX
      # CTC mode
      $ ./run.sh --stage 3 --mtlalpha 1.0 --ctc_weight 1.0 --recog_model loss.best
      # attention mode
      $ ./run.sh --stage 3 --mtlalpha 0.0 --ctc_weight 0.0
      ```
    - [ ] **checkpoint 11)**: report your best number (CER) for the test set to me <shinjiw@ieee.org>
7. **Run the AN4 recipe with GPUs (optional)**
    - ESPnet is designed to be used with GPUs. If you have enough knowledge to use GPUs on the CLSP cluster, **ask the instructors in advance**, then run the GPU experiments.
      ```
      ./run.sh --stage 3 --ngpu 1
      ```
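
To make the role of `--mtlalpha` concrete, here is a minimal Python sketch that recomputes the `mtl loss` values from the `ctc loss` and `att loss` values in the checkpoint 7 training logs. It assumes the multitask loss is the linear interpolation `mtlalpha * ctc + (1 - mtlalpha) * att`. The logged numbers are consistent with that assumption, but the function name `mtl_loss` is hypothetical and is not part of ESPnet.

```python
def mtl_loss(ctc_loss, att_loss, mtlalpha):
    """Interpolate the CTC and attention losses (assumed form of the multitask loss)."""
    if not 0.0 <= mtlalpha <= 1.0:
        raise ValueError("mtlalpha must be in [0, 1]")
    return mtlalpha * ctc_loss + (1.0 - mtlalpha) * att_loss


# Loss values copied from the "initial training" and "final training" logs (mtlalpha = 0.5).
initial = mtl_loss(163.163162231, 66.6240, 0.5)
final = mtl_loss(26.512140274, 13.7800, 0.5)

print("initial mtl loss: %.3f" % initial)  # the log reports 114.893569946
print("final mtl loss:   %.3f" % final)    # the log reports 20.146074295
```

Under this reading, `--mtlalpha 1.0` trains with the CTC loss only and `--mtlalpha 0.0` with the attention loss only, which matches the "CTC mode" and "attention mode" commands in step 6.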
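
The character dictionary from stage 2 and the `tokenid` field in `data.json` can be reproduced with a short Python sketch. This is not the ESPnet implementation (the recipe uses `text2token.py` and a shell pipeline); `build_char_dict` and `text_to_tokenids` are hypothetical helpers, and the second transcript is a toy stand-in for the training text so that every letter occurs at least once.

```python
import string


def build_char_dict(transcripts):
    """Assign <unk> -> 1 and every other symbol, in sorted order, to 2, 3, ...

    ID 0 is left unused because it is reserved for the CTC blank.
    """
    symbols = set()
    for text in transcripts:
        symbols.update("<space>" if ch == " " else ch for ch in text)
    units = {"<unk>": 1}
    for idx, sym in enumerate(sorted(symbols), start=2):
        units[sym] = idx
    return units


def text_to_tokenids(text, units):
    """Convert a transcript to character IDs, mapping unseen symbols to <unk>."""
    tokens = ["<space>" if ch == " " else ch for ch in text]
    return [units.get(tok, units["<unk>"]) for tok in tokens]


# Toy training text: the utterance from checkpoint 6 plus one line covering A-Z.
transcripts = ["RUBOUT J B X R Z NINE TWENTY", " ".join(string.ascii_uppercase)]
units = build_char_dict(transcripts)
ids = text_to_tokenids("RUBOUT J B X R Z NINE TWENTY", units)
print(" ".join(str(i) for i in ids))
```

With this toy input the dictionary comes out as `<unk> 1`, `<space> 2`, `A 3`, ..., `Z 28`, and the printed IDs match the `tokenid` string shown in checkpoint 6.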
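
For checkpoint 11, the CER is the `Err` column of the `Sum/Avg` row in `result.txt`. The sketch below parses such a row in Python. It assumes the row has the same layout as the excerpt in checkpoint 10 (three `|`-separated groups); `parse_sum_avg` is a hypothetical helper, not an ESPnet or sclite utility.

```python
def parse_sum_avg(line):
    """Parse a 'Sum/Avg' row into a dict of counts and percentages."""
    groups = [g.split() for g in line.strip().strip("|").split("|")]
    if len(groups) != 3 or groups[0] != ["Sum/Avg"]:
        raise ValueError("not a Sum/Avg row: %r" % line)
    n_snt, n_tok = (int(x) for x in groups[1])
    corr, sub, dele, ins, err, s_err = (float(x) for x in groups[2])
    return {"snt": n_snt, "tokens": n_tok, "corr": corr, "sub": sub,
            "del": dele, "ins": ins, "err": err, "s_err": s_err}


# Row copied from the test-set result in checkpoint 10.
row = parse_sum_avg("| Sum/Avg | 130 2565 | 84.0 6.0 10.0 1.1 17.0 68.5 |")
print("test CER: %.1f%%" % row["err"])
```

`Err` is the sum of the substitution, deletion, and insertion rates; because each column is rounded separately, the rounded columns may differ from `Err` by about 0.1.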