DNN-HMM

DNN model - nnet3

What does xent mean in nnet3? (look up append)

ricer: In nnet3, xent means cross entropy.
sing5hong5: What keyword should I search to find the xconfig files? I can't really find them.
ricer: See final.config, where they get built. I think Kaldi probably took TensorFlow as a reference.
sing5hong5: OK, I'll look at TensorFlow and find final.config.
ricer: You can look at src/nnet3/simple-component.cc, which describes each component.
LSTM and the more complex ones are in Nnet3-component.cc.

P18 parameters

ricer: For DNN, read books and google the "ten big questions"; learn the basics.
Usually it's a TDNN.
fbank/mfcc: 40 dims, 5 frames of context on each side, 11 frames in total
So the DNN input is 40×11
In the middle, 4 hidden layers of 256 or 512
At the end, 3000 pdf states
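As a sanity check on the sizes above, a rough parameter count for such a network (a sketch only; 512-unit layers, plain affine layers with biases, and the layer count are just the numbers from these notes):

```python
# Rough parameter count for the DNN sketched above:
# input = 40-dim features x 11 spliced frames, 4 hidden layers, 3000 pdf states.
input_dim = 40 * 11                  # 440
hidden_dims = [512, 512, 512, 512]
output_dim = 3000                    # pdf states

dims = [input_dim] + hidden_dims + [output_dim]
# each affine layer has in*out weights plus out biases
params = sum(i * o + o for i, o in zip(dims, dims[1:]))
print(input_dim, params)
```

With these assumed sizes the model comes out at roughly 2.5M parameters, most of them in the final 512-to-3000 layer.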

For noise robustness, you can use a 256-filter CNN plus 2 TDNN layers.
LSTM is too complicated; with experience you will get the hang of it. Tune it only once things are stable; it is not recommended for now.

鴻欣
high resolution : mfcc high freq
ivector
wsj/local/nnet3
tuning/tdnn_1a.sh
hires : high resolution

鴻欣: 2018/09/06 Kaldi talk in India
swbd/local/chain/tuning/run_tdnn_7q.sh
Does SVD, i.e. matrix factorization:
1000×1000 ⇒ 1000×100, 100×1000
Fewer parameters, deeper network (7 → 14 layers)
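The factorization idea can be checked numerically: a 1000×1000 matrix has 1M entries, while a rank-100 SVD truncation stores only a 1000×100 and a 100×1000 factor, i.e. 200k parameters (a toy illustration of the factorization, not Kaldi's TDNN-F code):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((1000, 1000))   # original 1000x1000 affine matrix

# SVD, then keep only the top-100 singular values (rank-100 bottleneck)
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 100
A = U[:, :k] * s[:k]   # 1000 x 100 factor (U scaled by singular values)
B = Vt[:k, :]          # 100 x 1000 factor
W_approx = A @ B       # best rank-100 approximation of W

full_params = W.size               # 1,000,000
factored_params = A.size + B.size  # 200,000
print(full_params, factored_params)
```

The 5x parameter saving per layer is what lets the network go deeper (7 → 14 layers) at similar total size.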

use-gpu=wait when using run.pl

With --use-gpu=wait, you no longer have to tune the number of jobs just because there aren't enough GPUs; a job automatically waits for the other jobs to finish.

Train a monophone DNN

--context-opts "--context-width=1 --central-position=0"

Maximize GPU memory usage?

Compared with others, Kaldi is very frugal with GPU memory, because their hardware didn't have much memory.

Is there something like Stack Operation in nnet3 descriptor?

Inside nnet3, everything is a vector, not a matrix.

training end to end using CTC on custom data

End-to-end CTC results have not been good, so Kaldi doesn't support it.

CMVN on librispeech corpus using Kaldi

The nnet needs a suitable dynamic range, so you first have to add an extra batchnorm layer.
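A batchnorm layer in this sense just standardizes each feature dimension to zero mean and unit variance, which fixes the dynamic range going into the net (a minimal numpy sketch; the eps value is an assumption, not Kaldi's exact component):

```python
import numpy as np

def batchnorm(x, eps=1e-5):
    """Normalize each column (feature dim) of a (frames x dims) matrix
    to zero mean and unit variance."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

# fake 40-dim features with an arbitrary offset and scale
feats = np.random.default_rng(1).normal(loc=5.0, scale=3.0, size=(100, 40))
normed = batchnorm(feats)
print(normed.mean(axis=0)[:3], normed.std(axis=0)[:3])
```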

Some questions about nnet3 discriminative training

Explains the senone dimensions and layout in nnet3, and how minibatches actually work. Mentions the frame-rejection paper.

nn training epochs, overfitting, and visualizing training process
lre07 v2 train_dnn doubt
  • Mentions BNF and LID; they appear in Kaldi's code, but I haven't yet looked up what they are.
Decoding using Posteriors:

Discusses prior, likelihood, and posterior.

Re: which technique is being used to initialize weights for LSTM?
  • Middle layers: the default affine-parameter initialization is mean 0 and standard deviation 1/sqrt(input_dim).
  • The output layer is initialized to all zeros.
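That initialization scheme can be sketched as follows (a hypothetical helper, just illustrating mean 0 / stddev 1/sqrt(input_dim) for hidden affine weights and zeros for the output layer):

```python
import numpy as np

def init_affine(input_dim, output_dim, is_output=False, rng=None):
    """Hidden layers: Gaussian, mean 0, stddev 1/sqrt(input_dim).
    Output layer: all zeros."""
    if is_output:
        return np.zeros((output_dim, input_dim))
    rng = rng or np.random.default_rng(0)
    return rng.normal(0.0, 1.0 / np.sqrt(input_dim),
                      size=(output_dim, input_dim))

W_hidden = init_affine(512, 512)                  # stddev ~ 1/sqrt(512)
W_out = init_affine(512, 3000, is_output=True)    # all zeros
print(W_hidden.std(), W_out.max())
```

The 1/sqrt(input_dim) scaling keeps the output variance of each unit roughly independent of the layer width.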
https://groups.google.com/d/msg/kaldi-help/3lGfMCoUwKY/I7buLqu-BgAJ
  • max-change guards against changing the parameters too much in one update.
  • Its relationship to num-jobs and learning-rates.
SGMM MMI

Introduces MMI papers; MMI may also be usable in NNs.

TDNN

Time delay neural network (TDNN)
local/nnet3/run_tdnn_discriminative.sh

LSTM discriminative training is increasing WER

With too little data, discriminative training doesn't do much.

Some questions about nnet3 discriminative training
Status docs vs online chain decoding

TDNN is faster and better than BLSTM; #2114 has already been merged.

  • local/chain/run_tdnn.sh
  • wsj/s5/local/chain/tuning/run_tdnn_1f.sh has a UBM
  • mini_librispeech/s5/local/chain/tuning/run_tdnn_1f.sh
The fMLLR (LDA+MLLT) + ivector features were decoded with an nnet2 model.

TDNN and ivector settings.

Chain model

run_tdnn_multilingual.sh

Multilingual has quite a few outputs; you can study their relationship with phonemes.

output-xent
Questions about TDNN+LSTM script

What does the "xent" branch mean? Why are there 2 output layers here? https://www.danielpovey.com/files/2016_interspeech_mmi.pdf (Sec 2.7.1)

Reg: Generate log posteriors using chain model

The chain model's output is not normalized by default; you have to add extra config.

Is there a fundamental difference in likelihoods going from chain tdnn to chain tdnn_lstm?

With chain models, you are always supposed to use acoustic-scale=1.0.

difference between downloaded ASpIRE and mine

Analyzes how the chain model works.

what is chain.xent-regularize

Dan no longer applies l2-regularize at the output layer; it's done at the middle layers.

chain model training

Chain feed-forward is better than xent feed-forward.

kaldi linear Model Combination or Model Merging

If they are nnet3 models and they are using the same tree, you may be able to decode with steps/nnet3/decode_score_fusion.sh.

Separate Affine Layer for Chain Training Xent Regularization

Explains how the model's parameters converge.

RNN

A recurrent neural network (RNN) is unlike the earlier one-directional feedforward neural networks.

LSTM

An LSTM is a kind of RNN.

SGD

Questions regarding parallel training with NSGD

ASGD: Averaged Stochastic Gradient Descent

鴻欣: average the updated weights
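The averaging idea can be sketched as: each parallel job takes its own SGD step on its data shard, and the resulting models are averaged into one model for the next iteration (a toy illustration, not Kaldi's actual NSGD code):

```python
import numpy as np

rng = np.random.default_rng(0)
num_jobs = 4
w_start = rng.standard_normal(10)   # shared starting weights

# each job applies its own gradient step to a copy of the weights
job_weights = [w_start - 0.1 * rng.standard_normal(10)
               for _ in range(num_jobs)]

# equal-weight average becomes the model for the next iteration
w_avg = np.mean(job_weights, axis=0)
print(w_avg.shape)
```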

activation function

ReLU (Rectified Linear Unit)
SELU (Self-Normalization Neural Network)
ELU
Swish

About some new activation function?

Activation functions may matter when data is scarce; with lots of data there's hardly any difference.

效果

Batchnorm performs poor than renorm during muli-SMBR training with TDNN?

End-to-end training in Kaldi

Minibatches require the audio files to be of equal length.

Why the reluGRU makes the kaldi crash?
  • It causes divergence; the numeric range must be controlled.
  • This component probably isn't effective (not useful).

Adaptation

Adaptation of chain models

Although I suppose if you had a mix of supervised and unsupervised data from the domain of interest, you could use the unsupervised part to help prevent the model straying too far.