ricer: In nnet3, "xent" in a DNN refers to cross entropy
sing5hong5: What keyword should I search to find the xconfig documentation? I can't really find it
ricer: Look at final.config; the layers are built there. I think Kaldi probably took its cue from TensorFlow
sing5hong5: OK, I'll look at TensorFlow and check final.config
ricer: You can look at src/nnet3/simple-component.cc, which describes each component
LSTM and the more complex ones are in Nnet3-component.cc
ricer: For DNNs, read up and google the "top ten questions"; get the basics down
Usually it's a TDNN
fbank/MFCC 40 dimensions, 5 frames before and after, 11 frames in total
So the DNN input is 40x11
In the middle, 4 hidden layers of 256 or 512
At the end, 3000 pdf states (sketched in the xconfig below)
If you need noise robustness, you can use a 256-dim CNN plus 2 TDNN layers
LSTM is too complex; once you have enough experience you'll know how to handle it. Tune it only after things are stable; not recommended for now
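As a rough illustration of the setup ricer describes, here is a minimal nnet3 xconfig sketch (the layer names, the relu-batchnorm-layer choice, and the paths are assumptions for illustration, not taken from a specific recipe). Splicing ±5 frames of 40-dim features gives the 440-dim input, followed by four 512-dim hidden layers and a 3000-dim pdf output:

  # network.xconfig (sketch)
  input dim=40 name=input
  # splice +-5 frames: 11 x 40 = 440-dim effective input
  relu-batchnorm-layer name=dnn1 input=Append(-5,-4,-3,-2,-1,0,1,2,3,4,5) dim=512
  relu-batchnorm-layer name=dnn2 dim=512
  relu-batchnorm-layer name=dnn3 dim=512
  relu-batchnorm-layer name=dnn4 dim=512
  output-layer name=output dim=3000 max-change=1.5

  # compile the xconfig into the final.config-style files used for training
  steps/nnet3/xconfig_to_configs.py --xconfig-file network.xconfig --config-dir configs/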
鴻欣
high resolution : mfcc high freq
ivector
wsj/local/nnet3
tuning/tdnn_1a.sh
hires : high resolution
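For reference, the hires features are typically made like this (a sketch; the data dir names and --nj are placeholders, and conf/mfcc_hires.conf is the 40-dim MFCC config shipped with the recipes):

  # extract high-resolution (40-dim) MFCCs for nnet3 training
  utils/copy_data_dir.sh data/train data/train_hires
  steps/make_mfcc.sh --mfcc-config conf/mfcc_hires.conf --nj 20 data/train_hires
  steps/compute_cmvn_stats.sh data/train_hires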
鴻欣: 2018/09/06 Kaldi presentation in India
swbd/local/chain/tuning/run_tdnn_7q.sh
Does SVD, matrix factorization
1000x1000 => 1000x100, 100x1000
Fewer parameters, deeper network (7 -> 14 layers)
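The factorization cuts, e.g., 1000x1000 = 1,000,000 parameters down to 1000x100 + 100x1000 = 200,000. In the recipe it appears as tdnnf-layer lines with a bottleneck-dim; a hedged sketch (the dims and option values below are illustrative, not copied from run_tdnn_7q.sh):

  # one factorized TDNN (TDNN-F) layer: the full affine is replaced by two
  # low-rank factors, e.g. 1024x128 and 128x1024 instead of 1024x1024
  tdnnf-layer name=tdnnf1 dim=1024 bottleneck-dim=128 time-stride=1 l2-regularize=0.01 bypass-scale=0.75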
Use --use-gpu=wait, so you don't have to cut the number of jobs just because there aren't enough GPUs; a job will automatically wait until another job finishes
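For example, when launching training (a sketch; the directory options are placeholders and many required arguments are omitted):

  # a training job that cannot grab a free GPU waits instead of failing
  steps/nnet3/chain/train.py --use-gpu=wait \
    --feat-dir data/train_hires --tree-dir exp/chain/tree \
    --lat-dir exp/chain/lats --dir exp/chain/tdnn1a ...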
--context-opts "--context-width=1 --central-position=0"
Compared with others, Kaldi is very frugal with GPU memory, because their hardware didn't have much memory
Inside nnet3 everything is vectors, not matrices
End-to-end CTC results haven't been good, so Kaldi doesn't support it
The nnet needs to use the dynamic range, so you have to add a separate batchnorm layer first
Explains the senone dimensions in nnet3, the layout, and how minibatches work in practice. Mentions the frame rejection paper
steps/nnet3/report/generate_plots.py
to display the loss (useful for deciding on early stopping)
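Typical usage (a sketch; the experiment directory name is a placeholder):

  # plot train/valid objective and other diagnostics for a trained model dir;
  # watching the valid objective is what tells you when to stop early
  steps/nnet3/report/generate_plots.py exp/chain/tdnn1a exp/chain/tdnn1a/report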
There is discussion of prior, likelihood, and posterior
Introduces the MMI paper; MMI may come in useful with NNs
Time delay neural network (TDNN)
local/nnet3/run_tdnn_discriminative.sh
With too little data, discriminative training doesn't help much
TDNN is faster and better than BLSTM; #2114 has already been merged
local/chain/run_tdnn.sh
wsj/s5/local/chain/tuning/run_tdnn_1f.sh
There is a UBM in mini_librispeech/s5/local/chain/tuning/run_tdnn_1f.sh
TDNN and ivector settings
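A rough sketch of the usual ivector side of such a setup (directory names, UBM size, and --nj values are placeholders; a PCA/LDA transform dir is assumed to exist already):

  # train a diagonal UBM and an ivector extractor on the hires features,
  # then extract online ivectors that the TDNN takes as an auxiliary input
  steps/online/nnet2/train_diag_ubm.sh --nj 30 --num-frames 700000 \
    data/train_hires 512 exp/nnet3/pca_transform exp/nnet3/diag_ubm
  steps/online/nnet2/train_ivector_extractor.sh --nj 10 \
    data/train_hires exp/nnet3/diag_ubm exp/nnet3/extractor
  steps/online/nnet2/extract_ivectors_online.sh --nj 30 \
    data/train_hires exp/nnet3/extractor exp/nnet3/ivectors_train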
Multilingual setups have a great many outputs; you can use them to study the relationship with phonemes
What does the "xent" branch mean? Why are there 2 output layers here? https://www.danielpovey.com/files/2016_interspeech_mmi.pdf (Sec 2.7.1)
The chain model's default output is not normalized, so you have to add extra config
With chain models you are always supposed to use acoustic-scale=1.0
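In recipe terms that usually looks like this (a sketch; the graph, data, and decode dir names are placeholders):

  # chain decoding: --acwt 1.0 because the model is trained at acoustic scale 1.0;
  # --post-decode-acwt 10.0 rescales the lattice so the usual LM-weight range
  # still works at scoring time
  steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 \
    --online-ivector-dir exp/nnet3/ivectors_test \
    exp/chain/tree/graph data/test_hires exp/chain/tdnn1a/decode_test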
There is an analysis of how the chain model works
Dan now doesn't apply l2-regularize on the output layer; it's applied on the middle layers
chain feed-forward is better than xent feed-forward
If they are nnet3 models and they are using the same tree, you may be able to decode with steps/nnet3/decode_score_fusion.sh.
Talks about how the model's parameters converge
A recurrent neural network (RNN) differs from the older, one-directional feedforward neural networks
LSTM is a kind of RNN
ASGD: Averaged Stochastic Gradient Descent
鴻欣: the updated weights are averaged
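In Kaldi's nnet3 training this shows up as averaging the models from the parallel jobs at the end of each iteration; a hedged sketch (the iteration/job file names are illustrative, and the training scripts normally do this for you):

  # average the models produced by 3 parallel SGD jobs into the next iteration's model
  nnet3-average exp/nnet3/dnn/2.1.raw exp/nnet3/dnn/2.2.raw exp/nnet3/dnn/2.3.raw \
    exp/nnet3/dnn/3.raw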
ReLU (Rectified Linear Unit)
SELU (Scaled Exponential Linear Unit, from the Self-Normalizing Neural Networks paper)
ELU
Swish
The activation function probably matters when there is little data; with a lot of data it makes little difference
Does batchnorm perform worse than renorm during multi-SMBR training with TDNN?
Minibatches need the audio files to be of equal length
Although I suppose if you had a mix of supervised and unsupervised data from the domain of interest, you could use the unsupervised part to help prevent the model straying too far.