# DNN-HMM

## DNN model - nnet3

* https://groups.google.com/forum/#!msg/kaldi-help/JFt7TtA6I1w/Aa9F1bLwAwAJ

### What is "xent" in nnet3? (look up Append)

>[name=ricer] In nnet3, "xent" means cross entropy.
>[name=sing5hong5] What keyword should I search for in the xconfig documentation? I can't really find it.
>[name=ricer] Look in final.config, it is built there. I think Kaldi probably borrowed the idea from TensorFlow.
>[name=sing5hong5] OK, I will look at TensorFlow and search final.config.
>[name=ricer] You can read `src/nnet3/simple-component.cc`; every component is described there.
>LSTM and the more complex ones are in `Nnet3-component.cc`.

### P18 parameters

>[name=ricer] For DNNs, read the textbooks and google the "ten key problems" to cover the basics.
>Usually it is a TDNN:
>40-dimensional fbank/MFCC, 5 frames of context on each side, 11 frames in total, so the DNN input is `40x11`;
>in the middle, 4 hidden layers of 256 or 512;
>at the end, 3000 pdf states (see the xconfig sketch further below).
>If you need noise robustness, you can use a 256-dim CNN + 2 TDNN layers.
>LSTMs are too complicated; you will get the hang of them with experience. Tune them only after things are stable; not recommended for now.

>[name=鴻欣]
>high resolution: MFCC with the high frequencies kept
>ivector
>wsj/local/nnet3 tuning/tdnn 1a.sh
>hires: high resolution

>[name=鴻欣] 2018/09/06, from the Kaldi talk in India:
>`swbd/local/chain/tuning/run_tdnn_7q.sh`
>does SVD, i.e. matrix factorization: 1000x1000 => 1000x100 , 100x1000.
>Fewer parameters, more layers (7 -> 14).

##### [--use-gpu=wait when using run.pl](https://groups.google.com/d/msg/kaldi-help/oaOqkS0Cd_g/Xa0Tb3VACgAJ)

With `--use-gpu=wait` you no longer have to reduce the number of jobs just because there are not enough GPUs; each job automatically waits until another job finishes.

##### [Train a monophone DNN](https://groups.google.com/d/msg/kaldi-help/xaKt6eA9Uo4/yAgkx9nQAAAJ)

`--context-opts "--context-width=1 --central-position=0"`

##### [Maximize GPU memory usage?](https://groups.google.com/d/msg/kaldi-help/4lzsg8tf5OU/JT7yOlfDBAAJ)

Compared with others, Kaldi uses GPU memory very sparingly, because their own hardware did not have much memory.

##### [Is there something like Stack Operation in nnet3 descriptor?](https://groups.google.com/d/msg/kaldi-help/UTe7OxMztZM/eOZoas-qAwAJ)

Inside nnet3 everything is a vector, not a matrix; "stacking" is expressed with the `Append()` descriptor, which concatenates vectors.

##### [training end to end using CTC on custom data](https://groups.google.com/d/msg/kaldi-help/nm2xplIz4P8/btJ-0vogBAAJ)

End-to-end CTC results were never good, so Kaldi does not support it.

##### [CMVN on librispeech corpus using Kaldi](https://groups.google.com/d/msg/kaldi-help/9niauIHEFMU/YmRDt9mFAAAJ)

The nnet needs its input in a reasonable dynamic range; add a separate batchnorm layer first.

##### [Some questions about nnet3 discriminative training](https://groups.google.com/d/msg/kaldi-help/G2qrBdCw6tA/CEexPk1XAQAJ)

Explains the senone dimensions in nnet3, the layout, and how minibatches actually work. Mentions the frame-rejection paper.

##### [nn training epochs, overfitting, and visualizing training process](https://groups.google.com/d/msg/kaldi-help/l8SNSqjPGqk/7xY-hDn1BAAJ)

- Less data needs more epochs.
- You can use `steps/nnet3/report/generate_plots.py` to plot the loss (example command below).
- [Small random changes cause large changes in a NN](https://groups.google.com/d/msg/kaldi-help/l8SNSqjPGqk/4X8sb5cyBQAJ), so `early stopping` is not used.
- Judge overfitting by the ratio between train loss and valid loss, [not by the valid loss value itself](https://groups.google.com/d/msg/kaldi-help/l8SNSqjPGqk/9BAsqApEBQAJ).

##### [lre07 v2 train_dnn doubt](https://groups.google.com/d/msg/kaldi-help/JJ1Mu3Q9HRI/dgGQzTB2BQAJ)

- Mentions BNF and LID; they show up in the Kaldi scripts, but I have not yet looked up what they are.

##### [Decoding using Posteriors:](https://groups.google.com/d/msg/kaldi-help/5Ljm90gt7KA/Kdtc3L03BQAJ)

Discusses priors, likelihoods, and posteriors.
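The "typical TDNN" layout from the P18 notes above (40-dim features spliced over 11 frames, four hidden layers, 3000 pdf states) could be written roughly as the xconfig below, generated from a shell heredoc the way the nnet3/chain recipes do it. This is only a sketch under the numbers quoted in the notes; the directory, layer names, the 512 hidden dimension and the 3000 output dimension are assumptions, not taken from any particular recipe.

```bash
# Sketch only: a plain TDNN roughly matching the "40x11 input, 4 hidden
# layers, 3000 pdf states" description above. Run from an egs/*/s5 dir
# with path.sh sourced; all paths and names here are made up.
dir=exp/nnet3/tdnn_sketch
mkdir -p $dir/configs

cat <<EOF > $dir/configs/network.xconfig
  # 40-dim fbank/MFCC input
  input dim=40 name=input
  # splice +-5 frames at the first layer -> effectively a 40x11 input window
  relu-batchnorm-layer name=tdnn1 dim=512 input=Append(-5,-4,-3,-2,-1,0,1,2,3,4,5)
  relu-batchnorm-layer name=tdnn2 dim=512
  relu-batchnorm-layer name=tdnn3 dim=512
  relu-batchnorm-layer name=tdnn4 dim=512
  # one output per pdf (senone); 3000 is the figure quoted in the notes
  output-layer name=output dim=3000
EOF

steps/nnet3/xconfig_to_configs.py \
  --xconfig-file $dir/configs/network.xconfig \
  --config-dir $dir/configs/
```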
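For the loss-plotting point in the overfitting notes above, the report script can be run directly on an experiment directory. The invocation below is a sketch assuming the two positional arguments are the experiment directory and an output directory; the exact options may differ between Kaldi versions, and the directory names are placeholders.

```bash
# Sketch: plot train/valid objective curves for an nnet3 experiment.
# exp/nnet3/tdnn_sketch and its report/ subdir are placeholder names.
steps/nnet3/report/generate_plots.py \
  exp/nnet3/tdnn_sketch \
  exp/nnet3/tdnn_sketch/report
# The output dir should end up with plots of the train vs. valid
# objective, from which the train/valid gap discussed above can be read.
```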
##### [Re: which technique is being used to initialize weights for LSTM?](https://groups.google.com/d/msg/kaldi-help/1UQWsv9OxY0/ytVhRAQoBgAJ)

- For the hidden layers, the default affine-parameter initialization is mean 0 and standard deviation of 1/sqrt(input_dim).
- The output layer parameters start at zero.

##### [max-change](https://groups.google.com/d/msg/kaldi-help/3lGfMCoUwKY/I7buLqu-BgAJ)

- max-change guards against updating the parameters too far in one step.
- Its relationship with num-jobs and learning rates.

##### [SGMM MMI](https://groups.google.com/d/msg/kaldi-help/NyB4-Dx2c9Q/153h4WfyAgAJ)

Introduces MMI papers; MMI may also be useful with NNs.

### [TDNN](https://en.wikipedia.org/wiki/Time_delay_neural_network)

Time delay neural network (TDNN)

[local/nnet3/run_tdnn_discriminative.sh](https://github.com/kaldi-asr/kaldi/blob/master/egs/tedlium/s5/local/nnet3/run_tdnn_discriminative.sh)

##### [LSTM discriminative training is increasing WER](https://groups.google.com/d/msg/kaldi-help/BsJKFHpke9U/1lb9POsvAQAJ)

With too little data, discriminative training does not help much.

##### [Some questions about nnet3 discriminative training](https://groups.google.com/d/msg/kaldi-help/G2qrBdCw6tA/ixIVhBJbAAAJ)

##### [Status docs vs online chain decoding](https://groups.google.com/d/msg/kaldi-help/kDa9gSSZZn4/EtxjBgXVAAAJ)

TDNN is both faster and better than BLSTM; #2114 has already been merged.

- `local/chain/run_tdnn.sh`
- `wsj/s5/local/chain/tuning/run_tdnn_1f.sh` has a UBM
- `mini_librispeech/s5/local/chain/tuning/run_tdnn_1f.sh`

##### [The fmllr(LDA+MLLT)+ivector features were decoded with nnet2 model.](https://groups.google.com/d/msg/kaldi-help/3vwaZyiKdtE/De6qlXLmAAAJ)

Settings for TDNN with ivectors.

### [Chain model](http://kaldi-asr.org/doc/chain.html)

##### [run_tdnn_multilingual.sh](https://github.com/kaldi-asr/kaldi/blob/master/egs/babel_multilang/s5/local/nnet3/run_tdnn_multilingual.sh)

The multilingual setup has many outputs; it could be used to study the relationship with the phonemes.

##### [output-xent](https://groups.google.com/forum/#!msg/kaldi-help/sNMC1635WvY/D963e8-2CgAJ)

##### [Questions about TDNN+LSTM script](https://groups.google.com/d/msg/kaldi-help/uQI1OYu7dqE/PMirfWAmCwAJ)

What does the "xent" branch mean? Why are there 2 output layers here? (See the xconfig fragment at the end of this section.)
https://www.danielpovey.com/files/2016_interspeech_mmi.pdf (Sec 2.7.1)

##### [Reg: Generate log posteriors using chain model](https://groups.google.com/d/msg/kaldi-help/X5-Qy_a8B6E/iqK8bQg3AQAJ)

The chain output is not normalized by default; an extra config has to be added.

##### [Is there a fundamental difference in likelihoods going from chain tdnn to chain tdnn_lstm?](https://groups.google.com/d/msg/kaldi-help/IN1VGJSfmwo/DRkO67GvAgAJ)

With chain models you are always supposed to use acoustic-scale=1.0 (decode command sketched at the end of this section).

##### [difference between downloaded ASpIRE and mine](https://groups.google.com/d/msg/kaldi-help/UKdwflI2s4Y/gZ5Me45iAQAJ)

Analyzes how the chain model was built.

##### [what is chain.xent-regularize](https://groups.google.com/d/msg/kaldi-help/6jkGQIuMj0o/nTpvr9MFAgAJ)

Dan no longer applies l2-regularize on the output layer; it is done on the hidden layers.

##### [chain model training](https://groups.google.com/d/msg/kaldi-help/ru4dz7XB2Rc/W9RPITpeBQAJ)

Chain feed-forward is better than xent feed-forward.

##### [kaldi linear Model Combination or Model Merging](https://groups.google.com/d/msg/kaldi-help/Z-iLS01_EVo/cKqDd2t3CAAJ)

If they are nnet3 models and they are using the same tree, you may be able to decode with steps/nnet3/decode_score_fusion.sh.
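The "xent branch" asked about above is visible directly in the chain recipes' xconfigs: the network ends in the chain output plus a second, cross-entropy output that is used only as a regularizer during training (Sec 2.7.1 of the LF-MMI paper linked above). The heredoc fragment below is a sketch of how those two lines typically look; `$dir`, `$num_targets` and `$learning_rate_factor` are placeholders for whatever the recipe defines.

```bash
# Sketch: the tail of a chain network.xconfig with both output branches.
cat <<EOF >> $dir/configs/network.xconfig
  # main chain output, trained with the LF-MMI objective (no log-softmax)
  output-layer name=output include-log-softmax=false dim=$num_targets max-change=1.5
  # "xent branch": extra cross-entropy output used only for regularization
  # during training (weight set by --chain.xent-regularize); not used to decode
  output-layer name=output-xent dim=$num_targets learning-rate-factor=$learning_rate_factor max-change=1.5
EOF
```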
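For the acoustic-scale point above, the standard chain recipes bake the scale into the decode command, roughly as below. This is a sketch; the graph, data and decode directories and `$decode_cmd` are placeholders.

```bash
# Sketch: decoding with a chain model.
# Chain models are decoded with an acoustic scale of 1.0;
# --post-decode-acwt 10.0 scales the lattice afterwards so the usual
# integer LM-weight range still works in the scoring scripts.
steps/nnet3/decode.sh \
  --acwt 1.0 --post-decode-acwt 10.0 \
  --nj 10 --cmd "$decode_cmd" \
  exp/chain/tree_sp/graph_tgsmall \
  data/test_hires \
  exp/chain/tdnn_sketch/decode_test
```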
##### [Separate Affine Layer for Chain Training Xent Regularization](https://groups.google.com/forum/#!msg/kaldi-help/bL6bkZCkutg/6yNvx-KOCQAJ)

Discusses the reasoning behind how the model parameters converge.

### [RNN](https://en.wikipedia.org/wiki/Recurrent_neural_network)

A recurrent neural network (RNN) is different from the earlier, one-directional feedforward neural networks.

### [LSTM](https://brohrer.mcknote.com/zh-Hant/how_machine_learning_works/how_rnns_lstm_work.html)

An LSTM is a kind of RNN.

### SGD

##### [Questions regarding parallel training with NSGD](https://groups.google.com/d/msg/kaldi-help/f27ajn_ewi8/ZfeIsA1pAgAJ)

ASGD: Averaged Stochastic Gradient Descent

>[name=鴻欣] The updated weights are averaged.

#### Activation functions

- ReLU (Rectified Linear Unit)
- SELU (from the Self-Normalizing Neural Networks paper)
- ELU
- Swish

(Definitions are sketched at the end of this note.)

##### [About some new activation function?](https://groups.google.com/d/msg/kaldi-help/RfgjLUXjWJg/V0R9QeCOAwAJ)

The activation function may matter when there is little data; with a lot of data it makes little difference.

### Performance

##### [Batchnorm performs poor than renorm during muli-SMBR training with TDNN?](https://groups.google.com/d/msg/kaldi-help/7jn7WSe6nXc/Wqx5OAjUDgAJ)

##### [End-to-end training in Kaldi](https://groups.google.com/d/msg/kaldi-help/cQTQK5rMNz0/baL764MnAQAJ)

Minibatches require the utterances to be of equal length.

##### [Why the reluGRU makes the kaldi crash?](https://groups.google.com/d/msg/kaldi-help/6JNHxOrsbXw/4nMxh4MEAwAJ)

- It can cause divergence; the value range has to be controlled.
- This component is probably not useful.

### Adaptation

##### [Adaptation of chain models](https://groups.google.com/d/msg/kaldi-help/fjteIhBOUCc/J6U6-rKoBwAJ)

Although I suppose if you had a mix of supervised and unsupervised data from the domain of interest, you could use the unsupervised part to help prevent the model straying too far.
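For reference, the standard definitions of the activation functions listed under "Activation functions" above (the SELU constants λ and α are the fixed values from the self-normalizing networks paper; β in Swish is often simply 1):

$$
\begin{aligned}
\operatorname{ReLU}(x) &= \max(0,\,x) \\
\operatorname{ELU}(x)  &= \begin{cases} x, & x > 0 \\ \alpha\,(e^{x}-1), & x \le 0 \end{cases} \\
\operatorname{SELU}(x) &= \lambda \cdot \operatorname{ELU}(x) \quad \text{with fixed } \lambda,\ \alpha \\
\operatorname{Swish}(x) &= x \cdot \sigma(\beta x) = \frac{x}{1 + e^{-\beta x}}
\end{aligned}
$$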