# DNN-HMM
## DNN model - nnet3
* https://groups.google.com/forum/#!msg/kaldi-help/JFt7TtA6I1w/Aa9F1bLwAwAJ
### What is "xent" in nnet3? (also look up Append)
>[name=ricer]In nnet3, "xent" means cross entropy.
>[name=sing5hong5]What keyword should I search for in the xconfig documentation? I can't really find anything.
>[name=ricer]Look at final.config, where the network gets built. I think Kaldi probably took TensorFlow as a reference.
>[name=sing5hong5]OK, I'll look at TensorFlow and check final.config.
>[name=ricer]You can read `src/nnet3/simple-component.cc`, which describes every component.
>LSTM and the more complex ones are in `Nnet3-component.cc`.
### P18 parameters
>[name=ricer]For DNNs, read the textbooks and google the "ten big questions"; cover the basics first.
Usually it is a TDNN.
40-dimensional fbank/MFCC features, with 5 frames of context on each side, 11 frames in total,
so the DNN input is `40x11`.
In the middle, 4 hidden layers of 256 or 512 units.
At the end, 3000 pdf states.
If you need noise robustness, you can use a 256-unit CNN plus 2 TDNN layers.
LSTM is too complicated; once you have more experience you will know how to use it. Tune it only after the system is stable; not recommended for now.
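A minimal xconfig sketch of the TDNN described above, assuming the standard nnet3 xconfig workflow (directory names and the exact splicing are illustrative, not taken from a specific recipe):

```bash
# Hypothetical sketch: 40-dim features spliced over +/-5 frames (40x11 input),
# four 512-unit hidden layers, and 3000 pdf states at the output.
mkdir -p exp/tdnn_sketch/configs
cat <<EOF > exp/tdnn_sketch/configs/network.xconfig
  input dim=40 name=input
  # Splice +/-5 frames at the first layer, giving an effective 40x11 input.
  relu-renorm-layer name=tdnn1 dim=512 input=Append(-5,-4,-3,-2,-1,0,1,2,3,4,5)
  relu-renorm-layer name=tdnn2 dim=512
  relu-renorm-layer name=tdnn3 dim=512
  relu-renorm-layer name=tdnn4 dim=512
  # One output per pdf (clustered HMM state).
  output-layer name=output dim=3000
EOF
# Turn the xconfig into the final.config / ref.config used by nnet3 training.
steps/nnet3/xconfig_to_configs.py \
  --xconfig-file exp/tdnn_sketch/configs/network.xconfig \
  --config-dir exp/tdnn_sketch/configs/
```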
>[name=鴻欣]
- high resolution (hires): high-frequency MFCC
- ivector
- wsj/local/nnet3/tuning/tdnn_1a.sh
>[name=鴻欣]2018/09/06, from the Kaldi presentation in India
swbd/local/chain/tuning/run_tdnn_7q.sh
Does SVD, i.e. matrix factorization:
1000x1000 => 1000x100 , 100x1000
Fewer parameters and a deeper network (7 -> 14 layers).
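In current recipes this factorization shows up as `tdnnf-layer` in the xconfig, where `bottleneck-dim` is the small inner dimension of the two factored matrices. A hedged fragment (the dimensions below are only illustrative):

```bash
# Illustrative xconfig fragment: each tdnnf-layer factors one large weight
# matrix into two smaller ones through a linear bottleneck, so the network
# can be made deeper for the same parameter budget.
cat <<EOF >> exp/tdnnf_sketch/configs/network.xconfig
  tdnnf-layer name=tdnnf2 dim=1024 bottleneck-dim=128 time-stride=1
  tdnnf-layer name=tdnnf3 dim=1024 bottleneck-dim=128 time-stride=1
EOF
```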
##### [ --use-gpu=wait when using run.pl ](https://groups.google.com/d/msg/kaldi-help/oaOqkS0Cd_g/Xa0Tb3VACgAJ)
With `--use-gpu=wait` you no longer have to reduce the number of jobs just because there are not enough GPUs; a job will automatically wait until another job finishes (see the sketch below).
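A hedged example of passing it to the chain training script (paths and job counts are placeholders; the remaining recipe options are omitted):

```bash
# With --use-gpu=wait, each job waits for a free GPU instead of failing
# when all GPUs are already in use by other jobs.
steps/nnet3/chain/train.py \
  --use-gpu=wait \
  --trainer.optimization.num-jobs-initial 2 \
  --trainer.optimization.num-jobs-final 4 \
  --feat-dir data/train_hires \
  --tree-dir exp/chain/tree \
  --lat-dir exp/chain/lats \
  --dir exp/chain/tdnn_sketch
  # ...plus the other options from the recipe you are running
```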
#### [ Train a monophone DNN ](https://groups.google.com/d/msg/kaldi-help/xaKt6eA9Uo4/yAgkx9nQAAAJ)
`--context-opts "--context-width=1 --central-position=0" `
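This restricts the phonetic context of the tree to a single phone (context width 1, central position 0). A hedged sketch of how it might be passed to a tree-building script (the script name, argument order, and paths are assumptions, not from the thread):

```bash
# Hypothetical sketch: build a context-independent ("monophone-style") tree
# by limiting the decision-tree context to the central phone only.
steps/nnet3/chain/build_tree.sh \
  --context-opts "--context-width=1 --central-position=0" \
  --frame-subsampling-factor 3 --cmd run.pl \
  3000 data/train data/lang exp/tri3_ali exp/chain/tree_mono
```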
##### [ Maximize GPU memory usage? ](https://groups.google.com/d/msg/kaldi-help/4lzsg8tf5OU/JT7yOlfDBAAJ)
Compared with other toolkits, Kaldi is very frugal with GPU memory, because their hardware did not have much memory.
##### [ Is there something like Stack Operation in nnet3 descriptor? ](https://groups.google.com/d/msg/kaldi-help/UTe7OxMztZM/eOZoas-qAwAJ)
Inside nnet3 descriptors everything is a vector, not a matrix.
##### [ training end to end using CTC on custom data ](https://groups.google.com/d/msg/kaldi-help/nm2xplIz4P8/btJ-0vogBAAJ)
End-to-end CTC results have not been good, so Kaldi does not support it.
##### [ CMVN on librispeech corpus using Kaldi ](https://groups.google.com/d/msg/kaldi-help/9niauIHEFMU/YmRDt9mFAAAJ)
The nnet needs its input in a reasonable dynamic range, so first add an extra batchnorm layer (see the sketch below).
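A hedged xconfig fragment of the idea, assuming the `batchnorm-component` xconfig layer type (layer names and dimensions are illustrative): a batchnorm right after the input lets the network normalize the feature dynamic range itself instead of relying on CMVN.

```bash
# Illustrative fragment: normalize the input dynamic range inside the network.
cat <<EOF > exp/sketch/configs/network.xconfig
  input dim=40 name=input
  batchnorm-component name=input-batchnorm input=input
  relu-batchnorm-layer name=tdnn1 dim=512 input=input-batchnorm
EOF
```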
##### [ Some questions about nnet3 discriminative training ](https://groups.google.com/d/msg/kaldi-help/G2qrBdCw6tA/CEexPk1XAQAJ)
Explains the senone dimension and layout in nnet3 and how minibatches actually work in practice. Also mentions the frame rejection paper.
##### [nn training epochs, overfitting, and visualizing training process](https://groups.google.com/d/msg/kaldi-help/l8SNSqjPGqk/7xY-hDn1BAAJ)
- Less data needs more epochs.
- You can use `steps/nnet3/report/generate_plots.py` to plot the loss (see the sketch after this list).
- [Small random changes can cause large changes in a NN](https://groups.google.com/d/msg/kaldi-help/l8SNSqjPGqk/4X8sb5cyBQAJ), so `early stopping` is not used.
- Judge overfitting by the ratio between the train loss and the valid loss, [not by the absolute value of the valid loss](https://groups.google.com/d/msg/kaldi-help/l8SNSqjPGqk/9BAsqApEBQAJ).
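Typical usage of the plotting script (the experiment directory is a placeholder):

```bash
# Generate plots of the training/validation objectives and other diagnostics;
# the report (PDF plus individual plots) is written to the second directory.
steps/nnet3/report/generate_plots.py exp/chain/tdnn_sketch exp/chain/tdnn_sketch/report
```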
##### [lre07 v2 train_dnn doubt](https://groups.google.com/d/msg/kaldi-help/JJ1Mu3Q9HRI/dgGQzTB2BQAJ)
- Mentions BNF and LID; they show up in the Kaldi scripts, but I have not yet checked what they are.
##### [Decoding using Posteriors:](https://groups.google.com/d/msg/kaldi-help/5Ljm90gt7KA/Kdtc3L03BQAJ)
Discusses priors, likelihoods, and posteriors.
##### [Re: which technique is being used to initialize weights for LSTM?](https://groups.google.com/d/msg/kaldi-help/1UQWsv9OxY0/ytVhRAQoBgAJ)
- For the hidden layers, the default affine-parameter initialization is mean 0 and standard deviation 1/sqrt(input_dim).
- The output layer is initialized to all zeros.
##### [https://groups.google.com/d/msg/kaldi-help/3lGfMCoUwKY/I7buLqu-BgAJ](https://groups.google.com/d/msg/kaldi-help/3lGfMCoUwKY/I7buLqu-BgAJ)
- max-change guards against changing the parameters too much in a single update.
- Its relationship with num-jobs and learning-rates.
##### [ SGMM MMI ](https://groups.google.com/d/msg/kaldi-help/NyB4-Dx2c9Q/153h4WfyAgAJ)
Introduces the MMI papers; MMI may also be useful with NNs.
### [TDNN](https://en.wikipedia.org/wiki/Time_delay_neural_network)
Time delay neural network (TDNN)
[local/nnet3/run_tdnn_discriminative.sh](https://github.com/kaldi-asr/kaldi/blob/master/egs/tedlium/s5/local/nnet3/run_tdnn_discriminative.sh)
##### [ LSTM discriminative training is increasing WER](https://groups.google.com/d/msg/kaldi-help/BsJKFHpke9U/1lb9POsvAQAJ)
If there is not enough data, discriminative training does not help much.
##### [ Some questions about nnet3 discriminative training ](https://groups.google.com/d/msg/kaldi-help/G2qrBdCw6tA/ixIVhBJbAAAJ)
##### [Status docs vs online chain decoding](https://groups.google.com/d/msg/kaldi-help/kDa9gSSZZn4/EtxjBgXVAAAJ)
TDNN is both faster and better than BLSTM; #2114 has already been merged.
- `local/chain/run_tdnn.sh`
- `wsj/s5/local/chain/tuning/run_tdnn_1f.sh` has a UBM
- `mini_librispeech/s5/local/chain/tuning/run_tdnn_1f.sh`
##### [The fmllr(LDA+MLLT)+ivector features were decoded with nnet2 model.](https://groups.google.com/d/msg/kaldi-help/3vwaZyiKdtE/De6qlXLmAAAJ)
Settings for TDNN with ivectors (see the sketch below).
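In the nnet3/chain recipes the ivector is a second network input that gets appended to the spliced hires MFCCs at the first, LDA-like layer. A hedged fragment following that pattern (dimensions and file paths are assumptions):

```bash
# Illustrative fragment: a 100-dim ivector as a second input, replicated to
# every frame via ReplaceIndex(...) and appended to the spliced MFCCs.
cat <<EOF > exp/sketch/configs/network.xconfig
  input dim=100 name=ivector
  input dim=40 name=input
  fixed-affine-layer name=lda input=Append(-1,0,1,ReplaceIndex(ivector, t, 0)) affine-transform-file=exp/sketch/configs/lda.mat
  relu-batchnorm-layer name=tdnn1 dim=512
EOF
```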
### [Chain model](http://kaldi-asr.org/doc/chain.html)
##### [run_tdnn_multilingual.sh](https://github.com/kaldi-asr/kaldi/blob/master/egs/babel_multilang/s5/local/nnet3/run_tdnn_multilingual.sh)
Multilingual training has many outputs; you could study their relationship with the phonemes.
##### [output-xent](https://groups.google.com/forum/#!msg/kaldi-help/sNMC1635WvY/D963e8-2CgAJ)
##### [Questions about TDNN+LSTM script](https://groups.google.com/d/msg/kaldi-help/uQI1OYu7dqE/PMirfWAmCwAJ)
What does the "xent" branch mean? Why are there 2 output layers here? See https://www.danielpovey.com/files/2016_interspeech_mmi.pdf (Sec 2.7.1).
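In the chain recipes this appears as two output layers in the xconfig: `output` is trained with the LF-MMI (chain) objective, and `output-xent` is a separate softmax branch trained with cross-entropy as a regularizer and used for diagnostics. A fragment in the style of the standard recipes (the dimension and learning-rate factor are illustrative):

```bash
# Illustrative fragment: the chain output has no log-softmax (its objective is
# unnormalized), while the xent branch is a normal softmax output used only as
# a cross-entropy regularizer during training.
cat <<EOF >> exp/chain/sketch/configs/network.xconfig
  output-layer name=output include-log-softmax=false dim=3000 max-change=1.5
  output-layer name=output-xent dim=3000 learning-rate-factor=5.0 max-change=1.5
EOF
```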
##### [ Reg: Generate log posteriors using chain model](https://groups.google.com/d/msg/kaldi-help/X5-Qy_a8B6E/iqK8bQg3AQAJ)
By default the chain model's output is not normalized; you have to add extra config.
##### [ Is there a fundamental difference in likelihoods going from chain tdnn to chain tdnn_lstm? ](https://groups.google.com/d/msg/kaldi-help/IN1VGJSfmwo/DRkO67GvAgAJ)
With chain models you are always supposed to use acoustic-scale=1.0 (see the sketch below).
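In the recipes this means decoding with `--acwt 1.0` (plus `--post-decode-acwt 10.0` so the lattice scores stay in the usual range for LM-weight tuning). A hedged example with placeholder paths:

```bash
# Decode with a chain model: acoustic scale 1.0 during decoding, then scale
# the acoustic scores by 10 in the lattices so normal LM weights apply.
steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 \
  --nj 10 --cmd run.pl \
  exp/chain/tdnn_sketch/graph data/test_hires exp/chain/tdnn_sketch/decode_test
  # add --online-ivector-dir ... if the model was trained with ivectors
```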
##### [difference between downloaded ASpIRE and mine](https://groups.google.com/d/msg/kaldi-help/UKdwflI2s4Y/gZ5Me45iAQAJ)
Analyzes how the chain model is built.
##### [what is chain.xent-regularize](https://groups.google.com/d/msg/kaldi-help/6jkGQIuMj0o/nTpvr9MFAgAJ)
Dan no longer applies l2-regularize on the output layer; it is done on the hidden layers instead.
##### [chain model training](https://groups.google.com/d/msg/kaldi-help/ru4dz7XB2Rc/W9RPITpeBQAJ)
A chain (LF-MMI) feed-forward model is better than a xent (cross-entropy) feed-forward model.
##### [ kaldi linear Model Combination or Model Merging ](https://groups.google.com/d/msg/kaldi-help/Z-iLS01_EVo/cKqDd2t3CAAJ)
If they are nnet3 models and they are using the same tree, you may be able to decode with steps/nnet3/decode_score_fusion.sh.
##### [ Separate Affine Layer for Chain Training Xent Regularization ](https://groups.google.com/forum/#!msg/kaldi-help/bL6bkZCkutg/6yNvx-KOCQAJ)
Explains how the model parameters converge.
### [RNN](https://en.wikipedia.org/wiki/Recurrent_neural_network)
A recurrent neural network (RNN) differs from the earlier, purely one-directional feedforward neural networks.
### [LSTM](https://brohrer.mcknote.com/zh-Hant/how_machine_learning_works/how_rnns_lstm_work.html)
LSTM is a kind of RNN.
### SGD
##### [ Questions regarding parallel training with NSGD ](https://groups.google.com/d/msg/kaldi-help/f27ajn_ewi8/ZfeIsA1pAgAJ)
ASGD: Averaged Stochastic Gradient Descent
>[name=鴻欣] The updated weights are averaged (see the sketch below).
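Conceptually, the per-job models of one iteration are averaged into the starting model of the next iteration; a hedged sketch using the `nnet3-average` binary (file names are placeholders, and in practice the training scripts do this for you):

```bash
# Average the raw models written by three parallel jobs in iteration 5
# into the model that iteration 6 starts from.
nnet3-average exp/sketch/5.1.raw exp/sketch/5.2.raw exp/sketch/5.3.raw exp/sketch/6.raw
```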
#### activation function
ReLU (Rectified Linear Unit)
SELU (Scaled Exponential Linear Unit, from the Self-Normalizing Neural Networks paper)
ELU
Swish
##### [About some new activation function? ](https://groups.google.com/d/msg/kaldi-help/RfgjLUXjWJg/V0R9QeCOAwAJ)
The activation function may matter when data is scarce; with plenty of data it makes little difference.
### Results
[ Batchnorm performs poor than renorm during muli-SMBR training with TDNN?](https://groups.google.com/d/msg/kaldi-help/7jn7WSe6nXc/Wqx5OAjUDgAJ)
#### [End-to-end training in Kaldi](https://groups.google.com/d/msg/kaldi-help/cQTQK5rMNz0/baL764MnAQAJ)
Minibatches require the audio files to be of equal length.
##### [Why the reluGRU makes the kaldi crash?](https://groups.google.com/d/msg/kaldi-help/6JNHxOrsbXw/4nMxh4MEAwAJ)
- It can cause divergence; you have to keep the values in a bounded range.
- This component may not work well (not useful).
### Adaptation
##### [Adaptation of chain models](https://groups.google.com/d/msg/kaldi-help/fjteIhBOUCc/J6U6-rKoBwAJ)
Although I suppose if you had a mix of supervised and unsupervised data from the domain of interest, you could use the unsupervised part to help prevent the model straying too far.