
---
title: Lin-Shan Project
tags: project
---

## Lin-Shan Project
蔡易霖, 宋岩叡

## Week 1 (Tue)

### Get familiar with the PyTorch framework
- [ ] Use PyTorch to stack a two- or three-layer neural network for the MNIST task.
>Accuracy of 90% or higher is enough.
>Record how the loss curve decreases during training, using the functions under `tensorboardX.SummaryWriter`.
>Record how the accuracy curve rises during training, using the functions under `tensorboardX.SummaryWriter`.
>Plot the confusion matrix (as a heatmap).
>Bonus: use t-SNE or PCA (principal component analysis) to reduce dimensionality and visualize the model's behavior. Search online for these two terms; see also https://distill.pub/2016/misread-tsne/

### Watch Hung-yi Lee's 2016 Deep Learning course
- [ ] Watch from the beginning through the backpropagation slides (12).

### Paper Pool
(In case the papers look too difficult, first watch Hung-yi Lee's Machine Learning lectures 21-1 and 21-2 for an introduction to RNNs.)
- [ ] [Sequence to Sequence Learning with Neural Networks - NIPS 2014](https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf)
- [ ] [Neural Machine Translation by Jointly Learning to Align and Translate - ICLR 2015](https://arxiv.org/pdf/1409.0473.pdf)

### How words are represented in machines
Supplementary reading: how exactly is a word represented inside a model, and how do words relate to one another?
[Distributed Representations of Words and Phrases and their Compositionality - NIPS 2013](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf)

Quick summaries:
[Introduction to Word Embedding and Word2Vec](https://towardsdatascience.com/introduction-to-word-embedding-and-word2vec-652d0c2060fa)
[word2vec Parameter Learning Explained](https://arxiv.org/pdf/1411.2738.pdf)

### Learning resources
**cs224d, cs229, cs231n, Hung-yi Lee's MLDS and ML**
Historical progression: cs229 -> cs231n, cs224d, Hung-yi Lee's ML -> Hung-yi Lee's MLDS
Difficulty: cs224d (for NLP) = cs231n (for computer vision) > Hung-yi Lee's MLDS > cs229 > Hung-yi Lee's ML

----------------------------------------------

## 11/23 (Week 3, Sat)
[cs224N language model & seq2seq model overview](https://www.youtube.com/watch?v=iWea12EAu6U&list=PLoROMvodv4rOhcuXMZkNm7j3fVwBBY42z&index=6)
cs224N Lectures 6, 7, 8 (Winter 2019)
The lectures above cover language models and the details of RNNs for NLP, so I think you should watch them. I will push the RNN task code by Thursday.
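But you will need some background knowledge of the models (RNN, LSTM, seq2seq) first.

## 11/26 (Week 4, Tue)
Watch lectures 7, 8, 12, 13.

Notes: multiplying the word vectors by W gives the context vector, and multiplying the context vector by U gives the probability distribution over the next word (Lecture 6, around 19:00). But in this setup the first column of W only interacts with e1 and the second column only with e2. We want the model to be contextual, so use the method shown around 24:00; the left side of the slide at 24:00 explains it clearly.

## 12/1 (Sun)
Follow the steps of [pytorch-seq2seq](https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html) to build your seq2seq model.

## 12/10 (Tue)
Things that can be changed: teacher forcing ratio, learning rate, using an LSTM, whether to use attention.
NMT papers from the last three years.
Differences between languages.
cs224 lectures 1, 2, 3.

### Visualization tools
tensorboardX, PyTorch's built-in TensorBoard support (`torch.utils.tensorboard`), wandb, comet

Optional: the Transformer paper, CTRL, BART; take a look at the GPT-2 model.

## 12/17 (Tue)
P19: words with the same meaning have similar left and right neighbors, so the neighbors can be used to predict which word sits in the middle.

P22, self-supervised loss function: given the center word, compute the probability of each word appearing on its left and right, take the log of these probabilities, sum them, divide by T, and negate the result. What we want to train is the weight matrix in the middle. Suppose the one-hot vector is 100-dimensional and the weight matrix is 100 × 20. Each word's one-hot vector has a single entry equal to 1 and all others 0, so different words are orthogonal to each other. After training, multiplying the center word's one-hot vector by the weight matrix gives a denser 20-dimensional vector; multiplying that by a 20 × 100 matrix maps it back to 100 dimensions, with one value per word scoring how likely that word is to appear around the center word.

P26, softmax: we want each dimension of the final 100-dimensional vector to be the probability that the corresponding word appears next to the center word, but the entries do not necessarily sum to 1, so we normalize with softmax: take the exponential of the entry for word o and divide it by the sum of the exponentials of all entries. The result is the probability that o appears around the center word.

P32:
- Skip-gram (SG): use the center word to predict the probabilities of the words on its left and right.
- Continuous bag of words (CBOW): use the words on the left and right to predict the center word.

### seq2seq
- Keep the pairs used for evaluation separate from the training set.
- Check the testing loss to see whether the model is overfitting.
- BLEU score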
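The P22 and P26 notes on 12/17 describe the skip-gram loss in words. Written out (in the cs224n notation, with window size $m$, center vector $v_c$ and output vector $u_o$), it is

$$J(\theta) = -\frac{1}{T}\sum_{t=1}^{T}\sum_{\substack{-m\le j\le m\\ j\ne 0}} \log P(w_{t+j}\mid w_t;\theta), \qquad P(o\mid c)=\frac{\exp(u_o^\top v_c)}{\sum_{w}\exp(u_w^\top v_c)}.$$

Below is a minimal NumPy sketch of that forward pass, assuming the 100-word vocabulary and 20-dimensional vectors from the notes. The random initialization, the window size `m = 2`, the toy corpus of word indices, and the names `W`, `W_out`, `context_probs`, `skipgram_loss` are all hypothetical choices for illustration; it only computes the loss and does not train anything.

```python
import numpy as np

# Sizes follow the P22 example: 100-word vocabulary, 20-dim dense vectors.
VOCAB_SIZE = 100
EMBED_DIM = 20

rng = np.random.default_rng(0)
# W: 100 x 20 input matrix; row i is word i's dense vector v_i.
W = rng.normal(scale=0.1, size=(VOCAB_SIZE, EMBED_DIM))
# W_out: 20 x 100 output matrix; column o is the output vector u_o.
W_out = rng.normal(scale=0.1, size=(EMBED_DIM, VOCAB_SIZE))


def one_hot(index, size=VOCAB_SIZE):
    """A single 1 and zeros elsewhere, so distinct words are orthogonal."""
    v = np.zeros(size)
    v[index] = 1.0
    return v


def softmax(scores):
    """exp(score) / sum(exp(scores)); subtracting the max avoids overflow
    without changing the result."""
    exps = np.exp(scores - scores.max())
    return exps / exps.sum()


def context_probs(center):
    """P(o | c) for every word o in the vocabulary, given center index c."""
    hidden = one_hot(center) @ W   # 20-dim vector, equal to row `center` of W
    scores = hidden @ W_out        # back to 100 dims: one score u_o . v_c per word
    return softmax(scores)


def skipgram_loss(corpus, window=2):
    """-(1/T) * sum over positions t and offsets j != 0 of log P(w_{t+j} | w_t)."""
    T = len(corpus)
    total = 0.0
    for t, center in enumerate(corpus):
        probs = context_probs(center)
        for j in range(-window, window + 1):
            # Skip the center word itself and offsets that fall off the corpus.
            if j == 0 or not 0 <= t + j < T:
                continue
            total += np.log(probs[corpus[t + j]])
    return -total / T


if __name__ == "__main__":
    corpus = [3, 17, 42, 17, 8, 99]  # hypothetical word indices
    print(skipgram_loss(corpus))
```

Computing the full softmax over the whole vocabulary is what P26 describes and is fine at this size; for large vocabularies the word2vec paper replaces it with negative sampling or hierarchical softmax.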
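The seq2seq checklist ends with scoring the held-out pairs using BLEU. As a rough illustration of what that number measures, here is a minimal standard-library sketch of sentence-level BLEU with a single reference: clipped n-gram precision for n = 1..4, their geometric mean, and a brevity penalty for short outputs. It is a simplified, unsmoothed assumption meant for quick sanity checks; the function names are hypothetical, and scores will differ from corpus-level tools such as sacreBLEU or NLTK's `corpus_bleu`, which are the better choice for reported numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU, one reference, uniform n-gram weights."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clipped ("modified") precision: each candidate n-gram is credited
        # at most as many times as it appears in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU 0
        log_precision_sum += math.log(matched / total)
    # Brevity penalty: candidates shorter than the reference are penalized.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    reference = "the cat is on the mat".split()
    print(sentence_bleu("the cat is on the".split(), reference))
```

Because every n-gram of the shorter candidate in the usage example appears in the reference, its score comes only from the brevity penalty, which shows why BLEU needs that term: otherwise truncated translations would get perfect precision.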