# RNN — Topics to Study

## 10/21
* Why the RNN evolved into the LSTM (what is wrong with the plain RNN)
* Backpropagation through time: how it works in detail, with the math
* Why the LSTM was then simplified into the GRU, and where the GRU is better
* Why peephole connections are used, and when they apply
* LSTM vs. GRU comparison table
* Hands-on tuning: what does "LSTM units = 50" mean?

## 10/28
### Hsuanchia: GRU
GRU vs. LSTM vs. SimpleRNN

### Jiazheng: Batch Normalization / Layer Normalization
Textbook chapter / code: Forecasting a Time Series; WaveNet (optional)

[today's ppt](https://drive.google.com/file/d/1Jso9x9biDEakXGnl-fzToFcMSLNBWehF/view?usp=sharing)
[ChatQ](https://ckip.iis.sinica.edu.tw/project/chatq/)

## 11/04
Jiazheng: [this week's slides](https://docs.google.com/presentation/d/1NBVktVukT6vEzW0261buU9pRpCOLDelOnlRLdhAgjPw/edit?usp=sharing)
- What is the difference between `return_sequences=True` and the default `False`?
  - With `False`, does the layer run through all 20 time steps and only then hand its output to the next layer?
  - With `True`, are the states of all 20 timestamps passed on to the next layer?
  - Draw diagrams for the above and compare the parameter counts to see whether they differ.

Discoveries:
- A multi-layer RNN cannot run without `return_sequences`.
- The TimeDistributed layer will be used very often from now on.

Hsuanchia: GRU: the real meaning of `reset_after` (why 60 more parameters than reset-before?) and its topology

[today's ppt](https://drive.google.com/file/d/1z90awR8zdMzeyDb8A2kOHBRNBb0lBWaI/view?usp=sharing)

## 11/11
[This week's slides](https://docs.google.com/presentation/d/1NBVktVukT6vEzW0261buU9pRpCOLDelOnlRLdhAgjPw/edit?usp=sharing) (adapted directly from 11/04)

Read the paper: [Kelvin Xu et al., "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention," Proceedings of the 32nd International Conference on Machine Learning (2015): 2048–2057.](https://arxiv.org/pdf/1502.03044.pdf)

Split up to study the LSTM/Attention part and the CNN part separately.

## 11/18
The attention in the paper above predates 2014, while "Attention Is All You Need" (2017) came later; how do they differ?

## 11/25
Prerequisites for attention:
1. seq2seq: generating Shakespearean text with a character RNN (textbook)
    * word embedding vs. one-hot vector
    * How does tokenization work?
    * Stateful RNN
2. Sentiment RNN (textbook)
    * Embedding layer
3. Bidirectional RNN
4. Beam search algorithm
5. Attention
    * soft, hard, global, local
    * context vector
6. Paper: visual attention (textbook)
7. Paper: Attention Is All You Need

Jiazheng: 1, 2. Hsuanchia: 3, 4, 5, 6. Optional: 7.

## 12/02
* Decide the topic by 12/08, before the ML class
* Presentation: 12/23 or 12/25
* Judges: 陳履恆, 陳依蓉, 黃育銘
* Ref: [Stanford ML projects](http://cs229.stanford.edu/projects.html)

Both:
* See if there is anything from the Stanford projects we can use
* Find training data we can use
* Imagine what the CNN and RNN architecture is supposed to look like

Jiazheng: last week's to-dos 3~6
Hsuanchia: 12/23 on-stage presentation (10 min)

## 12/09
Jiazheng: visual attention experiment (decoder) using a GRU model + soft attention
Hsuanchia: visual attention experiment (decoder) using an LSTM model + soft attention

Presentation date changed to 2021/01/06 or 01/08. Optional: BLEU evaluation.

[Experiment using TensorFlow](https://github.com/yunjey/show-attend-and-tell)

## 12/16
Jiazheng:
* Textbook sentiment analysis
* How the paper's LSTM hidden state (a probability?) is turned into words and sentences

:::warning
Deadline: by 12/21, tell me what you have done or learned
:::

Hsuanchia: visual attention experiment

Presentation key point: what is most relevant to the goal?

## 12/23
Word bank: build the word bank from all 118k images × 5 annotation sentences; before the word bank we need to do word embedding.

LSTM: 196 × 512 input; it shall be 512 timesteps.

[Experiment using Keras](https://github.com/zimmerrol/show-attend-and-tell-keras)

## 12/30
Presentation rehearsal. Feedback and questions:
* Does each timestamp focus on one word?
* Does the validation dataset give us stratified sampling?
* Input image resolution? There is preprocessing.
* MS COCO is an abbreviation; what is the full name?
* VGGNet: filter size? stride? At which layer do we stop? All the pooling and convolution parameters (see the sketch after this list).
* Make the project-structure diagram more low-level, like the textbook's Figure 16-8.
* Stateful or stateless?
* The five caption sentences should be related to each other.
* How the formula on p. 11 is presented.
* Portable / Capability
* How does our project differ from the reference paper?
* Performance enhancement / use YOLO instead of VGG
* Future angle: a new architecture whose encoder has already finished classification (e.g., which objects are in the image, multi-label multi-class); change the decoder to a GRU or Bidirectional(?) + Attention, or a deep RNN.
* Transfer learning: we use VGGNet pretrained on ImageNet (object detection) and train further with COCO.
* Bidirectional layers cannot be used in the image-encoder CNN; how about the decoder RNN?
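A quick way to answer the VGGNet questions above (filter sizes, strides, and which layer to stop at) is to print the network and cut it at the layer that produces the 14 × 14 × 512 feature map used later in these notes. This is a minimal sketch assuming the stock Keras VGG16 with 224 × 224 inputs, not our actual preprocessing pipeline; the random image is just a stand-in.

```python
# Minimal sketch: inspect VGG16 and cut it at the 14x14x512 feature map.
# Assumes the stock Keras VGG16 and a 224x224 input; the image is a stand-in.
import tensorflow as tf

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.summary()  # lists every conv/pool layer with its filters, kernel sizes and strides

# Stopping at block5_conv3 gives (14, 14, 512); block5_pool would give (7, 7, 512).
encoder = tf.keras.Model(inputs=base.input,
                         outputs=base.get_layer("block5_conv3").output)

img = tf.random.uniform((1, 224, 224, 3)) * 255.0   # stand-in for a preprocessed COCO image
features = encoder(tf.keras.applications.vgg16.preprocess_input(img))
print(features.shape)                               # (1, 14, 14, 512)
```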
Meet again 2021/01/04 20:00:
1. Everyone reviews the slides for anything that can still be improved
2. Hsuanchia gives a timed run-through
3. Guess what questions the judges will ask
4. (Optional) the others each present it once in their own way

## 2021/01/05
Only visible with a school Google account:
[雅婷逐字稿 transcript text file](https://drive.google.com/file/d/1z65L5HKzvod1eMm58cTMlmhVi-amT9XA/view?usp=sharing "Google Drive NCNU")
[Audio recording](https://drive.google.com/file/d/1z_CIjLq3Gyb1HtvpXJx8xOzsToU7BmMN/view?usp=sharing "Google Drive NCNU")

## 01/06
Teacher feedback:
* 履恆 suggested looking into NLP language models.
* 履恆 said he has seen a paper citing this visual attention paper to automatically generate poetry; we could look at how later work applied the paper.
* 依蓉 said we should decide whether to adopt the whole YOLO architecture or only part of it.
* We do not know what the attention mechanism is actually attending to; do we even need to know?

Jiazheng: proposed splitting up the following papers
* [A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning](https://dl.acm.org/doi/pdf/10.1145/1390156.1390177)
* [Universal Language Model Fine-tuning for Text Classification](https://arxiv.org/pdf/1801.06146.pdf)
* [CSPNet](https://arxiv.org/pdf/1911.11929.pdf)
* [image caption with full VGGNet](https://arxiv.org/pdf/1411.4555.pdf)

## 01/13
* Image captioning
    * [Generate poetry from image](https://arxiv.org/pdf/1804.08473.pdf)
    * Conference (CVGIP?)
* NLP language model
    * [100 must-read NLP papers](https://github.com/mhagiwara/100-nlp-papers)
* Decoder implementation
* YOLO

## 01/26
* show-and-tell implementations
    * [TensorFlow version with pretrained model (uses the MSCOCO 2014 dataset)](https://github.com/coldmanck/show-attend-and-tell)
    * [Keras version](https://github.com/201528014227051/show_attend_and_tell.keras)
    * [Keras version](https://github.com/zimmerrol/show-attend-and-tell-keras)
    * [TensorFlow ver.](https://github.com/tensorflow/models/tree/archive/research/im2txt) by Google employees, uses an Inception v3 encoder
    * [PyTorch ver.](https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning)

## 01/27
News: [image captioning applied at Facebook](https://chinese.engadget.com/facebook-explains-ai-alt-text-for-photo-descriptions-070013576.html); English original: [How Facebook is using AI to improve photo descriptions for people who are blind or visually impaired](https://ai.facebook.com/blog/how-facebook-is-using-ai-to-improve-photo-descriptions-for-people-who-are-blind-or-visually-impaired/ "How Facebook is using AI to improve photo descriptions for people who are blind or visually impaired"). In Firefox, go to the FB home page -> right-click -> Inspect Accessibility Properties, then hover over an image in the feed to see the caption FB added to it.

#### How to run the [pretrained model, TensorFlow ver.](https://github.com/coldmanck/show-attend-and-tell) — by Jiazheng
Judging from the Git repo history, the author wrote it around TensorFlow 1.8.0, so today's TensorFlow 2.x is too new; also, [the TensorFlow 1.8.0 pip package](https://pypi.org/project/tensorflow/1.8.0/#files) only supports up to Python 3.6, while Ubuntu 20.04 ships with 3.8, which is again too new. We therefore need an environment with the older versions.
1. Install Conda, a tool that makes it easy to install and switch between different Python versions and related packages.
   Official guide: https://docs.anaconda.com/anaconda/install/linux/
2. With the default installation, every new terminal drops into Conda's base environment; to turn that off:
   `conda config --set auto_activate_base false`
3. Create a new environment (`new_env` is just a name you pick) and pin the older Python version:
   `conda create -n new_env python=3.6`
4. Activate the environment; every command-line prompt should now start with `(new_env)`:
   `conda activate new_env`
5. Install the pinned versions of TensorFlow and NumPy:
   `pip install tensorflow==1.8.0`
   `pip install numpy==1.16.2`
6. Install the other required libraries:
    ```zsh
    pip install scipy matplotlib ipython \
    jupyter pandas sympy nose opencv-python nltk pandas \
    tqdm scikit-image
    ```
7. From the repo README, download and extract "pretrained model with default configuration can be downloaded here"; put the 289999.npy file into the models folder and vocabulary.csv into the program's root directory.
8. Put the .jpg images you want to test into `test/images`.
9. Run the inference experiment:
    ```bash
    python main.py --phase=test \
        --model_file='./models/289999.npy' \
        --beam_size=3
    ```
10. If you get `SyntaxError: Missing parentheses in call to 'print'.`, edit `utils/coco/coco.py` and add parentheses to every print, [like this](https://termbin.com/9lpn).
11. The results are written to `test/results`.
12. To leave the Conda environment (or just close the window and open a new one):
    `conda deactivate`

## 01/28
* Encoder
    1. CSPNet
    2. Classification in the encoder
    3. Stuff segmentation
* Decoder
    * First work out how to reproduce the decoder of the current paper, then go further
    * Afterwards, design a decoder that can pair with other kinds of encoders, e.g., classify first and then feed the result into the decoder to generate the sentence
* All need to do: image captioning paper survey
* NLP pre-trained word vectors: [GloVe](https://nlp.stanford.edu/projects/glove/)
    * [Using pre-trained word embeddings in a Keras model](https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html)

## 02/10
* Encoder
    1. CSPNet
    2. Classification in the encoder
    3. Stuff segmentation
* Decoder
    1. Fix the decoder's overfitting
    2. Performance evaluation
    3. Improve accuracy
* All need to do: image captioning paper survey
* [hsuanchia's decoder](https://colab.research.google.com/drive/1wWPK5wWA4jSW97xu0Un7ohLdeO0wJUIy#scrollTo=Cbf2FTiFFLpX)

## 02/21
* Hsuanchia
    * [My decoder](https://colab.research.google.com/drive/1wWPK5wWA4jSW97xu0Un7ohLdeO0wJUIy#scrollTo=Cbf2FTiFFLpX)
    * Decoder with attention
        * [TensorFlow's image captioning tutorial](https://www.tensorflow.org/tutorials/text/image_captioning#model)
    * Decoder without attention
        * [What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?](https://arxiv.org/abs/1708.02043)
        * [Paper's code](https://github.com/mtanti/rnn-role)
        * ![](https://i.imgur.com/zm3dheC.png)
    * Image captioning paper survey
        * [Generate poetry from image](https://arxiv.org/pdf/1804.08473.pdf) (not finished)
        * [Image Captioning with Semantic Attention](https://arxiv.org/pdf/1603.03925.pdf) (not finished)
            * [Paper description (Chinese)](https://blog.csdn.net/sinat_26253653/article/details/78260985)
        * [Image caption development (Chinese)](https://zhuanlan.zhihu.com/p/30314440)
* Topics that need continued survey
    * Attention mechanism
        * Does it actually help the caption?
    * [Adaptive attention](https://arxiv.org/pdf/1612.01887)
        * [github](https://github.com/jiasenlu/AdaptiveAttention)
    * Beam search
        * To improve performance
    * BLEU and METEOR (see the sketch after this list)
        * Or anything else that can judge the model's predictions
    * [State-of-the-art of image caption](https://paperswithcode.com/task/image-captioning)
        * [OSCAR - 2020's state-of-the-art](https://github.com/microsoft/Oscar)
            * [Azure Florence – Vision and Language](https://www.microsoft.com/en-us/research/project/azure-florence-vision-and-language/)
        * [Unified VLP - 2019's state-of-the-art](https://arxiv.org/pdf/1909.11059v3.pdf)
        * [Adaptive attention](https://arxiv.org/pdf/1612.01887.pdf)
        * [What Value Do Explicit High Level Concepts Have in Vision to Language Problems?](https://arxiv.org/pdf/1506.01144.pdf)
        * [Image Captioning with Semantic Attention](https://arxiv.org/pdf/1603.03925.pdf)
        * [show-and-tell - 2015's state-of-the-art](https://arxiv.org/pdf/1502.03044.pdf)
    * [Stanford NLP group](https://nlp.stanford.edu/software/)
    * [MSCOCO caption evaluation tool](https://github.com/tylin/coco-caption)
    * [Image captioning papers (Chinese)](https://blog.csdn.net/JohnChen45/article/details/81748651)
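Since BLEU and METEOR appear in the survey list above as candidate ways to judge the model's predictions, here is a rough BLEU-n sanity check using NLTK. The caption strings are invented; the official scores later in these notes come from the coco-caption toolkit linked above, not from this snippet.

```python
# Rough BLEU-n sanity check with NLTK; the caption strings below are made up.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [
    "a man is hitting a baseball with a bat".split(),
    "a baseball player swings a bat at a ball".split(),
]                                     # in practice: the 5 MSCOCO annotations of the image
candidate = "a baseball player is swinging a bat".split()

smooth = SmoothingFunction().method1  # avoids zero scores when a higher n-gram never matches
for n, w in [(1, (1, 0, 0, 0)), (2, (0.5, 0.5, 0, 0)), (4, (0.25, 0.25, 0.25, 0.25))]:
    score = sentence_bleu(references, candidate, weights=w, smoothing_function=smooth)
    print(f"BLEU-{n}: {score:.3f}")
```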
## 02/24
* Encoder
    * YOLOv1 ~ YOLOv3
    * Implement YOLOv3
    * [Adaptive attention](https://arxiv.org/pdf/1612.01887.pdf)
        * [github](https://github.com/jiasenlu/AdaptiveAttention)
        * [Paper description (Chinese)](https://blog.csdn.net/sinat_26253653/article/details/79416234)
* Decoder
    * [What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?](https://arxiv.org/abs/1708.02043)
        * [Paper's code](https://github.com/mtanti/rnn-role)
    * [Image Captioning with Semantic Attention](https://arxiv.org/pdf/1603.03925.pdf) (not finished)
        * [Paper description (Chinese)](https://blog.csdn.net/sinat_26253653/article/details/78260985)
    * [Adaptive attention](https://arxiv.org/pdf/1612.01887.pdf)
        * [github](https://github.com/jiasenlu/AdaptiveAttention)
        * [Paper description (Chinese)](https://blog.csdn.net/sinat_26253653/article/details/79416234)
    * [What Value Do Explicit High Level Concepts Have in Vision to Language Problems?](https://arxiv.org/pdf/1506.01144.pdf)
* [Important image captioning papers (Chinese)](https://blog.csdn.net/sinat_35177634/article/details/88102512)
* VQA (Visual Question Answering)

## 03/03
* Encoder
    * FCN
    * YOLO
        * YOLO pre-trained model
* Decoder
    * Template-based papers
    * FCN
    * [OSCAR - 2020's state-of-the-art](https://arxiv.org/pdf/2004.06165v5.pdf)
        * [Paper description (Chinese)](https://blog.csdn.net/c9Yv2cf9I06K2A9E/article/details/106270568?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522161509995516780264054709%2522%252C%2522scm%2522%253A%252220140713.130102334..%2522%257D&request_id=161509995516780264054709&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~all~baidu_landing_v2~default-1-106270568.pc_search_result_before_js&utm_term=oscar+image+caption)
    * [Neural baby talk](https://arxiv.org/pdf/1803.09845.pdf)
        * [Pointer Network (video)](https://www.youtube.com/watch?v=VdOyqNQ9aww)
* [R-CNN (Chinese)](https://ivan-eng-murmur.medium.com/object-detection-s1-rcnn-%E7%B0%A1%E4%BB%8B-30091ca8ef36)
* [Fast R-CNN (Chinese)](https://ivan-eng-murmur.medium.com/obeject-detection-s2-fast-rcnn-%E7%B0%A1%E4%BB%8B-40cfe7b5f605)
* [Faster R-CNN (Chinese)](https://ivan-eng-murmur.medium.com/object-detection-s3-faster-rcnn-%E7%B0%A1%E4%BB%8B-5f37b13ccdd2)
* Start some experiments

## 03/10
* Encoder
    * YOLO
    * R-CNN
* Decoder
    * Adaptive Loss (hsuanchia)
    * Neural Baby Talk (jiazheng)
        * architecture
        * loss
        * [Other's PPT](https://www.cs.ubc.ca/~lsigal/532S_2018W2/1a.pdf)
    * Show-and-tell code (hsuanchia)

## 03/17
* Encoder
    * Fast R-CNN
    * YOLOv2
* Decoder
    * Neural Baby Talk (jiazheng)
        * architecture
        * loss
        * [Other's PPT](https://www.cs.ubc.ca/~lsigal/532S_2018W2/1a.pdf)
    * [coco-caption](https://github.com/tylin/coco-caption)
    * Show-and-tell code (hsuanchia)

## 03/24
* Encoder
    * YOLOv3
    * YOLO9000
* Decoder
    * NBT & Adaptive & Show-and-tell loss survey (hsuanchia)
    * Show-and-tell code (hsuanchia)
    * VQA survey (hsuanchia)
        * [Introduction (Chinese)](https://franky07724-57962.medium.com/%E6%B7%B1%E5%BA%A6%E5%AD%B8%E7%BF%92%E5%92%8C%E5%9C%96%E5%83%8F%E5%95%8F%E7%AD%94-aee3730d9fbc)
        * [CLEVR dataset](https://cs.stanford.edu/people/jcjohns/clevr/)
        * [DAQUAR dataset](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/vision-and-language/visual-turing-challenge/)
        * [COCO-QA](http://www.cs.toronto.edu/~mren/research/imageqa/data/cocoqa/)
        * [VQA dataset](https://visualqa.org/index.html)
    * RNN-injection survey (hsuanchia)
    * [coco-caption](https://github.com/tylin/coco-caption) (jiazheng)
    * CIDEr, METEOR, SPICE (jiazheng)
        * [CIDEr](https://arxiv.org/pdf/1411.5726.pdf) and [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf)
        * [METEOR Wikipedia](https://en.wikipedia.org/wiki/METEOR)
        * [Common image-captioning evaluation metrics (Jianshu, Chinese)](https://www.jianshu.com/p/60deff0f64e1)
        * [Common NLP evaluation metrics for image captioning (Zhihu, Chinese)](https://zhuanlan.zhihu.com/p/160988416)
        * [Towards image captioning and evaluation](https://www.cs.princeton.edu/courses/archive/spring18/cos598B/public/outline/Towards%20image%20captioning.pdf)
    * Found via [this post](https://blog.csdn.net/u013548453/article/details/79244007) that a Stanford course has an [assignment](https://cs231n.github.io/assignments2016/assignment3/) that does exactly image captioning

## 04/07
[Evaluating Image Caption Presentation](https://docs.google.com/presentation/d/1ZLT7pJTuvB6q0w2M-wxKOvlV-HVmW6mQXaBtirDjT7M/edit?usp=sharing)

The language model made huge progress (loss: 30000 -> 3) by tuning `steps_per_epoch`.

* To-do coding
    * [CNN + RNN model (no attention)](https://colab.research.google.com/drive/12mxdrrN3oKeOQn7ml8Fwx3CurQaFtdYy)
        * add and concatenate
        * injection model
        * merge model (see the sketch after this list)
        * epochs tuning
        * LSTM, GRU, SimpleRNN
        * Change the CNN's architecture
        * Optimal loss function
        * checkpoint
    * visual attention (show-and-tell)
        * soft attention
        * local attention
        * global attention
    * Encoder with all of VGG16's layers
        * Use the classification result as the decoder's input
    * Semantic attention model
    * Adaptive attention model
    * Neural baby talk model
* Research
    * The loss function of every model we have discussed before
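The to-do list above distinguishes an "injection" model from a "merge" model (and "add and concatenate"), following the RNN-role paper surveyed on 02/21. The sketch below only illustrates the wiring with made-up layer sizes; it is not the project's actual model, and swapping `concatenate` for `add` gives the "add" variant.

```python
# Illustrative "merge" architecture (the image meets the language model after the RNN);
# layer sizes are made up, not the project's settings.
from tensorflow.keras import layers, Model

vocab_size, max_len, emb_dim, units = 7184, 30, 100, 512

img_in = layers.Input(shape=(4096,), name="image_feature")   # e.g. a flattened VGG feature
img_vec = layers.Dense(units, activation="relu")(img_in)

cap_in = layers.Input(shape=(max_len,), name="caption_prefix")
emb = layers.Embedding(vocab_size, emb_dim, mask_zero=True)(cap_in)
cap_vec = layers.LSTM(units)(emb)

merged = layers.concatenate([img_vec, cap_vec])              # layers.add([...]) for the "add" variant
hidden = layers.Dense(units, activation="relu")(merged)
next_word = layers.Dense(vocab_size, activation="softmax")(hidden)

merge_model = Model([img_in, cap_in], next_word)
merge_model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

# Init-inject variant: feed the image vector in as the LSTM's initial state instead, e.g.
#   cap_vec = layers.LSTM(units)(emb, initial_state=[img_vec, img_vec])
```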
## 04/21
* Coding!!!
    * [CNN + RNN model (no attention)](https://colab.research.google.com/drive/12mxdrrN3oKeOQn7ml8Fwx3CurQaFtdYy)
        * add and concatenate
        * injection model
        * merge model
        * epochs tuning
        * LSTM, GRU, SimpleRNN
        * Change the CNN's architecture
        * Optimal loss function
        * checkpoint
    * visual attention (show-and-tell)
        * soft attention
        * local attention
        * global attention
    * Encoder with all of VGG16's layers
        * Use the classification result as the decoder's input
    * Semantic attention model
    * Adaptive attention model
    * Neural baby talk model
* Helpful solutions for my dirty model
    * return_sequences with TimeDistributed
    * h0 carries the feature map's information
    * Can the RNN's hidden state be trained or tuned?
    * Where should the feature map be fed in?
    * Should the output be a probability distribution or a word vector?
    * Architecture diagram of my dirty model
* [Traps that may occur when using Keras](https://keras-cn.readthedocs.io/en/latest/for_beginners/trap/)

## 04/28
* [The model that finally speaks like a human](https://colab.research.google.com/drive/1ogP1zSQn1XT2rqRzdBQK-_mVi8QGdkfZ?usp=sharing)
    * Replaced yield with return
    * Batch size = 500, epochs = 100, training time = 9~10 s per epoch
    * Total training time = 20 min, loss = 0.2741
* How the Embedding layer works
    * Someone implemented and explained it on [Kaggle](https://www.kaggle.com/jerrykuo7727/embedding-rnn-0-876); I think the explanation is very clear and recommend everyone read it
    * [GloVe](https://nlp.stanford.edu/projects/glove/)
* ~~The meaning of the LSTM input shape -> (batch size, time steps, input dim)~~
* tf.repeat() is used to read data repeatedly -> requires tf.data.Dataset
* checkpoint
* Model with attention
    * If we want to use return_state, the batch size cannot be None; it must be given a value
* SparseCategoricalCrossentropy only needs the target label to be a single integer
    * [Keras official documentation](https://keras.io/api/losses/probabilistic_losses/#sparsecategoricalcrossentropy-class)
    * y_pred = (None, 30, 7184), y_true = (None, 30) -> it can train, but the prediction results are not great
* For debugging, scale the training-set size up in steps, from small to large; once things are confirmed, run the full training set
    * 500, 1000, 2000, 5000, 10000, 20000, 30000
    * Could I preprocess a large batch at once and then decide myself how many to take, e.g., prepare 50,000 images and sample from them randomly?
* Using pretrained word vectors really does reduce training time
    * Experiment setup: 5000 captions, 100 epochs, with GPU
    * loss: sparse_categorical_crossentropy
    * Cut the training time per epoch from 9 s to 2 s
    * The final loss is also a bit lower than before: 0.1966

## 05/05
* [Colab used for the experiments](https://colab.research.google.com/drive/1dPQIxa19rbMe_9NECUQBSteCRMt96Sr4)
* How does the unknown token work in word embeddings?
    * [GloVe has no unknown token](https://stackoverflow.com/questions/49239941/what-is-unk-in-the-pretrained-glove-vector-files-e-g-glove-6b-50d-txt)
    * [How to handle OOV (unknown token)](https://stackoverflow.com/questions/49346922/does-pre-trained-embedding-matrix-has-eos-unk-word-vector)
        1. Ignore it
        2. Give it a fixed (random) vector; GloVe, for example, uniformly uses an n-dim zero vector for the OOV token
        3. Use [fastText](https://fasttext.cc/), which reportedly solves the OOV problem
        4. The GloVe author says he found that averaging all the word vectors you use makes a good unknown-token vector, with decent performance
           > Jeffrey Pennington: ...I've found that just taking an average of all or a subset of the word vectors produces a good unknown vector.
* The baseline model architecture is fixed
* Do the GloVe word vectors line up with the correct words of our own tokenizer ids? (a mapping sketch follows at the end of this section)
    * Words not in the pretrained word vectors get a vector of 100 zeros
    * **My counts include the 3 special tokens "start", "end", "pad"**
    * Counting over all of val2017: 7181 words in total, of which 321 are not in the pretrained word vectors
    * In val2017, 1534 words appear more than 10 times; 4 of them are not in the pretrained word vectors
    * In val2017, 2329 words appear more than 5 times; 6 of them are not in the pretrained word vectors
    * Counting over all of train2017: 26851 words in total, of which 4657 are not in the pretrained word vectors
    * In train2017, 7412 words appear more than 10 times; 62 of them are not in the pretrained word vectors
    * In train2017, 10191 words appear more than 5 times; 188 of them are not in the pretrained word vectors
    * **Conclusion: dropping the low-frequency words greatly reduces the number of words that cannot be mapped to a word vector**
* Does the (14, 14, 512) encoder output still need MaxPooling / AvgPooling?
    * Test results: honestly I do not see much difference...
    * ![](https://i.imgur.com/NFRddYa.png)
      ```
      ----------------------------
      (1, 14, 14, 512)_origin: a man is boats a bat at a baseball game
      ----------------------------
      (1, 14, 14, 512)_dropout_0.5: a young boy is playing baseball on a field
      ----------------------------
      (1, 14, 14, 512)_dropout_0.3: a man in a baseball performing is boats a bat
      ----------------------------
      (1, 7, 7, 512): a baseball player swinging a light at a baseball
      ----------------------------
      (1, 3, 3, 512): a baseball player is holding a bat in his hand
      ----------------------------
      (1, 512): a baseball player is wii ready to hit a ball
      ----------------------------
      (1, 4096): a man in a baseball walls swinging a tree
      ----------------------------
      ```
    * Experiment findings
        * Testing LSTM recurrent_dropout: judging the sentences by eye across several images, I would rank no dropout > 0.5 > 0.3
        * Testing feature-map sizes: from a human point of view I cannot tell which sentences are better; we probably need BLEU as a criterion
        * But the smaller the feature map is shrunk, the worse the colour judgement seems to get, e.g., a red shirt judged as a blue shirt, a white bus as a red bus; this needs a few more images to confirm
        * Where to put the dropout
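The GloVe statistics earlier in this section translate fairly directly into an embedding matrix. The sketch below assumes a `tokenizer` built with `oov_token='<unk>'` (as in the 05/19 example further down) and a local copy of `glove.6B.100d.txt`; words with a GloVe vector get it, unmapped words stay as zero vectors, and `<unk>` gets the average vector Pennington suggests.

```python
# Sketch: GloVe -> Keras Embedding with zero vectors for unmapped words and the
# mean vector for <unk>. `tokenizer` and the GloVe file path are assumed.
import numpy as np
from tensorflow.keras.layers import Embedding

emb_dim = 100
glove = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        word, *vec = line.split()
        glove[word] = np.asarray(vec, dtype="float32")

vocab_size = len(tokenizer.word_index) + 1           # +1 for the padding index 0
embedding_matrix = np.zeros((vocab_size, emb_dim))   # unmapped words stay all-zero
for word, idx in tokenizer.word_index.items():
    if word in glove:
        embedding_matrix[idx] = glove[word]

# unknown token = average of the word vectors we actually use (Pennington's suggestion)
used_vectors = [glove[w] for w in tokenizer.word_index if w in glove]
embedding_matrix[tokenizer.word_index["<unk>"]] = np.mean(used_vectors, axis=0)

embedding_layer = Embedding(vocab_size, emb_dim,
                            weights=[embedding_matrix], trainable=False)
```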
## 05/12
* oov_token and num_words in keras.Tokenizer
    * word_index always contains every word no matter what
    * Only the top `num_words` words by frequency are kept; everything else gets converted to `<unk>`
    * [num_words doesn't seem to work](https://stackoverflow.com/questions/46202519/keras-tokenizer-num-words-doesnt-seem-to-work)
    * The embedding-matrix row for `<unk>` is random numbers
* Quantify model quality with BLEU
    * OSCAR
    * Visual
    * Ours
* Try recurrent_dropout=0.2 inside the LSTM
* BLEU-4
    * OSCAR: 41.7
    * Visual attention: 25
    * our seq2seq (4096): 6
* METEOR
    * OSCAR: 30.6
    * Visual attention: 23.9
    * our seq2seq (4096): 10.4
* Division of work:
    * hsuanchia
        * ~~BLEU using the pretrained val2017 feature maps~~ (finished)
        * Soft attention and visual attention
            * ~~Draw the visual attention architecture diagram~~ (the teacher drew it for me QQ)
            * Code implementation
        * Debug the [code](https://colab.research.google.com/drive/1mbAl8OWgM8nNzfe5EzBDL1nKBDsciQoe#scrollTo=VFfCAWPQof01) that adds the unknown token
            * ~~Use random vectors for the OOV word vector (see 廷郡's code)~~ (finished)
            * ~~Add the oov_token to the filter~~ (finished)
            * ~~Use texts_to_sequences to build the caption labels~~ there is a problem, see 05/19
    * 家正
        * Train the seq2seq base model
            * encoder data: (7, 7, 512)
            * The vocabulary needs the unknown token added: done, see the [code](https://colab.research.google.com/drive/1mbAl8OWgM8nNzfe5EzBDL1nKBDsciQoe#scrollTo=VFfCAWPQof01); there may still be problems, report them to hsuanchia if anything comes up
        * Effect of TimeDistributed Dropout after the LSTM on prediction
        * Use validation during fit, with the first 2500 images of val2017
        * Once the BLEU computation is debugged, use it to see how bad we are, computed on the last 2500 images of val2017
        * train data: output_7x7x512_train5000, 100 epochs
        * TimeDistributed(Dropout(0.5)), val predict on the first 2669
          ```
          {'testlen': 30567, 'reflen': 28750, 'guess': [30567, 27898, 25229, 22560], 'correct': [16038, 6235, 2172, 828]}
          ratio: 1.063199999999963
          Bleu_1: 0.525
          Bleu_2: 0.342
          Bleu_3: 0.216
          Bleu_4: 0.139
          METEOR: 0.169
          ROUGE_L: 0.409
          CIDEr: 0.352
          ```
        * No TimeDistributed dropout, val predict on 2500
          ```
          {'testlen': 26580, 'reflen': 25466, 'guess': [26580, 24080, 21580, 19080], 'correct': [13558, 4649, 1529, 526]}
          ratio: 1.043744600643955
          Bleu_1: 0.510
          Bleu_2: 0.314
          Bleu_3: 0.191
          Bleu_4: 0.118
          METEOR: 0.160
          ROUGE_L: 0.389
          CIDEr: 0.350
          ```
    * 柏瑋
        * Train the seq2seq base model
            * encoder data: (14, 14, 512)
            * The vocabulary needs the unknown token added: done, see the [code](https://colab.research.google.com/drive/1mbAl8OWgM8nNzfe5EzBDL1nKBDsciQoe#scrollTo=VFfCAWPQof01); there may still be problems, report them to hsuanchia if anything comes up
        * Effect of TimeDistributed Dropout after the LSTM on prediction
        * Use validation during fit, with the first 2500 images of val2017
        * Once the BLEU computation is debugged, compute BLEU on the last 2500 images of val2017
        * First 2500, epochs=50, TimeDistributed Dropout=0.5, loss: 0.5759
          > Bleu_1: 0.508 Bleu_2: 0.327 Bleu_3: 0.204 Bleu_4: 0.130 METEOR: 0.162 ROUGE_L: 0.397 CIDEr: 0.319
        * First 2500, epochs=100, TimeDistributed Dropout=0.2, loss: 0.4575
          > Bleu_1: 0.543 Bleu_2: 0.367 Bleu_3: 0.239 Bleu_4: 0.155 METEOR: 0.169 ROUGE_L: 0.422 CIDEr: 0.356
    * 廷郡
        * Split the data-preprocessing code out of my current code
            * The file is 9.4 GB and takes 4.5 minutes to load, which may need reconsideration
            * I ran it myself: as long as RAM is still sufficient at this stage, I do not think the extra step is needed, because reading the file back in is too slow
        * Effect of TimeDistributed Dropout after the LSTM on prediction (based on your current version of the model); see the sketch after this section for where these knobs sit
        * TimeDistributed dropout 0.3 & recurrent dropout 0.2
          ```
          Bleu_1: 0.564
          Bleu_2: 0.369
          Bleu_3: 0.234
          Bleu_4: 0.149
          METEOR: 0.170
          ROUGE_L: 0.418
          CIDEr: 0.435
          ```
        * TimeDistributed dropout 0.0 & recurrent dropout 0.2
          ```
          Bleu_1: 0.548
          Bleu_2: 0.348
          Bleu_3: 0.217
          Bleu_4: 0.137
          METEOR: 0.162
          ROUGE_L: 0.402
          CIDEr: 0.399
          ```
        * Whether to add recurrent_dropout, and if so at what rate (0.2?)
            * Judging from BLEU, recurrent_dropout must be added
        * TimeDistributed dropout 0.3 & recurrent dropout 0.2
          ```
          Bleu_1: 0.564
          Bleu_2: 0.369
          Bleu_3: 0.234
          Bleu_4: 0.149
          METEOR: 0.170
          ROUGE_L: 0.418
          CIDEr: 0.435
          ```
        * TimeDistributed dropout 0.3 & recurrent dropout 0.0
          ```
          Bleu_1: 0.551
          Bleu_2: 0.359
          Bleu_3: 0.224
          Bleu_4: 0.140
          METEOR: 0.171
          ROUGE_L: 0.418
          CIDEr: 0.420
          ```
        * There is still a recurrent dropout = 0.1 run whose BLEU has not been computed

:::warning
Remember to record every trained model on [github](https://github.com/hsuanchia/Image-caption/blob/main/Models/models.md); click through to see what needs to be recorded.
:::
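For reference, this is roughly where the two dropout knobs compared in the tables above sit in a decoder of the kind described in these notes (512 LSTM units, vocabulary around 7184, captions padded to 30 steps). It is a simplified stand-in, not the exact project model; the image feature is assumed to have already been projected to 512 dimensions.

```python
# Simplified decoder showing where recurrent_dropout and the TimeDistributed(Dropout)
# from the comparisons above are applied; shapes are taken from these notes.
from tensorflow.keras import layers, Model

vocab_size, max_len, units, emb_dim = 7184, 30, 512, 100

cap_in = layers.Input(shape=(max_len,))
feat_in = layers.Input(shape=(units,))      # image feature already projected to 512 dims

x = layers.Embedding(vocab_size, emb_dim, mask_zero=True)(cap_in)
x = layers.LSTM(units, return_sequences=True,
                recurrent_dropout=0.2)(x, initial_state=[feat_in, feat_in])
x = layers.TimeDistributed(layers.Dropout(0.3))(x)   # the "TimeDistributed dropout" above
out = layers.TimeDistributed(layers.Dense(vocab_size, activation="softmax"))(x)

decoder = Model([cap_in, feat_in], out)
decoder.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
# y_pred: (batch, 30, 7184) vs. y_true: (batch, 30), matching the 04/28 note.
```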
## 05/19
* Keras Tokenizer
    * Tokenizer parameters
        * num_words: the tokenizer only uses the top-n words by frequency
        * oov_token: sets the OOV token
        * filters: the symbols or characters the tokenizer should filter out
    * [Setting num_words seems to have no effect?](https://stackoverflow.com/questions/46202519/keras-tokenizer-num-words-doesnt-seem-to-work) After fitting, tokenizer.word_index still shows every word
        * The reason is that word_index and index_word always record every word
        * num_words only takes effect when you actually convert sentences with the tokenizer; in other words, only the tokenizer knows which words it will use
    * Example code:

```python
s = [['apple'],['apple'],['apple'],['apple'],['apple'],['banana'],['banana'],['banana'],['cat']]
token = Tokenizer(num_words=4,filters='!"#$%&()*+,-./:;=?@[\\]^_`{|}~\t\n',oov_token='<unk>')
token.fit_on_texts(s)
print(token.word_index)
print(len(token.word_index))
print(token.texts_to_sequences(['apple']))
print(token.texts_to_sequences(['banana']))
print(token.texts_to_sequences(['cat']))
```
output:
```
{'<unk>': 1, 'apple': 2, 'banana': 3, 'cat': 4}
4
[[2]]  # what 'apple' becomes  -> its index is within num_words, so the index is returned
[[3]]  # what 'banana' becomes -> same as above
[[1]]  # what 'cat' becomes    -> index 4 falls outside num_words, so the oov_token index is returned
```

* This means that if we want to rely on the tokenizer for OOV handling, there are a couple of difficulties:
    1. The user does not know which words are actually in the top n by frequency, unless they count them themselves
    2. Without knowing which words are actually used, when we use pretrained word vectors we do not know which words to map against the pretrained vectors when building the embedding layer, so we can only use every word, which raises the dimension and increases training time.
* Our current OOV solution:
    * Decide ourselves on a frequency threshold and build our own vocabulary. The vocabulary used for training must also be used in [generate_caption](https://github.com/hsuanchia/Image-caption/blob/main/generate_caption.ipynb), so I have stored it centrally on the cloud drive and recorded it in [models.md on GitHub](https://github.com/hsuanchia/Image-caption/blob/main/Models/models.md).
    * The oov_token can still come from the tokenizer, but its index defaults to 1; it is best to make our own vocabulary use the same index for the oov_token as the tokenizer does.
    * Before training, remember to replace every word in each caption that is not in our vocabulary with the oov_token, so the oov_token actually shows up during training.
* Words that cannot be mapped to a pretrained word vector
    * oov_token: use the average of all the word vectors in use
    * For the others that cannot be mapped:
        * all-zero vectors, or
        * random vectors
        * The zero-vector variant feels slightly better (see the records in [models.md on GitHub](https://github.com/hsuanchia/Image-caption/blob/main/Models/models.md)), but we probably need BLEU to know the real difference.
* Right now prediction adds an extra oov_token at the end of every sentence
    * TimeDistributed(Dropout 0.3), recurrent_dropout = 0.2
* **Things that should be identical for every trained model (controlled variables)**
    * Training data: MSCOCO train2017
    * Feature map: 14 x 14 x 512
    * Words: only those with frequency >= 5; see the [code](https://colab.research.google.com/drive/1mbAl8OWgM8nNzfe5EzBDL1nKBDsciQoe#scrollTo=VFfCAWPQof01)
    * OOV handling
        * unk = average of all the word vectors in use
        * other words that cannot be mapped to a pretrained word vector: random vectors
    * Training details
        * epochs = 100
        * EarlyStopping(patience=5, monitor='loss')
        * If validation is used during training: the last 2500 images of MSCOCO val2017
    * BLEU
        * Computed on the first 2500 images of MSCOCO val2017; decide between 1000 and 2500 images after seeing 廷郡's comparison
* Division of work
    * hsuanchia
        * Soft attention architecture (how it actually works, including all the shapes)
        * Soft attention code
        * How to do the recurrence ourselves in Keras
        * The attention layers Keras provides (usage still to be confirmed)
            * Attention (Luong)
            * AdditiveAttention (Bahdanau)
            * MultiHeadAttention
    * 家正
        * Collect the BLEU scores of all models
        * Fixed (controlled variables):
            * features 14 x 14 x 512
            * Vocabulary: counted over all of train2017, keeping only words with frequency above 5
            * unk word vector = average of all the word vectors in use
        * To vary (experimental variables):
            * with / without the unk token
            * dropout 0 / 0.2 / 0.5
            * unmapped word vectors: random vs. zero
            * training epochs 50 / 100
        * Write the scores out automatically to a csv file: [the code is here](https://github.com/hsuanchia/Image-caption/commit/89ec87189b38b3bb8c7924673f54b0febf65494a#diff-dcef56fa3291f1858cb4cc148982171263c577108cb88dd7ba80bd1c7a0c4388), [csv file](https://github.com/hsuanchia/Image-caption/commit/89ec87189b38b3bb8c7924673f54b0febf65494a#diff-dcef56fa3291f1858cb4cc148982171263c577108cb88dd7ba80bd1c7a0c4388), [Google Sheet for manual edits](https://docs.google.com/spreadsheets/d/1MoNzQI0VE7oBT29gxyoqcdqsPgBeViDKvbk6hP6Z1mw/edit?usp=sharing)
        * ~~TODO training~~
    * 柏瑋
        * Add validation to training, using the last 2500 images of MSCOCO val2017
        * Once trained, share the model and code on GitHub!
    * 廷郡
        * LSTM > GRU
        * Multi-layer LSTM
        * One LSTM layer, TimeDistributed dropout 0.3, recurrent dropout 0.2
          ```
          Epoch 22/100
          782/782 [==============================] - 132s 168ms/step - loss: 0.9101 - sparse_categorical_accuracy: 0.7949
          val_loss: 1.2389 - val_sparse_categorical_accuracy: 0.7716
          Bleu_1: 0.571
          Bleu_2: 0.376
          Bleu_3: 0.238
          Bleu_4: 0.152
          METEOR: 0.168
          ROUGE_L: 0.421
          CIDEr: 0.430
          ```
        * One LSTM layer, TimeDistributed dropout 0.5, recurrent dropout 0.2
          ```
          Epoch 26/100
          782/782 [==============================] - 135s 173ms/step
          loss: 0.9882 - sparse_categorical_accuracy: 0.7847
          val_loss: 1.2259 - val_sparse_categorical_accuracy: 0.7734
          Bleu_1: 0.584
          Bleu_2: 0.389
          Bleu_3: 0.250
          Bleu_4: 0.162
          METEOR: 0.172
          ROUGE_L: 0.428
          CIDEr: 0.465
          ```
        * Two LSTM layers, TimeDistributed dropout 0.3, recurrent dropout 0.2, dropout in LSTM 2 = 0.2
          ```
          Epoch 19/100
          782/782 [==============================] - 191s 244ms/step - loss: 0.9554 - sparse_categorical_accuracy: 0.7878
          val_loss: 1.2465 - val_sparse_categorical_accuracy: 0.7700
          Bleu_1: 0.569
          Bleu_2: 0.373
          Bleu_3: 0.237
          Bleu_4: 0.153
          METEOR: 0.170
          ROUGE_L: 0.424
          CIDEr: 0.443
          ```
        * Multi-layer GRU
            * dropout rate: 0 ~ 0.5
        * Check whether computing BLEU on the first 1000 vs. the first 2500 val images makes a difference
            * The gap seems small, within 0.5%
            * 家正 says that after pulling load_model out of generate_caption he saves a lot of time; computing BLEU on 2500 samples now takes only 20 minutes. See his commit on GitHub for details.
        * Test of the unk at the end of predicted sentences

## 05/26
[Jiazheng's Presentation](https://docs.google.com/presentation/d/1f9ltkirO2M5aORLefWFN2SyGAHg4i0e6smVJiqzx2jo/edit?usp=sharing)

[Feel free to log your scores in the Google Sheet](https://docs.google.com/spreadsheets/d/1MoNzQI0VE7oBT29gxyoqcdqsPgBeViDKvbk6hP6Z1mw/edit?usp=sharing) for consolidation; add whatever columns you want.

* **Things that should be identical for every trained model (controlled variables)**
    * Training data: MSCOCO train2017
    * Feature map: 14 x 14 x 512
    * Words: only those with frequency >= 5; see the [code](https://colab.research.google.com/drive/1mbAl8OWgM8nNzfe5EzBDL1nKBDsciQoe#scrollTo=VFfCAWPQof01)
    * OOV handling
        * unk = average of all the word vectors in use
        * other words that cannot be mapped to a pretrained word vector: random vectors
    * Training details
        * epochs = 100
        * EarlyStopping(patience=5, monitor='loss')
        * If validation is used during training: the last 2500 images of MSCOCO val2017
    * BLEU
        * Computed on the first 2500 images of MSCOCO val2017; decide between 1000 and 2500 images after seeing 廷郡's comparison
* Key points the seq2seq presentation should cover
    * Which parameters we tuned, from baseline -> improved -> strongest version
        * dropout rate (0~0.5) comparison -> Dense layer
        * recurrent_dropout 0~0.3
        * feature map: 14 x 14 vs. 7 x 7
        * the number of epochs observed after adding validation
        * words GloVe cannot map: zero vector vs. random vector
        * LSTM vs. GRU
        * multi-layer LSTM and multi-layer GRU
    * A few image + sentence examples people can relate to
    * What our BLEU scores actually mean
* Division of work
    * hsuanchia
        * ~~Soft attention architecture (how it actually works, including all the shapes)~~
        * Soft attention code
        * How to do the recurrence ourselves in Keras
        * The attention layers Keras provides (usage still to be confirmed; see the check after this section)
            * Attention (Luong)
            * AdditiveAttention (Bahdanau)
            * MultiHeadAttention
    * 家正
        * A few image + sentence examples people can relate to -> pick a few from the first 2500 val images
        * BLEU score of the baseline model
        * BLEU score of the official TensorFlow image-captioning tutorial
    * 柏瑋
        * Which parameters we tuned, from baseline -> improved -> strongest version
            * dropout rate (0~0.5) comparison -> Dense layer
            * recurrent_dropout 0~0.3
            * feature map: 14 x 14 vs. 7 x 7
            * the number of epochs observed after adding validation
            * words GloVe cannot map: zero vector vs. random vector
    * 廷郡
        * Test of the unk at the end of predicted sentences
        * Multi-layer GRU
        * Comparison and consolidation of the LSTM & GRU BLEU scores
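Hsuanchia's item about the attention layers Keras provides ("usage still to be confirmed") can be checked with random tensors before touching the real model. The shapes below follow the project's 196 annotation vectors of 512 dimensions; `MultiHeadAttention` needs a reasonably recent TF 2.x release.

```python
# Shape check for the three built-in Keras attention layers named above;
# the tensors are random, with 196 annotation vectors of 512 dims as in the project.
import tensorflow as tf
from tensorflow.keras import layers

batch, Tq, Tv, dim = 2, 30, 196, 512
query = tf.random.normal((batch, Tq, dim))   # e.g. decoder LSTM outputs, one per word step
value = tf.random.normal((batch, Tv, dim))   # 14*14 = 196 image annotation vectors

luong_ctx = layers.Attention()([query, value])               # dot-product (Luong-style)
bahdanau_ctx = layers.AdditiveAttention()([query, value])    # additive (Bahdanau-style)
mha_ctx = layers.MultiHeadAttention(num_heads=8, key_dim=64)(query, value)

print(luong_ctx.shape, bahdanau_ctx.shape, mha_ctx.shape)    # each (2, 30, 512)
```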
## 06/02
* [Stumbled on a compact Keras seq2seq image-captioning example (Inception V3 + GRU)](https://github.com/HyunJu1/Image-Captioning/blob/master/Image%20Captioning.ipynb)
* [Hsuanchia's attention base model](https://colab.research.google.com/drive/1WAFZ9DI-C-pInJxOZUy07pKYmyZY9aCQ#scrollTo=VbmwDu4gt4rX)
* To-do
    * Validation
    * Currently epochs = 10; probably needs more
    * Confirm my model architecture is not built wrong
    * Attention on text (not sure whether it is effective)
    * Compute BLEU
* Common settings for every model
    * LSTM units: 512
    * 14 x 14 x 512 feature map (from VGG16)
    * unknown token: random vector
    * unmapped words and special tokens: random vectors
    * Early stopping: patience = 5, on validation loss
    * Validation: the last 2500 images of MSCOCO val2017
    * BLEU: the first 2500 images of MSCOCO val2017
* Models to measure and evaluate
    * Base seq2seq
    * seq2seq + Dropout
        * Dense dropout: 0.5
        * Recurrent dropout: 0.35
    * seq2seq + Dropout + 2×LSTM
        * Dense dropout: 0.5
        * Layer-2 LSTM input dropout: 0.3
        * Layer-1 LSTM recurrent dropout: 0.35
        * Layer-2 LSTM recurrent dropout: 0.4
    * Attention-based
* Division of work
    * Hsuanchia
        * Attention architecture diagram
        * Attention
    * 家正
        * Draw the seq2seq-based model architecture diagram
            * Drawn with Diagrams.net; [source file](https://drive.google.com/file/d/19aDBeY5wHcv5LLZWEvvbednMbgeQNfue/view?usp=sharing)
            * ![](https://i.imgur.com/6ppv5w5.png)
        * Pick a few images that show the differences between models
            * [Google Slide](https://docs.google.com/presentation/d/1w20eA2RlAQSSzIazRwXxRtN2srVpOy9UNH_qr8FN17s/edit?usp=sharing), [Source Code](https://colab.research.google.com/drive/1g5eSdKTzLs2xA7_Gcn-6yzx5ooaLuUtU?usp=sharing)
    * 柏瑋
        * Draw the seq2seq-based model architecture diagram
        * Compute the models' BLEU scores
    * 廷郡
        * Draw the seq2seq-based model architecture diagram
            * ![](https://i.imgur.com/1kj8Qow.jpg)
        * Training models
        * model 1
          ```
          Bleu_1: 0.564
          Bleu_2: 0.369
          Bleu_3: 0.233
          Bleu_4: 0.147
          METEOR: 0.168
          ROUGE_L: 0.418
          CIDEr: 0.422
          ```
        * model 2
          ```
          Bleu_1: 0.585
          Bleu_2: 0.389
          Bleu_3: 0.250
          Bleu_4: 0.161
          METEOR: 0.172
          ROUGE_L: 0.428
          CIDEr: 0.457
          ```
        * model 3
          ```
          Bleu_1: 0.591
          Bleu_2: 0.396
          Bleu_3: 0.257
          Bleu_4: 0.167
          METEOR: 0.176
          ROUGE_L: 0.434
          CIDEr: 0.475
          ```

## 06/05
* Compare each model's training loss to see whether dropout helps with overfitting (a plotting sketch follows the 06/09 notes)
    * train loss / accuracy vs. validation loss / accuracy
    * Draw bar charts
    * number of parameters
* Name the models
* First draft of the presentation slides
    * Explain our goal [name=Jiazheng]
        * Visual Attention paper Fig. 1
    * What data we used to achieve it
        * dataset & preprocessing [name=Hsuanchia]
    * Model architecture diagrams
    * Performance comparison
        * BLEU explanation [name=Jiazheng]
        * loss
        * parameters
    * Image samples [name=Jiazheng]
        * 349860 skateboard
        * 152214 person laughing while holding a hot dog
        * 329319 black-and-white cat
* Common settings for every model
    * LSTM units: 512
    * 14 x 14 x 512 feature map (from VGG16)
    * Unmapped words and special tokens all use random vectors
    * Early stopping: patience = 5, on validation loss
    * Validation: the last 2000 images of MSCOCO val2017
    * BLEU: the first 2500 images of MSCOCO val2017
* Models to measure and evaluate
    * Base seq2seq
    * seq2seq + Dropout
        * Dense dropout: 0.5
        * Recurrent dropout: 0.35
    * seq2seq + Dropout + 2×LSTM
        * Dense dropout: 0.5
        * Layer-2 LSTM input dropout: 0.3
        * Layer-1 LSTM recurrent dropout: 0.35
        * Layer-2 LSTM recurrent dropout: 0.4
    * Attention-based
* Division of work
    * Hsuanchia
        * Fix the attention model's overfitting problem
        * Numbers for the attention model
        * What data preprocessing we did (for the presentation slides)
        * Future work (for the presentation slides)
    * 家正
        * BLEU explanation and statistics
        * Image samples
        * Explanation of the overall goal
    * 廷郡
        * Fix the seq2seq model architecture diagram
        * Loss statistics and performance comparison
        * Parameter-count statistics

## 06/09
[Merged draft of the slides](https://docs.google.com/presentation/d/1dl2s9eeJqADNhH4a4IaeVzwwfPHM2lwrHGY_jO9c5w4/edit?usp=sharing)
> It's my birthday today! [name=Jiazheng]
> Happy birthday XD [name=Hsuanchia]

Feedback on the draft:
* Make the images and text as large as possible
* The first and second demo images can be different, to avoid looking like we already knew the answer
* Do not wrap the motivation reference onto a new line
* Performance metric
* Seq2seq model: the architecture diagrams are in clearly different styles
* The performance-comparison charts later on
* Example indentation
* Preprocess the text into one consistent style: capitalization, punctuation
* If attention does not come out better, add more discussion of why

Model losses:
* model 17 epochs, train loss: 1.0473, val loss: 1.9238
* model 8 -> dropout = 0.25, 100 epochs, stopped at 16 -> train loss: 0.9933, val loss: 1.6741
* model 9 -> dropout = 0.2, 100 epochs, stopped at 21 -> train loss: 1.0415, val loss: 1.5515
* model 10 -> no dropout, 100 epochs, stopped at 11 -> train loss: 1.0055, val loss: 1.4646
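One possible way to produce the loss comparison and parameter counts asked for on 06/05, assuming `histories` and `models` are dictionaries keyed by model name that we fill in ourselves (hypothetical names, not taken from the repo); Keras's `History.history` and `count_params()` provide all the numbers.

```python
# Assumed: histories = {"model 8": hist8, ...}  (Keras History objects from model.fit)
#          models    = {"model 8": model8, ...} (the compiled models themselves)
import matplotlib.pyplot as plt

for name, hist in histories.items():
    plt.plot(hist.history["loss"], label=f"{name} train")
    plt.plot(hist.history["val_loss"], "--", label=f"{name} val")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()

for name, model in models.items():
    print(name, "parameters:", model.count_params())
```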
## 06/12
* Transfer learning with GloVe word vectors
* Possible reasons attention is worse than our seq2seq
    * Unstable loss? No, it is not.
    * The TensorFlow tutorial also only reaches about 0.4; is attention inherently limited here?
        * Find out their parameter counts and compare
    * Overfitting?
        * Dropout
    * Its parameter count is only 1/8 of the seq2seq's: underfitting?
        * The seq2seq has a huge number of MLP parameters before reducing to the 512-dim representation
        * Attention seems to just take an average instead
* Attention
    * The trailing-unk problem
        * While doing my NLP homework I found my data processing was at fault, adding an extra unk at the end of every sentence; after fixing that bug it is back to normal.
    * Parameter count after switching to flatten: 111,353,418
        * Model 11: h0 and c0 use different MLPs
            * stopped at 13 epochs, loss: 0.9696, val_loss: 1.4313
            * 450 s per epoch
            * BLEU-1: 0.409, BLEU-2: 0.250, BLEU-3: 0.143, BLEU-4: 0.082, METEOR: 0.130, ROUGE_L: 0.359, CIDEr: 0.182
    * Parameter count after switching to flatten: 59,972,682
        * Model 12: h0 and c0 use the same MLP
            * stopped at 14 epochs, loss: 1.0098, val_loss: 1.3897
            * 334 s per epoch
            * BLEU-1: 0.419, BLEU-2: 0.262, BLEU-3: 0.154, BLEU-4: 0.091, METEOR: 0.134, ROUGE_L: 0.364, CIDEr: 0.201
    * The above plus Dropout(0.5) after the flatten
        * Model 13
            * stopped at 12 epochs, loss: 0.9766, val_loss: 1.3585
            * BLEU-1: **0.520**, BLEU-2: 0.327, BLEU-3: 0.196, BLEU-4: 0.116, METEOR: 0.144, ROUGE_L: 0.393, CIDEr: 0.314
    * The above plus Dropout(0.35) after the flatten
        * Model 14
            * stopped at 11 epochs, loss: 0.9676, val_loss: 1.3771
            * BLEU-1: 0.466, BLEU-2: 0.293, BLEU-3: 0.173, BLEU-4: 0.102, METEOR: 0.140, ROUGE_L: 0.384, CIDEr: 0.290
    * Dropout(0.5) + 2×LSTM, val = 1500
        * Model 15
            * stopped at 12 epochs, loss: 0.8893, val_loss: 1.3073
            * Bleu_1: **0.582**, Bleu_2: 0.379, Bleu_3: 0.236, Bleu_4: 0.146, METEOR: 0.163, ROUGE_L: 0.419, CIDEr: 0.394
    * Dropout(0.5) + 2×LSTM + recurrent_dropout(0.5), val = 1500
        * Model 16
            * stopped at 13 epochs, loss: 0.9443, val_loss: 1.2696
            * 512 s per epoch
            * Bleu_1: **0.585**, Bleu_2: 0.387, Bleu_3: 0.244, Bleu_4: 0.153, METEOR: 0.167, ROUGE_L: 0.426, CIDEr: 0.419
    * Dropout(0.5) + 2×LSTM + recurrent_dropout(0.35), val = 1500
        * Model 17
            * stopped at 12 epochs, loss: 0.8960, val_loss: 1.2970
            * Bleu_1: 0.568, Bleu_2: 0.370, Bleu_3: 0.233, Bleu_4: 0.146, METEOR: 0.163, ROUGE_L: 0.416, CIDEr: 0.386

## 06/16 Final Project Presentation
* Attention
    * Add dropout at the alignment model
    * Possibly not enough data (comparing attention against Model 3)
    * Model 3 is the limit of the seq2seq; its overfitting situation is already the best we got
    * Neither the TF tutorial nor the visual attention paper beats our best attention model (because of the c0, h0 initialization)
* Final presentation slides: [Google Slide](https://docs.google.com/presentation/d/1dl2s9eeJqADNhH4a4IaeVzwwfPHM2lwrHGY_jO9c5w4/edit?usp=sharing), [PDF](https://drive.google.com/file/d/1eI3t05SpUrTtDlQ94pobCdD8UuEpx8Hh/view?usp=sharing)
* [Video recording](https://drive.google.com/file/d/1osgSvN9IIbs31APkb0oRUdwxVAYvfe1_/view?usp=sharing) (only visible when logged in with one of our four school accounts)

## 06/19 Deep Learning Report
* [Project report Google Docs](https://docs.google.com/document/d/1k6rTLNaDQIu1Il3XhslxpYGg3vfl5xMJaGUUSD84vZU/edit?usp=sharing)
* [Collaborative notes for the Deep Learning course](https://hackmd.io/jxxYCiGzQeeE4Jpd9cYH_g)

## 10/11
* [Chinese and English abstract of this project](https://hackmd.io/@hsuanchia/project-abstract)