# RNN: Topics to Investigate
## 10/21
Why RNN evolved into LSTM (what is wrong with plain RNN)
How backpropagation through time works in detail
The math
Why LSTM then evolved into GRU, and in what ways GRU is better
The rationale for peephole connections, and when they are useful
LSTM vs GRU comparison table
Implement the code and tune hyperparameters
What "LSTM unit 50" means (see the sketch below)
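For the "LSTM unit 50" question above: in Keras, `units` is the dimensionality of the hidden and cell state (and therefore of each output vector), not the number of timesteps. A minimal sketch, with input shape and layer sizes chosen arbitrarily for illustration:

```python
from tensorflow.keras import layers, models

# 20 timesteps, each a 7-dimensional vector (both numbers are arbitrary here)
model = models.Sequential([
    layers.Input(shape=(20, 7)),
    layers.LSTM(50),   # units=50 -> hidden/cell state and output are 50-dimensional
    layers.Dense(1),
])
model.summary()
# The LSTM layer has 4 * 50 * (7 + 50 + 1) = 11,600 parameters:
# four gates, each a dense map from [input, previous hidden state, bias] to 50 units.
```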
## 10/28
### Hsuanchia:
GRU
GRU vs LSTM vs RNN-Simple
### Jiazheng:
Batch Normalization / Layer Normalization
Textbook Chapter / Codes: Forecasting a Time Series
WaveNet (Optional)
[today's ppt](https://drive.google.com/file/d/1Jso9x9biDEakXGnl-fzToFcMSLNBWehF/view?usp=sharing)
[Chatq](https://ckip.iis.sinica.edu.tw/project/chatq/)
## 11/04
Jiazheng:
[This week's slides](https://docs.google.com/presentation/d/1NBVktVukT6vEzW0261buU9pRpCOLDelOnlRLdhAgjPw/edit?usp=sharing)
- What is the difference between return_sequences=True and the default False?
If False, does the layer run through all 20 time steps and only then pass the final output to the next layer?
If True, are the states of all 20 timestamps passed on to the next layer?
Illustrate the above with diagrams (see also the sketch below)
Count the parameters to see whether there is any difference
Discover:
- a multi-layer RNN cannot run without return_sequences
- the TimeDistributed layer will be used a lot later on
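A small sketch of the `return_sequences` question above, with shapes chosen arbitrarily: with `False` the layer emits only the last timestep's output, with `True` it emits the output at every timestep, which is what a stacked RNN or a downstream `TimeDistributed` layer needs. The parameter count of the LSTM itself is identical either way.

```python
from tensorflow.keras import layers, models

seq = layers.Input(shape=(20, 8))            # 20 timesteps, 8 features each

last_only = layers.LSTM(32)(seq)                           # default return_sequences=False -> (None, 32)
all_steps = layers.LSTM(32, return_sequences=True)(seq)    # -> (None, 20, 32)

print(last_only.shape)   # (None, 32)
print(all_steps.shape)   # (None, 20, 32)

# Stacking a second recurrent layer only works on the full sequence:
stacked = layers.LSTM(16)(all_steps)

# Both LSTM(32) layers above have identical parameter counts:
# 4 * 32 * (8 + 32 + 1) = 5,248 -- return_sequences changes the output shape, not the weights.
models.Model(seq, [last_only, stacked]).summary()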
Hsuanchia:
GRU: the real meaning of reset_after (why does it have 60 more parameters than reset_before?) and its topology
[today's ppt](https://drive.google.com/file/d/1z90awR8zdMzeyDb8A2kOHBRNBb0lBWaI/view?usp=sharing)
## 11/11
[This week's slides](https://docs.google.com/presentation/d/1NBVktVukT6vEzW0261buU9pRpCOLDelOnlRLdhAgjPw/edit?usp=sharing) (adapted directly from the 11/04 deck)
Read the paper:
[Kelvin Xu et al., "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention," Proceedings of the 32nd International Conference on Machine Learning (2015): 2048-2057.](https://arxiv.org/pdf/1502.03044.pdf)
Split up to study the LSTM/Attention part and the CNN part separately
## 11/18
The attention used in the paper above dates from around 2014; "Attention Is All You Need" (2017) came later. How do they differ?
## 11/25
Prerequisites for attention:
1. seq2seq: generating Shakespearean text with a character RNN (textbook)
* word embedding vs. one-hot vector (see the sketch at the end of this section)
* How does tokenization work?
* Stateful RNN
2. sentiment RNN(textbook)
* embedded layer
3. Bidirectional RNN
4. Beam search algorithm
5. Attention
* soft, hard, global, local
* context vector
6. paper: Visual attention(textbook)
7. paper: Attention is all you need
Jiazheng: 1 2
Hsuanchia: 3 4 5 6
optional: 7
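For item 1's "word embedding vs. one-hot vector" above, a minimal sketch (vocabulary size and dimensions are made up): a one-hot vector is as wide as the vocabulary and all zero except one position, while an embedding is a small dense vector looked up from a trainable matrix.

```python
import numpy as np
import tensorflow as tf

vocab_size, embed_dim = 10, 4          # toy numbers
word_ids = np.array([[2, 5, 7]])       # one sentence of three token ids

# One-hot: each word becomes a sparse vector of length vocab_size
one_hot = tf.one_hot(word_ids, depth=vocab_size)        # shape (1, 3, 10)

# Embedding: each word id indexes a row of a trainable (vocab_size, embed_dim) matrix
embedding = tf.keras.layers.Embedding(vocab_size, embed_dim)
dense_vecs = embedding(word_ids)                         # shape (1, 3, 4)

print(one_hot.shape, dense_vecs.shape)
```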
## 12/02
Decide the topic: 12/08, before the ML class
Presentation: 12/23 or 12/25
Judges: 陳履恆, 陳依蓉, 黃育銘
Ref:
[Standford ML project](http://cs229.stanford.edu/projects.html)
Both:
See if there is anything we can reuse from Stanford's projects
Find training data that we can use
Sketch what the CNN and RNN architecture is supposed to look like
Jiazheng:
last week's todo 3~6
Hsuanchia:
12/23 on stage presentation(10 min)
## 12/09
Jiazheng: visual attention experiment (decoder) using a GRU model + soft attention
Hsuanchia: visual attention experiment (decoder) using an LSTM model + soft attention
Presentation date changed to 2021/01/06 or 01/08
optional: BLEU evaluation
[Experiment by using tensorflow](https://github.com/yunjey/show-attend-and-tell)
## 12/16
Jiazheng:
Textbook Sentiment Analysis
How the paper's LSTM hidden state (a probability distribution?) is turned into words and sentences
:::warning
Deadline: by 12/21, tell me what you have done or learned
:::
Hsuanchia:
Visual attention's experiment
Presentation key point: what is most relevant to our goal?
## 12/23
Word bank: build the vocabulary over the whole 118k images × 5 annotation sentences each
Before building the word bank we need to do word embedding
LSTM: 196 * 512 input
It shall be 512 timesteps
[Experiment by using Keras](https://github.com/zimmerrol/show-attend-and-tell-keras)
## 12/30 Presentation Rehearsal
Does each timestamp focus on one word?
Does the validation dataset give us stratified sampling?
Input image resolution? There is preprocessing
MS COCO is an abbreviation; what is the full name?
VGGNet filter size? Stride? At which layer do we stop? All the pooling and convolution parameters
Make the project-structure diagram more low-level, like the textbook's Figure 16-8
Stateful or stateless? The five sentences per image should be related to each other
How the formula on p.11 is presented
Portable / Capability
How does our project differ from the reference paper? Performance enhancement / use YOLO instead of VGG
Future-work angle: a new architecture where the encoder already finishes classification, e.g. which objects are in the image (multi-label, multi-class); change the decoder to GRU or Bidirectional(?) + Attention, or a deep RNN
Transfer learning: we use VGGNet pretrained on ImageNet (image classification) and train further with COCO
Bi-directional layers cannot be used in the image-encoder CNN; how about the decoder RNN?
2021/01/04 20:00, meet again:
1. Everyone looks for anything in the slides that can still be improved
2. Hsuanchia gives a timed run-through
3. Guess what questions the judges will ask
4. (Optional) The others present it in their own way
## 2021/01/05
Visible only with a school Google account
[Transcript text file (雅婷逐字稿)](https://drive.google.com/file/d/1z65L5HKzvod1eMm58cTMlmhVi-amT9XA/view?usp=sharing "Google Drive NCNU")
[Audio recording](https://drive.google.com/file/d/1z_CIjLq3Gyb1HtvpXJx8xOzsToU7BmMN/view?usp=sharing "Google Drive NCNU")
## 01/06
Feedback from the professors
履恆 said we could look into NLP language models
履恆 said he has seen a paper citing this visual attention paper that generates poetry automatically; perhaps we can look at how later work applies this paper
依蓉 said for YOLO we should decide whether to adopt the whole architecture or only part of it
We don't really know what the attention mechanism is actually attending to; do we even need to know?
Jiazheng: wants to divide up the work
* [A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning](https://dl.acm.org/doi/pdf/10.1145/1390156.1390177)
* [Universal Language Model Fine-tuning for Text Classification](https://arxiv.org/pdf/1801.06146.pdf)
* [CSPNet](https://arxiv.org/pdf/1911.11929.pdf)
* [image caption with full VGGNet](https://arxiv.org/pdf/1411.4555.pdf)
## 01/13
* image caption
* [Generate poetry from image](https://arxiv.org/pdf/1804.08473.pdf)
* conference (CVGIP?)
* NLP language model
* [100 must-read NLP paper](https://github.com/mhagiwara/100-nlp-papers)
* Decoder implementation
* YOLO
## 01/26
* show-and-tell implementations
* [Tensorflow version with pre-trained model (uses the MSCOCO 2014 dataset)](https://github.com/coldmanck/show-attend-and-tell)
* [Keras version](https://github.com/201528014227051/show_attend_and_tell.keras)
* [Keras version](https://github.com/zimmerrol/show-attend-and-tell-keras)
* [Tensorflow ver.](https://github.com/tensorflow/models/tree/archive/research/im2txt) by Google employees, use Inception v3 encoder
* [PyTorch ver.](https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning)
## 01/27
News: [an application of image captioning on Facebook (Chinese)](https://chinese.engadget.com/facebook-explains-ai-alt-text-for-photo-descriptions-070013576.html), English original: [How Facebook is using AI to improve photo descriptions for people who are blind or visually impaired](https://ai.facebook.com/blog/how-facebook-is-using-ai-to-improve-photo-descriptions-for-people-who-are-blind-or-visually-impaired/ "How Facebook is using AI to improve photo descriptions for people who are blind or visually impaired"). In Firefox, go to the FB home page -> right-click -> Inspect Accessibility Properties, then hover over an image in the feed to see the caption FB generated for it.
#### How to run the [pretrained model, tensorflow ver.](https://github.com/coldmanck/show-attend-and-tell) experiment, by Jiazheng:
Judging from the Git repo history, the author wrote it against roughly Tensorflow 1.8.0, and today's Tensorflow 2.x is too new; also, the [Tensorflow 1.8.0 pip package](https://pypi.org/project/tensorflow/1.8.0/#files) only supports up to Python 3.6, while Ubuntu 20.04 ships with 3.8, which is too new. So we need to set up an environment with the older versions.
1. Install Conda, a tool that makes it easy to install and switch between different Python versions and related packages
Official guide: https://docs.anaconda.com/anaconda/install/linux/
2. With the default installation, every new terminal drops into Conda's base environment; to turn that off:
`conda config --set auto_activate_base false`
3. Create a new environment (`new_env` is just a name, pick your own) and pin the older Python version:
`conda create -n new_env python=3.6`
4. Enter the environment you created; every command-line prompt should now start with `(new_env)`:
`conda activate new_env`
5. Install the pinned versions of Tensorflow and Numpy:
`pip install tensorflow==1.8.0`
`pip install numpy==1.16.2`
6. Install the other required libraries:
```zsh
pip install scipy matplotlib ipython \
jupyter pandas sympy nose opencv-python nltk pandas \
tqdm scikit-image
```
7. From the Git repo README, download the archive linked as "pretrained model with default configuration can be downloaded here" and extract it; put the 289999.npy file into the models folder and vocabulary.csv into the program's root directory
8. Put the .jpg images you want to test into `test/images`
9. Run the inference experiment
```bash
python main.py --phase=test \
--model_file='./models/289999.npy' \
--beam_size=3
```
10. If you get `SyntaxError: Missing parentheses in call to 'print'.`, go to `utils/coco/coco.py` and add parentheses to every print statement, [like this](https://termbin.com/9lpn)
11. The results are written to `test/results`
12. To leave the Conda environment (or just close the window and open a new one):
`conda deactivate`
## 01/28
* Encoder
1. CSPnet
2. Classification in encoder
3. stuff segmentation
* Decoder
* First work out how to reproduce the decoder in the current paper, then discuss further
* Afterwards, design a decoder that can pair with other kinds of encoders, e.g. classify first and then feed the result into the decoder for sentence generation
* All need to do: Image caption's paper survey
* NLP word vector pre-train model: [GloVe](https://nlp.stanford.edu/projects/glove/)
* [Using pre-trained word embeddings in a Keras model](https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html)
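A sketch of the idea in the Keras blog post above: read the GloVe text file, build an embedding matrix aligned with our tokenizer's word indices, and freeze it in an `Embedding` layer. The file path, the tiny `word_index`, and the dimensions are assumptions for illustration only.

```python
import numpy as np
from tensorflow.keras.layers import Embedding

embedding_dim = 100
word_index = {'a': 1, 'man': 2, 'riding': 3}   # hypothetical tokenizer.word_index

# Parse GloVe: each line is "<word> <dim_1> ... <dim_n>"
glove = {}
with open('glove.6B.100d.txt', encoding='utf-8') as f:   # assumed local path
    for line in f:
        values = line.split()
        glove[values[0]] = np.asarray(values[1:], dtype='float32')

# Row i of the matrix is the vector for the word whose tokenizer id is i
embedding_matrix = np.zeros((len(word_index) + 1, embedding_dim))
for word, i in word_index.items():
    vec = glove.get(word)
    if vec is not None:           # unmapped words stay as zero vectors here
        embedding_matrix[i] = vec

embedding_layer = Embedding(len(word_index) + 1, embedding_dim,
                            weights=[embedding_matrix], trainable=False)
```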
## 02/10
* Encoder
1. CSPnet
2. Classification in encoder
3. stuff segmentation
* Decoder
1. Fix decoder overfitting
2. Evaluate performance
3. Improve accuracy
* All need to do: Image caption's paper survey
* [hsuanchia's decoder](https://colab.research.google.com/drive/1wWPK5wWA4jSW97xu0Un7ohLdeO0wJUIy#scrollTo=Cbf2FTiFFLpX)
## 02/21
* Hsuanchia
* [My decoder](https://colab.research.google.com/drive/1wWPK5wWA4jSW97xu0Un7ohLdeO0wJUIy#scrollTo=Cbf2FTiFFLpX)
* decoder with attention
* [tensorflow's image caption](https://www.tensorflow.org/tutorials/text/image_captioning#model)
* decoder without attention
* [What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?](https://arxiv.org/abs/1708.02043)
* [Paper's code](https://github.com/mtanti/rnn-role)

* image caption's paper survey
* [Generate poetry from image](https://arxiv.org/pdf/1804.08473.pdf) (not finish)
* [Image Captioning with Semantic Attention](https://arxiv.org/pdf/1603.03925.pdf) (not finish)
* [Paper description (chinese)](https://blog.csdn.net/sinat_26253653/article/details/78260985)
* [Image caption development (chinese)](https://zhuanlan.zhihu.com/p/30314440)
* Topic need to keep survey
* Attention mechanism
* Does it really help the captions?
* [Adaptive attention](https://arxiv.org/pdf/1612.01887)
* [github](https://github.com/jiasenlu/AdaptiveAttention)
* Beam search
* To improve the performance
* BLEU and METEOR
* Or something which can judge model's prediction
* [State-of-the-art of image caption](https://paperswithcode.com/task/image-captioning)
* [OSCAR - 2020's state-of-the-art](https://github.com/microsoft/Oscar)
* [Azure Florence – Vision and Language ](https://www.microsoft.com/en-us/research/project/azure-florence-vision-and-language/)
* [Unified VLP - 2019's state-of-the-art](https://arxiv.org/pdf/1909.11059v3.pdf)
* [Adaptive attention](https://arxiv.org/pdf/1612.01887.pdf)
* [What Value Do Explicit High Level Concepts Have in Vision to Language Problems?](https://arxiv.org/pdf/1506.01144.pdf)
* [Image Captioning with Semantic Attention](https://arxiv.org/pdf/1603.03925.pdf)
* [show-and-tell - 2015's state-of-the-art](https://arxiv.org/pdf/1502.03044.pdf)
* [Stanford NLP group](https://nlp.stanford.edu/software/)
* [MSCOCO caption evaluation tool](https://github.com/tylin/coco-caption)
* [Image caption's paper](https://blog.csdn.net/JohnChen45/article/details/81748651)
## 02/24
* Encoder
* YOLOv1 ~ YOLOv3
* implement YOLOv3
* [Adaptive attention](https://arxiv.org/pdf/1612.01887.pdf)
* [github](https://github.com/jiasenlu/AdaptiveAttention)
* [Paper description (Chinese)](https://blog.csdn.net/sinat_26253653/article/details/79416234)
* Decoder
* [What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?](https://arxiv.org/abs/1708.02043)
* [Paper's code](https://github.com/mtanti/rnn-role)
* [Image Captioning with Semantic Attention](https://arxiv.org/pdf/1603.03925.pdf) (not finish)
* [Paper description (Chinese)](https://blog.csdn.net/sinat_26253653/article/details/78260985)
* [Adaptive attention](https://arxiv.org/pdf/1612.01887.pdf)
* [github](https://github.com/jiasenlu/AdaptiveAttention)
* [Paper description (Chinese)](https://blog.csdn.net/sinat_26253653/article/details/79416234)
* [What Value Do Explicit High Level Concepts Have in Vision to Language Problems?](https://arxiv.org/pdf/1506.01144.pdf)
* [image caption's important paper (chinese)](https://blog.csdn.net/sinat_35177634/article/details/88102512)
* VQA (Visual Question Answering)
## 03/03
* Encoder
* FCN
* YOLO
* YOLO pre-trained model
* Decoder
* Template-based paper
* FCN
* [OSCAR - 2020's state-of-the-art](https://arxiv.org/pdf/2004.06165v5.pdf)
* [Paper description (Chinese)](https://blog.csdn.net/c9Yv2cf9I06K2A9E/article/details/106270568?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522161509995516780264054709%2522%252C%2522scm%2522%253A%252220140713.130102334..%2522%257D&request_id=161509995516780264054709&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~all~baidu_landing_v2~default-1-106270568.pc_search_result_before_js&utm_term=oscar+image+caption)
* [Neural baby talk](https://arxiv.org/pdf/1803.09845.pdf)
* [Pointer Network (video)](https://www.youtube.com/watch?v=VdOyqNQ9aww)
* [R-CNN (chinese)](https://ivan-eng-murmur.medium.com/object-detection-s1-rcnn-%E7%B0%A1%E4%BB%8B-30091ca8ef36)
* [Fast R-CNN (chinese)](https://ivan-eng-murmur.medium.com/obeject-detection-s2-fast-rcnn-%E7%B0%A1%E4%BB%8B-40cfe7b5f605)
* [Faster R-CNN (chinese)](https://ivan-eng-murmur.medium.com/object-detection-s3-faster-rcnn-%E7%B0%A1%E4%BB%8B-5f37b13ccdd2)
* Start some experiments
## 03/10
* Encoder
* YOLO
* R-CNN
* Decoder
* Adaptive Loss (hsuanchia)
* Neural Baby Talk (jiazheng)
* architecture
* loss
* [Other's PPT](https://www.cs.ubc.ca/~lsigal/532S_2018W2/1a.pdf)
* Show-and-tell code(hsuanchia)
## 03/17
* Encoder
* Fast RCNN
* YOLOv2
* Decoder
* Neural Baby Talk (jiazheng)
* architecture
* loss
* [Other's PPT](https://www.cs.ubc.ca/~lsigal/532S_2018W2/1a.pdf)
* [coco-caption](https://github.com/tylin/coco-caption)
* Show-and-tell code(hsuanchia)
## 03/24
* Encoder
* YOLOv3
* YOLO9000
* Decoder
* NBT & Adaptive & Show-and-tell loss survey(hsuanchia)
* Show-and-tell code(hsuanchia)
* VQA survey(hsuanchia)
* [introduction (Chinese)](https://franky07724-57962.medium.com/%E6%B7%B1%E5%BA%A6%E5%AD%B8%E7%BF%92%E5%92%8C%E5%9C%96%E5%83%8F%E5%95%8F%E7%AD%94-aee3730d9fbc)
* [CLEVR dataset](https://cs.stanford.edu/people/jcjohns/clevr/)
* [DAQUAR dataset](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/vision-and-language/visual-turing-challenge/)
* [COCO-QA](http://www.cs.toronto.edu/~mren/research/imageqa/data/cocoqa/)
* [VQA dataset](https://visualqa.org/index.html)
* RNN-injection survey(hsuanchia)
* [coco-caption](https://github.com/tylin/coco-caption)(jiazheng)
* CIDER, Meteor, SPICE (jiazheng)
* [CIDEr](https://arxiv.org/pdf/1411.5726.pdf) and [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf)
* [METEOR Wikipedia](https://en.wikipedia.org/wiki/METEOR)
* [Common evaluation metrics for image captioning (Jianshu, Chinese)](https://www.jianshu.com/p/60deff0f64e1)
* [Common NLP evaluation metrics for image captioning (Zhihu, Chinese)](https://zhuanlan.zhihu.com/p/160988416)
* [Towards image captioning and evaluation](https://www.cs.princeton.edu/courses/archive/spring18/cos598B/public/outline/Towards%20image%20captioning.pdf)
* Found via [this post](https://blog.csdn.net/u013548453/article/details/79244007) that a Stanford course has an [assignment](https://cs231n.github.io/assignments2016/assignment3/) that does exactly image captioning
## 04/07
[Evaluating Image Caption Presentation](https://docs.google.com/presentation/d/1ZLT7pJTuvB6q0w2M-wxKOvlV-HVmW6mQXaBtirDjT7M/edit?usp=sharing)
The language model has made huge progress, loss: 30000 -> 3, by tuning steps_per_epoch.
* To-do coding
* [CNN + RNN model(No attention)](https://colab.research.google.com/drive/12mxdrrN3oKeOQn7ml8Fwx3CurQaFtdYy)
* add and concatenate
* injection model
* merge model (see the sketch at the end of this section)
* epochs tuning
* LSTM, GRU, simpleRNN
* Change CNN's arch
* Optimal loss function
* checkpoint
* visual attention(show-and-tell)
* soft attention
* local attention
* global attention
* Encoder with all VGG16's layer
* Use classification's result as decoder's input
* Semantic attention model
* adaptive attention model
* Neural baby talk model
* research
* loss functions for every model we have discussed before
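The "injection model / merge model" items in the to-do list above (compare the Tanti et al. RNN-role paper from 02/21) differ in where the image features enter: injected as the RNN's initial state/input, or merged with the RNN's text output afterwards. A rough sketch under assumed shapes (4096-dim image vector, vocabulary of 5000, captions of 30 tokens), not the project's actual model:

```python
from tensorflow.keras import layers, models

vocab_size, max_len, img_dim = 5000, 30, 4096

img_in = layers.Input(shape=(img_dim,))
txt_in = layers.Input(shape=(max_len,))
img_vec = layers.Dense(256, activation='relu')(img_in)
txt_emb = layers.Embedding(vocab_size, 256, mask_zero=True)(txt_in)

# Init-inject: the image vector becomes the LSTM's initial hidden/cell state
inject_out = layers.LSTM(256)(txt_emb, initial_state=[img_vec, img_vec])

# Merge: the LSTM sees only text; the image is combined after the recurrence
merge_out = layers.add([layers.LSTM(256)(txt_emb), img_vec])

inject_model = models.Model([img_in, txt_in],
                            layers.Dense(vocab_size, activation='softmax')(inject_out))
merge_model = models.Model([img_in, txt_in],
                           layers.Dense(vocab_size, activation='softmax')(merge_out))
inject_model.summary(); merge_model.summary()
```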
## 04/21
* Coding!!!
* [CNN + RNN model(No attention)](https://colab.research.google.com/drive/12mxdrrN3oKeOQn7ml8Fwx3CurQaFtdYy)
* add and concatenate
* injection model
* merge model
* epochs tuning
* LSTM, GRU, simpleRNN
* Change CNN's arch
* Optimal loss function
* checkpoint
* visual attention(show-and-tell)
* soft attention
* local attention
* global attention
* Encoder with all VGG16's layer
* Use classification's result as decoder's input
* Semantic attention model
* adaptive attention model
* Neural baby talk model
* Possible fixes for my messy model (see the sketch after this list)
* return_sequences with TimeDistributed
* h0 carries the feature map's information
* Can the hidden state of the RNN be trained or tuned?
* Where should the feature map go?
* Should the output be a probability distribution or a word vector?
* Architecture diagram of my messy model
* [Traps that can occur when using Keras](https://keras-cn.readthedocs.io/en/latest/for_beginners/trap/)
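A sketch of the first two fixes listed above (return_sequences with TimeDistributed, and h0 carrying the feature map's information); layer sizes and the vocabulary size are assumptions chosen to mirror shapes mentioned elsewhere in these notes:

```python
from tensorflow.keras import layers, models

vocab_size, max_len, feat_dim = 7184, 30, 512   # assumed, mirroring the notes' shapes

feat_in = layers.Input(shape=(feat_dim,))            # image feature after pooling/MLP
cap_in = layers.Input(shape=(max_len,))
h0 = layers.Dense(256, activation='tanh')(feat_in)   # h0 carries the feature map's information
c0 = layers.Dense(256, activation='tanh')(feat_in)

x = layers.Embedding(vocab_size, 256, mask_zero=True)(cap_in)
# return_sequences=True keeps one hidden state per timestep ...
x = layers.LSTM(256, return_sequences=True)(x, initial_state=[h0, c0])
# ... so TimeDistributed can map every timestep to a word distribution: (None, 30, 7184)
out = layers.TimeDistributed(layers.Dense(vocab_size, activation='softmax'))(x)

models.Model([feat_in, cap_in], out).summary()
```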
## 04/28
* [A model that finally speaks human language](https://colab.research.google.com/drive/1ogP1zSQn1XT2rqRzdBQK-_mVi8QGdkfZ?usp=sharing)
* replace yield with return
* Batch size = 500, epochs = 100, training time = 9~10 sec per epochs
* Total training time = 20 min, loss = 0.2741
* How the Embedding layer works
* Someone implemented and explained it on [Kaggle](https://www.kaggle.com/jerrykuo7727/embedding-rnn-0-876); I think the explanation is very clear and recommend everyone read it
* [GloVe](https://nlp.stanford.edu/projects/glove/)
~~* Meaning of the LSTM input shape -> (batch size, time steps, input dim)~~
* tf.repeat() for reading the data repeatedly -> requires tf.data.Dataset
* checkpoint
* The model with attention
* If we want to use return_state, the batch size cannot be None; it has to be given a value
* SparseCategoricalCrossentropy only needs the target label to be a single integer (see the sketch at the end of this section)
* [Keras官方文件說明](https://keras.io/api/losses/probabilistic_losses/#sparsecategoricalcrossentropy-class)
* y_pred = (None, 30, 7184), y_true = (None, 30) -> it can train, but the predictions are not great
* For debugging, start the training-set size small and increase it in steps; once things are settled, run the full training set
* 500 1000 2000 5000 10000 20000 30000
* Preprocess a large batch at once and then decide myself how much to take? E.g. prepare 50000 images in one go and then sample from them randomly
* Using pretrained word vectors does help reduce training time
* Experiment setup: 5000 captions, 100 epochs, with GPU
* loss: sparse_categorical_crossentropy
* Effectively reduced the training time per epoch from 9s to 2s
* The final loss is also a bit lower than before: 0.1966
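A toy check of the SparseCategoricalCrossentropy note above: the loss takes integer class ids as y_true while y_pred is a per-class probability distribution, which is why y_true can stay (None, 30) while y_pred is (None, 30, 7184). The numbers below are made up.

```python
import numpy as np
import tensorflow as tf

# Three "timesteps", vocabulary of 4 words
y_true = np.array([[0, 2, 3]])                        # integer word ids, shape (1, 3)
y_pred = np.array([[[0.7, 0.1, 0.1, 0.1],
                    [0.1, 0.1, 0.7, 0.1],
                    [0.1, 0.1, 0.1, 0.7]]])           # probabilities, shape (1, 3, 4)

sparse = tf.keras.losses.SparseCategoricalCrossentropy()
print(float(sparse(y_true, y_pred)))                  # ~0.357, no one-hot encoding needed

# The non-sparse version needs the same labels one-hot encoded first
dense = tf.keras.losses.CategoricalCrossentropy()
print(float(dense(tf.one_hot(y_true, depth=4), y_pred)))   # same value
```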
## 05/05
* [Colab used for the experiments](https://colab.research.google.com/drive/1dPQIxa19rbMe_9NECUQBSteCRMt96Sr4)
* How does the unknown token work in word embeddings?
* [GloVe has no unknown token](https://stackoverflow.com/questions/49239941/what-is-unk-in-the-pretrained-glove-vector-files-e-g-glove-6b-50d-txt)
* [How to handle oov (unknown tokens)](https://stackoverflow.com/questions/49346922/does-pre-trained-embedding-matrix-has-eos-unk-word-vector)
1. Ignore them
2. Assign a fixed vector, e.g. treat every oov token uniformly as an n-dimensional zero vector (as with GloVe)
3. Use [fasttext](https://fasttext.cc/), which reportedly solves the oov problem
4. The GloVe author says he found that averaging all (or a subset of) the word vectors you use works well as the unknown token's vector (see the sketch at the end of this section)
>Jeffrey Pennington: ...I've found that just taking an average of all or a subset of the word vectors produces a good unknown vector.
* The baseline model architecture is fixed
* Check that the GloVe word vectors are mapped to the correct words for our own tokenizer ids
* Words not in the pretrained word vectors get a 100-dimensional zero vector
* **The word counts below include the 3 tokens "start", "end", "pad"**
* Counting over all words in val2017: 7181 words in total, of which 321 are not in the pretrained word vectors
* In val2017, 1534 words appear more than 10 times; 4 of them are not in the pretrained word vectors
* In val2017, 2329 words appear more than 5 times; 6 of them are not in the pretrained word vectors
* Counting over all words in train2017: 26851 words in total, of which 4657 are not in the pretrained word vectors
* In train2017, 7412 words appear more than 10 times; 62 of them are not in the pretrained word vectors
* In train2017, 10191 words appear more than 5 times; 188 of them are not in the pretrained word vectors
* **Conclusion: dropping low-frequency words reduces the number of words that cannot be mapped to a word vector**
* Encoder output is (14, 14, 512); do we still need MaxPooling / AvgPooling?
* Test results below; honestly I don't see much difference...
* 
```
----------------------------
(1, 14, 14, 512)_origin:
a man is boats a bat at a baseball game
----------------------------
(1, 14, 14, 512)_dropout_0.5:
a young boy is playing baseball on a field
----------------------------
(1, 14, 14, 512)_dropout_0.3:
a man in a baseball performing is boats a bat
----------------------------
(1, 7, 7, 512):
a baseball player swinging a light at a baseball
----------------------------
(1, 3, 3, 512):
a baseball player is holding a bat in his hand
----------------------------
(1, 512):
a baseball player is wii ready to hit a ball
----------------------------
(1, 4096):
a man in a baseball walls swinging a tree
----------------------------
```
* Experiment results
* Testing LSTM recurrent_dropout: judging the sentences by eye over several images, I would rank no dropout > 0.5 > 0.3
* Testing feature-map sizes: by eye I cannot tell which sentences are better; we probably need BLEU as a yardstick
* But the more the feature map is shrunk, the worse the color judgement seems to get, e.g. a red shirt described as a blue shirt, a white bus as a red bus; this needs a few more images to confirm
* Where to put dropout
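A sketch of option 4 above (the GloVe author's averaging suggestion): build the unknown-token vector as the mean of the word vectors we actually use. The `glove` dict and `vocabulary` list here are placeholders standing in for the real loaded data.

```python
import numpy as np

# Assumed inputs: glove maps word -> 100-dim vector, vocabulary is our filtered word list
glove = {'a': np.random.rand(100), 'man': np.random.rand(100)}   # placeholder for the real dict
vocabulary = ['a', 'man', 'someraretoken']

used_vectors = [glove[w] for w in vocabulary if w in glove]

# Unknown token: average of every vector we actually use (Pennington's suggestion)
unk_vector = np.mean(used_vectors, axis=0)

# Words that still don't map can fall back to a zero vector (or a random one)
embedding = {w: glove.get(w, np.zeros(100)) for w in vocabulary}
embedding['<unk>'] = unk_vector
```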
## 05/12
* oov_token and num_words in keras.Tokenizer
* word_index always contains every word regardless
* Only words ranked outside the top `num_words` by frequency get replaced with `<unk>`
* [num_words doesn't seem to work](https://stackoverflow.com/questions/46202519/keras-tokenizer-num-words-doesnt-seem-to-work)
* The embedding matrix entry for `<unk>` is initialized randomly
* Use BLEU to quantify how good the models are (see the sketch at the end of this section)
* OSCAR
* Visual
* Ours
* Try recurrent_dropout=0.2 inside the LSTM
* BLEU-4
* OSCAR: 41.7
* Visual attention: 25
* our seq2seq(4096): 6
* METEOR
* OSCAR: 30.6
* Visual attention: 23.9
* our seq2seq(4096): 10.4
* Task split:
* hsuanchia
* ~~BLEU using the pretrained val2017 feature maps~~ (Finished)
* Soft attention and visual attention
* ~~Draw the visual attention architecture diagram~~ (the professor already drew it for me QQ)
* Code implementation
* Debug the [code](https://colab.research.google.com/drive/1mbAl8OWgM8nNzfe5EzBDL1nKBDsciQoe#scrollTo=VFfCAWPQof01) that adds the unknown token
* ~~Use a random vector for the oov word vector (see 廷郡's code)~~ (finished)
* ~~Add the oov_token to the filter~~ (finished)
* ~~Use texts_to_sequences to build the caption labels~~ there is a problem, see 05/19
* 家正
* Train the seq2seq base model
* encoder data: (7,7,512)
* Add the unknown token to the vocabulary
* Done; see the [code](https://colab.research.google.com/drive/1mbAl8OWgM8nNzfe5EzBDL1nKBDsciQoe#scrollTo=VFfCAWPQof01). There may still be problems; report to hsuanchia if so
* Effect of TimeDistributed Dropout after the LSTM on prediction
* Use validation during fit, with the first 2500 images of val2017
* Once the BLEU computation is debugged, use it to see how bad we are, computed on the last 2500 images of val2017
* train data: output_7x7x512_train5000
* 100 epochs
* TimeDistributed(Dropout(0.5)), val predict on the first 2669
```
{'testlen': 30567, 'reflen': 28750, 'guess': [30567, 27898, 25229, 22560], 'correct': [16038, 6235, 2172, 828]}
ratio: 1.063199999999963
Bleu_1: 0.525
Bleu_2: 0.342
Bleu_3: 0.216
Bleu_4: 0.139
METEOR: 0.169
ROUGE_L: 0.409
CIDEr: 0.352
```
* no TimeDistributed dropout, val predict on 2500
```
{'testlen': 26580, 'reflen': 25466, 'guess': [26580, 24080, 21580, 19080], 'correct': [13558, 4649, 1529, 526]}
ratio: 1.043744600643955
Bleu_1: 0.510
Bleu_2: 0.314
Bleu_3: 0.191
Bleu_4: 0.118
METEOR: 0.160
ROUGE_L: 0.389
CIDEr: 0.350
```
* 柏瑋
* Train the seq2seq base model
* encoder data: (14,14,512)
* Add the unknown token to the vocabulary
* Done; see the [code](https://colab.research.google.com/drive/1mbAl8OWgM8nNzfe5EzBDL1nKBDsciQoe#scrollTo=VFfCAWPQof01). There may still be problems; report to hsuanchia if so
* Effect of TimeDistributed Dropout after the LSTM on prediction
* Use validation during fit, with the first 2500 images of val2017
* Once the BLEU computation is debugged, use it to see how bad we are, computed on the last 2500 images of val2017
* First 2500, epochs=50, TimeDistributed Dropout=0.5, loss: 0.5759
> Bleu_1: 0.508
Bleu_2: 0.327
Bleu_3: 0.204
Bleu_4: 0.130
METEOR: 0.162
ROUGE_L: 0.397
CIDEr: 0.319
* First 2500, epochs=100, TimeDistributed Dropout=0.2, loss: 0.4575
>Bleu_1: 0.543
Bleu_2: 0.367
Bleu_3: 0.239
Bleu_4: 0.155
METEOR: 0.169
ROUGE_L: 0.422
CIDEr: 0.356
* 廷郡
* Split the data-processing code out of my current code
* The file is 9.4 GB and takes 4.5 minutes to load, which may need a rethink
* I ran it myself; as long as RAM is still sufficient at this stage, I don't think extra processing is needed, because loading is too slow
* Effect of TimeDistributed Dropout after the LSTM on prediction (based on your current version of the model)
* TimeDistributed dropout - 0.3 & recurrent dropout - 0.2
```
Bleu_1: 0.564
Bleu_2: 0.369
Bleu_3: 0.234
Bleu_4: 0.149
METEOR: 0.170
ROUGE_L: 0.418
CIDEr: 0.435
```
* TimeDistributed dropout - 0.0 & recurrent dropout - 0.2
```
Bleu_1: 0.548
Bleu_2: 0.348
Bleu_3: 0.217
Bleu_4: 0.137
METEOR: 0.162
ROUGE_L: 0.402
CIDEr: 0.399
```
* Whether to add recurrent_dropout, and at what rate (0.2?)
* Judging from BLEU, recurrent_dropout has to be added
* TimeDistributed dropout - 0.3 & recurrent dropout - 0.2
```
Bleu_1: 0.564
Bleu_2: 0.369
Bleu_3: 0.234
Bleu_4: 0.149
METEOR: 0.170
ROUGE_L: 0.418
CIDEr: 0.435
```
* TimeDistributed dropout - 0.3 & recurrent dropout - 0.0
```
Bleu_1: 0.551
Bleu_2: 0.359
Bleu_3: 0.224
Bleu_4: 0.140
METEOR: 0.171
ROUGE_L: 0.418
CIDEr: 0.420
```
* There is still one run with recurrent_dropout=0.1 whose BLEU has not been computed
:::warning
Remember to record every trained model on [github](https://github.com/hsuanchia/Image-caption/blob/main/Models/models.md); see the link for what needs to be recorded
:::
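For the BLEU numbers above, the project uses the coco-caption toolkit; as a quick sanity check one can also compute corpus BLEU with NLTK (installed back in the 01/27 environment). This is only an illustrative sketch with made-up captions, not the evaluation pipeline behind the tables here.

```python
from nltk.translate.bleu_score import corpus_bleu

# One predicted caption and its five MSCOCO reference captions (all made up here)
references = [[
    'a man is riding a skateboard down a ramp'.split(),
    'a person rides a skateboard at a skate park'.split(),
    'a skateboarder performs a trick on a ramp'.split(),
    'a young man riding a skateboard'.split(),
    'someone on a skateboard in mid air'.split(),
]]
hypotheses = ['a man riding a skateboard on a ramp'.split()]

# BLEU-1 through BLEU-4, matching the columns reported in the notes
for n in range(1, 5):
    weights = tuple([1.0 / n] * n)
    print(f'Bleu_{n}: {corpus_bleu(references, hypotheses, weights=weights):.3f}')
```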
## 05/19
* keras tokenizer
* Tokenizer parameters
* num_words: the tokenizer only keeps the n most frequent words
* oov_token: sets the out-of-vocabulary token
* filters: the symbols or characters the tokenizer strips out
* [Setting num_words seems to have no effect?](https://stackoverflow.com/questions/46202519/keras-tokenizer-num-words-doesnt-seem-to-work) After calling tokenizer.word_index it still shows every word
* The reason is that word_index and index_word always record every word
* num_words only takes effect when you feed sentences through the tokenizer; in other words, only the tokenizer knows which words it will actually keep
* Example code:
```python
from tensorflow.keras.preprocessing.text import Tokenizer

s = [['apple'],['apple'],['apple'],['apple'],['apple'],['banana'],['banana'],['banana'],['cat']]
token = Tokenizer(num_words=4,filters='!"#$%&()*+,-./:;=?@[\\]^_`{|}~\t\n',oov_token='<unk>')
token.fit_on_texts(s)
print(token.word_index)
print(len(token.word_index))
print(token.texts_to_sequences(['apple']))
print(token.texts_to_sequences(['banana']))
print(token.texts_to_sequences(['cat']))
```
output:
```
{'<unk>': 1, 'apple': 2, 'banana': 3, 'cat': 4}
4
[[2]] # what 'apple' becomes -> its index (2) is within num_words, so the index is returned
[[3]] # what 'banana' becomes -> same as above (index 3 is within num_words)
[[1]] # what 'cat' becomes -> index 4 is outside num_words, so the oov_token's index is returned
```
* This means that if we want the tokenizer to handle oov for us, there are a few difficulties:
1. The user does not know which words are actually among the n most frequent unless they count them themselves
2. Without knowing which words are actually used, when using pretrained word vectors we don't know which words to map against them when building the embedding layer, so we end up keeping every word, which raises the dimension and increases training time.
* Our current oov solution:
* Count the frequencies ourselves, decide the threshold, and build our own vocabulary. The vocabulary used for training must also be used in [generate_caption](https://github.com/hsuanchia/Image-caption/blob/main/generate_caption.ipynb), so I keep one copy in the cloud and record it in [models.md on github](https://github.com/hsuanchia/Image-caption/blob/main/Models/models.md)
* The oov_token can still be handled by the tokenizer, but its index defaults to 1; it is best to give the oov_token the same index in our own vocabulary as in the tokenizer
* Before training, remember to replace every word in each caption that is not in our own vocabulary with the oov_token, so that the oov_token actually appears during training
* Words that do not map to a pretrained word vector
* The oov_token uses the average of all word vectors in use
* For the remaining unmapped words:
* use zero vectors for all of them
* use random vectors for all of them
* Zero vectors seem to work slightly better; see the records in [models.md on github](https://github.com/hsuanchia/Image-caption/blob/main/Models/models.md), but we probably need BLEU to know the real difference
* Right now prediction appends an extra oov_token at the end of every sentence
* TimeDistributed(Dropout(0.3)), recurrent_dropout = 0.2
* **Settings that should be identical across trained models (controlled variables)**
* Training data: MSCOCO Train2017
* feature map: 14 x 14 x 512
* only words with frequency >= 5; see the [code](https://colab.research.google.com/drive/1mbAl8OWgM8nNzfe5EzBDL1nKBDsciQoe#scrollTo=VFfCAWPQof01) for details
* oov handling
* unk uses the average of all word vectors in use
* other words that don't map to a pretrained word vector use random vectors
* Training details
* epochs = 100
* EarlyStopping(patience=5, monitor='loss')
* if using validation during training, use the last 2500 images of MSCOCO val2017
* BLEU
* computed on the first 2500 images of MSCOCO val2017; after comparing 廷郡's experiments we will decide between 1000 and 2500 images
* Task split
* hsuanchia
* Soft attention architecture (how it actually works, including all the shapes)
* Soft attention code
* How to do the recurrence ourselves with Keras
* Attention layers provided by Keras (usage still to be confirmed; see the sketch at the end of this section)
* Attention (Luong)
* AdditiveAttention (Bahdanau)
* MultiHeadAttention
* 家正
* Consolidate the BLEU scores of every model
* Fixed settings (controlled variables):
* feature 14 x 14 x 512
* Voc: counted over all of Train2017, keeping only words with frequency above 5
* the unk word vector is the average of all word vectors in use
* Things to vary (experimental variables)
* with or without the unk token
* dropout 0 / 0.2 / 0.5
* unmapped word vectors: random vector / zero vector
* training epochs 50 / 100
* Automatically dump the scores into a csv, [code here](https://github.com/hsuanchia/Image-caption/commit/89ec87189b38b3bb8c7924673f54b0febf65494a#diff-dcef56fa3291f1858cb4cc148982171263c577108cb88dd7ba80bd1c7a0c4388), [csv link](https://github.com/hsuanchia/Image-caption/commit/89ec87189b38b3bb8c7924673f54b0febf65494a#diff-dcef56fa3291f1858cb4cc148982171263c577108cb88dd7ba80bd1c7a0c4388), [Google Sheet for manual edits](https://docs.google.com/spreadsheets/d/1MoNzQI0VE7oBT29gxyoqcdqsPgBeViDKvbk6hP6Z1mw/edit?usp=sharing)
~~* TODO training~~
* 柏瑋
* Add validation during training, using the last 2500 images of MSCOCO val2017
* Once trained, share the model and code on github!
* 廷郡
* LSTM > GRU
* Multi-layer LSTM
* One-layer LSTM
* TimeDistributed dropout - 0.3
* recurrent dropout - 0.2
```
Epoch 22/100
782/782 [==============================] - 132s 168ms/step -
loss: 0.9101 - sparse_categorical_accuracy: 0.7949
val_loss: 1.2389 - val_sparse_categorical_accuracy: 0.7716
Bleu_1: 0.571
Bleu_2: 0.376
Bleu_3: 0.238
Bleu_4: 0.152
METEOR: 0.168
ROUGE_L: 0.421
CIDEr: 0.430
```
* One-layer LSTM
* TimeDistributed dropout - 0.5
* recurrent dropout - 0.2
```
Epoch 26/100
782/782 [==============================] - 135s 173ms/step
loss: 0.9882 - sparse_categorical_accuracy: 0.7847
val_loss: 1.2259 - val_sparse_categorical_accuracy: 0.7734
Bleu_1: 0.584
Bleu_2: 0.389
Bleu_3: 0.250
Bleu_4: 0.162
METEOR: 0.172
ROUGE_L: 0.428
CIDEr: 0.465
```
* Two-layer LSTM
* TimeDistributed dropout - 0.3
* recurrent dropout - 0.2
* Dropout in LSTM 2 - 0.2
```
Epoch 19/100
782/782 [==============================] - 191s 244ms/step -
loss: 0.9554 - sparse_categorical_accuracy: 0.7878
val_loss: 1.2465 - val_sparse_categorical_accuracy: 0.7700
Bleu_1: 0.569
Bleu_2: 0.373
Bleu_3: 0.237
Bleu_4: 0.153
METEOR: 0.170
ROUGE_L: 0.424
CIDEr: 0.443
```
* Multi-layer GRU
* dropout rate: 0 ~ 0.5
* Is there a difference between computing BLEU on the first 1000 vs the first 2500 val images?
* The difference seems small, within 0.5%
* 家正 says that after pulling load_model out of generated_caption he saves a lot of time; computing BLEU on 2500 samples now takes only 20 minutes. See 家正's commit on github for details
* Test the unk at the end of predicted sentences
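For the "Attention layers provided by Keras (usage still to be confirmed)" item in the task split above, a minimal sketch of `AdditiveAttention` (Bahdanau-style) wiring under assumed shapes: the decoder's per-timestep hidden states act as the query and the 196 image-feature vectors as key/value, so the layer returns one context vector per decoder timestep. This is only an illustration, not the project's model.

```python
from tensorflow.keras import layers, models

num_regions, feat_dim, max_len, vocab_size = 196, 512, 30, 5000   # assumed shapes

feats = layers.Input(shape=(num_regions, feat_dim))   # 14x14x512 reshaped to (196, 512)
caps = layers.Input(shape=(max_len,))

x = layers.Embedding(vocab_size, 256, mask_zero=True)(caps)
dec = layers.LSTM(256, return_sequences=True)(x)       # query: one vector per timestep

# Project image regions to the same width as the decoder states, then attend
keys = layers.Dense(256)(feats)
context = layers.AdditiveAttention()([dec, keys])      # (None, 30, 256), Bahdanau-style scores

out = layers.TimeDistributed(
    layers.Dense(vocab_size, activation='softmax'))(layers.concatenate([dec, context]))

models.Model([feats, caps], out).summary()
```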
## 05/26
[Jiazheng's Presentaion](https://docs.google.com/presentation/d/1f9ltkirO2M5aORLefWFN2SyGAHg4i0e6smVJiqzx2jo/edit?usp=sharing)
[Feel free to log your scores in the Google Sheet for consolidation](https://docs.google.com/spreadsheets/d/1MoNzQI0VE7oBT29gxyoqcdqsPgBeViDKvbk6hP6Z1mw/edit?usp=sharing); add whatever columns you want
* **Settings that should be identical across trained models (controlled variables)**
* Training data: MSCOCO Train2017
* feature map: 14 x 14 x 512
* only words with frequency >= 5; see the [code](https://colab.research.google.com/drive/1mbAl8OWgM8nNzfe5EzBDL1nKBDsciQoe#scrollTo=VFfCAWPQof01) for details
* oov handling
* unk uses the average of all word vectors in use
* other words that don't map to a pretrained word vector use random vectors
* Training details
* epochs = 100
* EarlyStopping(patience=5, monitor='loss')
* if using validation during training, use the last 2500 images of MSCOCO val2017
* BLEU
* computed on the first 2500 images of MSCOCO val2017; after comparing 廷郡's experiments we will decide between 1000 and 2500 images
* Key points to cover when presenting the seq2seq work
* Which hyperparameters we tuned, from baseline -> improved -> ultimate
* dropout rate (0~0.5) comparison -> Dense layer
* recurrent_dropout 0~0.3
* feature map: 14 x 14 vs 7 x 7
* Number of epochs observed after adding validation
* Words GloVe cannot map: zero vector vs random vector
* LSTM vs GRU
* multi-layer LSTM vs multi-layer GRU
* A few image + sentence examples that leave an impression
* What our BLEU scores actually mean
* Task split
* hsuanchia
* ~~Soft attention architecture (how it actually works, including all the shapes)~~
* Soft attention code
* How to do the recurrence ourselves with Keras
* Attention layers provided by Keras (usage still to be confirmed)
* Attention (Luong)
* AdditiveAttention (Bahdanau)
* MultiHeadAttention
* 家正
* A few image + sentence examples that leave an impression -> pick some from the first 2500 val images
* BLEU scores of the baseline model
* BLEU scores of the official tensorflow image caption tutorial
* 柏瑋
* Which hyperparameters we tuned, from baseline -> improved -> ultimate
* dropout rate (0~0.5) comparison -> Dense layer
* recurrent_dropout 0~0.3
* feature map: 14 x 14 vs 7 x 7
* Number of epochs observed after adding validation
* Words GloVe cannot map: zero vector vs random vector
* 廷郡
* Test the unk at the end of predicted sentences
* Multi-layer GRU
* Compare and consolidate the BLEU scores of LSTM vs GRU
## 06/02
* [Happened upon a concise Keras seq2seq image caption implementation (Inception V3 + GRU)](https://github.com/HyunJu1/Image-Captioning/blob/master/Image%20Captioning.ipynb)
* [Hsuanchia's attention base model](https://colab.research.google.com/drive/1WAFZ9DI-C-pInJxOZUy07pKYmyZY9aCQ#scrollTo=VbmwDu4gt4rX)
* To-Do
* Validation
* Currently epochs = 10; probably needs more
* Verify that my model architecture is not wired up wrong
* Attention on text (not sure whether it helps)
* Compute BLEU
* Common settings for every model
* LSTM units = 512
* 14 x 14 x 512 feature map (from VGG16)
* random vector for the unknown token
* random vectors for unmapped words and special tokens
* early stopping: patience = 5, on validation loss
* validation: last 2500 images of MSCOCO val2017
* BLEU: first 2500 images of MSCOCO val2017
* Models to evaluate and compare
* Base seq2seq
* seq2seq + Dropout
* Dense Dropout : 0.5
* Recurrent dropout: 0.35
* seq2seq + Dropout + 2 * LSTM
* Dense Dropout : 0.5
* Layer 2 LSTM's input dropout : 0.3
* Layer 1 LSTM's recurrent dropout: 0.35
* Layer 2 LSTM's recurrent dropout: 0.4
* Attention based
* Task split
* Hsuanchia
* Attention architecture diagram
* Attention
* 家正
* Draw the seq2seq-based model architecture diagram
* Drawn with Diagrams.net, [source file](https://drive.google.com/file/d/19aDBeY5wHcv5LLZWEvvbednMbgeQNfue/view?usp=sharing)

* Pick images that best show the differences between models
* [Google Slide](https://docs.google.com/presentation/d/1w20eA2RlAQSSzIazRwXxRtN2srVpOy9UNH_qr8FN17s/edit?usp=sharing), [Source Code](https://colab.research.google.com/drive/1g5eSdKTzLs2xA7_Gcn-6yzx5ooaLuUtU?usp=sharing)
* 柏瑋
* Draw the seq2seq-based model architecture diagram
* Compute the models' BLEU scores
* 廷郡
* Draw the seq2seq-based model architecture diagram
* 
* Training Models
* model 1
```
Bleu_1: 0.564
Bleu_2: 0.369
Bleu_3: 0.233
Bleu_4: 0.147
METEOR: 0.168
ROUGE_L: 0.418
CIDEr: 0.422
```
* model 2
```
Bleu_1: 0.585
Bleu_2: 0.389
Bleu_3: 0.250
Bleu_4: 0.161
METEOR: 0.172
ROUGE_L: 0.428
CIDEr: 0.457
```
* model 3
```
Bleu_1: 0.591
Bleu_2: 0.396
Bleu_3: 0.257
Bleu_4: 0.167
METEOR: 0.176
ROUGE_L: 0.434
CIDEr: 0.475
```
## 06/05
* Compare training loss across models to see whether dropout helps with overfitting
* train loss / accuracy vs. validation loss / accuracy
* Draw bar charts
* number of parameters
* Name the models
* Draft of the presentation slides
* Explain our goal [name=Jiazheng]
* Visual Attention paper Fig. 1
* What data we used to achieve it
* dataset & preprocess [name=Hsuanchia]
* model architecture diagrams
* performance comparison
* BLEU explanation [name=Jiazheng]
* loss
* parameters
* Image samples [name=Jiazheng]
* 349860 skateboard
* 152214 person holding a hot dog and smiling
* 329319 black-and-white cat
* Common settings for every model
* LSTM units = 512
* 14 x 14 x 512 feature map (from VGG16)
* unmapped words and special tokens all use random vectors
* early stopping: patience = 5, on validation loss
* validation: last 2000 images of MSCOCO val2017
* BLEU: first 2500 images of MSCOCO val2017
* Models to evaluate and compare
* Base seq2seq
* seq2seq + Dropout
* Dense Dropout : 0.5
* Recurrent dropout: 0.35
* seq2seq + Dropout + 2 * LSTM
* Dense Dropout : 0.5
* Layer 2 LSTM's input dropout : 0.3
* Layer 1 LSTM's recurrent dropout: 0.35
* Layer 2 LSTM's recurrent dropout: 0.4
* Attention based
* Task split
* Hsuanchia
* Fix the attention overfitting problem
* Numbers for the attention model
* What data preprocessing we did (for the presentation slides)
* Future work (for the presentation slides)
* 家正
* BLEU explanation and statistics
* Image samples
* Explanation of what the overall goal is
* 廷郡
* Fix the seq2seq model architecture diagram
* Loss statistics and performance comparison
* Parameter-count statistics
## 06/09
[Merged draft](https://docs.google.com/presentation/d/1dl2s9eeJqADNhH4a4IaeVzwwfPHM2lwrHGY_jO9c5w4/edit?usp=sharing)
> Today is my birthday! [name=Jiazheng]
> Happy birthday XD [name=Hsuanchia]
Make the images and text as large as possible
The first and second demo images should be different, to avoid looking like we are giving away the answer
Motivation
Don't line-break the references
Performance metric
Seq2seq Model
The model architecture diagrams are in noticeably different styles
Charts for the performance comparison later on
Example: fix the indentation; preprocess the text into one consistent style (capitalization, punctuation)
If attention does not come out better, add a discussion of why
model
17 epochs, train loss: 1.0473, val loss: 1.9238
model 8 -> dropout = 0.25
100 epochs stop at 16 -> train loss: 0.9933, val loss 1.6741
model 9 -> dropout = 0.2
100 epochs stop at 21 -> train loss: 1.0415, val loss 1.5515
model 10 -> No dropout
100 epochs stop at 11 -> train loss: 1.0055, val loss 1.4646
## 06/12
* Transfer Learning with GloVe Word Vectors
* Possible reasons why attention is worse than our seq2seq
* Unstable loss? No, it is not
* The Tensorflow tutorial also only reaches about 0.4; an inherent weakness of attention?
* Look up their parameter counts and compare
* Overfitting?
* Dropout
* Its parameter count is only 1/8 of the seq2seq's, so maybe underfitting
* The seq2seq has a huge number of MLP parameters before reducing to the 512-dim representation
* Attention seems to just take an average
* Attention
* The trailing-unk problem
* While doing my NLP homework I found it was my data processing that was wrong, appending an extra unk to the end of every sentence; after fixing that bug it is back to normal.
* Parameter count after switching to flatten: 111,353,418
* Model 11
* h0, c0 use two different MLPs (see the sketch at the end of this section)
* stopped at 13 epochs, loss: 0.9696, val_loss: 1.4313
* 450s per epoch
* BLEU-1: 0.409
* BLEU-2: 0.250
* BLEU-3: 0.143
* BLEU-4: 0.082
* METEOR: 0.130
* ROUGE_L: 0.359
* CIDEr: 0.182
* Parameter count after switching to flatten: 59,972,682
* Model 12
* h0, c0 share the same MLP
* stopped at 14 epochs, loss: 1.0098, val_loss: 1.3897
* 334s per epoch
* BLEU-1: 0.419
* BLEU-2: 0.262
* BLEU-3: 0.154
* BLEU-4: 0.091
* METEOR: 0.134
* ROUGE_L: 0.364
* CIDEr: 0.201
* The above plus dropout(0.5) after the flatten
* Model 13
* stop at 12 epochs loss: 0.9766, val_loss: 1.3585
* BLEU-1: **0.520**
* BLEU-2: 0.327
* BLEU-3: 0.196
* BLEU-4: 0.116
* METEOR: 0.144
* ROUGE_L: 0.393
* CIDEr: 0.314
* The above plus dropout(0.35) after the flatten
* Model 14
* stop at 11 epochs loss: 0.9676, val_loss: 1.3771
* BLEU-1: 0.466
* BLEU-2: 0.293
* BLEU-3: 0.173
* BLEU-4: 0.102
* METEOR: 0.140
* ROUGE_L: 0.384
* CIDEr: 0.290
* Dropout(0.5) + 2 * LSTM
* val = 1500
* Model 15
* stop at 12 epochs loss: 0.8893, val_loss: 1.3073
* Bleu_1: **0.582**
* Bleu_2: 0.379
* Bleu_3: 0.236
* Bleu_4: 0.146
* METEOR: 0.163
* ROUGE_L: 0.419
* CIDEr: 0.394
* Dropout(0.5) + 2 * LSTM + recurrent_dropout(0.5)
* val = 1500
* Model 16
* stopped at 13 epochs, loss: 0.9443, val_loss: 1.2696
* 512s per epoch
* Bleu_1: **0.585**
* Bleu_2: 0.387
* Bleu_3: 0.244
* Bleu_4: 0.153
* METEOR: 0.167
* ROUGE_L: 0.426
* CIDEr: 0.419
* Dropout(0.5) + 2 * LSTM + recurrent_dropout(0.35)
* val = 1500
* Model 17
* stop at 12 epochs loss: 0.8960 , val_loss: 1.2970
* Bleu_1: 0.568
* Bleu_2: 0.370
* Bleu_3: 0.233
* Bleu_4: 0.146
* METEOR: 0.163
* ROUGE_L: 0.416
* CIDEr: 0.386
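A sketch of the Model 11 vs Model 12 difference above (whether h0 and c0 come from two different MLPs or share one); shapes and sizes are assumptions, with the flattened 14x14x512 feature map reduced by Dense layers and passed as the LSTM's initial_state. The separate-MLP variant roughly doubles the Dense parameters coming off the flatten, which is consistent with the parameter counts recorded above.

```python
from tensorflow.keras import layers

feat = layers.Input(shape=(14, 14, 512))
flat = layers.Flatten()(feat)                      # 100,352-dim, the "flatten" in the notes

# Model 11 style: separate MLPs for h0 and c0
h0 = layers.Dense(512, activation='tanh')(flat)
c0 = layers.Dense(512, activation='tanh')(flat)

# Model 12 style: one shared MLP feeding both states
shared = layers.Dense(512, activation='tanh')(flat)
h0_shared, c0_shared = shared, shared

caption = layers.Input(shape=(30,))
x = layers.Embedding(10000, 300, mask_zero=True)(caption)   # vocab/embedding sizes assumed
decoded = layers.LSTM(512, return_sequences=True)(x, initial_state=[h0, c0])
```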
## 06/16 Final Project Presentation
* Attention
* Add dropout at the alignment model
* Possibly not enough data (compare model 3 against attention)
* Model 3 is the limit of the seq2seq approach; its overfitting situation is the best we got
* Neither the TF tutorial nor visual attention beats our best attention model (because of c0, h0)
* Final presentation slides: [Google Slide](https://docs.google.com/presentation/d/1dl2s9eeJqADNhH4a4IaeVzwwfPHM2lwrHGY_jO9c5w4/edit?usp=sharing), [PDF](https://drive.google.com/file/d/1eI3t05SpUrTtDlQ94pobCdD8UuEpx8Hh/view?usp=sharing)
* [Recording](https://drive.google.com/file/d/1osgSvN9IIbs31APkb0oRUdwxVAYvfe1_/view?usp=sharing) (visible only when logged in with one of our four school accounts)
## 06/19 Deep Learning Report
* [Project report Google Docs](https://docs.google.com/document/d/1k6rTLNaDQIu1Il3XhslxpYGg3vfl5xMJaGUUSD84vZU/edit?usp=sharing)
* [Collaborative notes for the Deep Learning course](https://hackmd.io/jxxYCiGzQeeE4Jpd9cYH_g)
## 10/11
* [Chinese and English abstract of this project](https://hackmd.io/@hsuanchia/project-abstract)