# AI Challenge
[AI Challenge - SQuAD 1.1 with BERT-Base Guidelines](https://hpcadvisorycouncil.atlassian.net/wiki/spaces/HPCWORKS/pages/1326612594/AI+Challenge+-+SQuAD+1.1+with+BERT-Base+Guidelines)
## Installation
```
docker login nvcr.io # get an API key from NGC first
systemctl restart docker # restart the Docker daemon
docker pull nvcr.io/nvidia/tensorflow:20.02-tf1-py3 # takes a long time
```
With the command provided by the organizers, `docker run -it --net=host -v bigdata:/bigdata 0c7b70421b78`, the container cannot see the GPUs, so we wrote a `launch.sh` instead:
```shell=
#!/bin/bash
CMD=${@:-/bin/bash}
NV_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES:-"all"}
nvidia-docker run -it \
--name bert \
--net=host \
--shm-size=1g \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
-e NVIDIA_VISIBLE_DEVICES=$NV_VISIBLE_DEVICES \
0c7b70421b78 $CMD
```
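To confirm that a container started by `launch.sh` actually sees the GPUs, a quick check can be run inside it. The following is a minimal sketch, assuming `nvidia-smi` is on the `PATH` inside the container and that `nvidia-smi -L` prints one line per device starting with `GPU `; the function names `parse_gpu_list` and `visible_gpus` are hypothetical helpers, not part of any NVIDIA tooling.

```python
import shutil
import subprocess


def parse_gpu_list(smi_output):
    """Return the per-device lines from `nvidia-smi -L` output."""
    return [line.strip() for line in smi_output.splitlines()
            if line.strip().startswith("GPU ")]


def visible_gpus():
    """List the GPUs visible in the current environment (empty if none)."""
    if shutil.which("nvidia-smi") is None:
        # Driver tools are not mounted, so no GPU is reachable from here.
        return []
    result = subprocess.run(["nvidia-smi", "-L"],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL,
                            universal_newlines=True)
    return parse_gpu_list(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    gpus = visible_gpus()
    print("{} GPU(s) visible".format(len(gpus)))
    for gpu in gpus:
        print("  " + gpu)
```

If this reports zero GPUs inside the container but a non-zero count on the host, the container was most likely started without the NVIDIA runtime.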
## Entering the container
```
bash launch.sh # later on, run docker restart [container_id] before re-entering
```
Download the BERT-Base model files:
```
cd /workspace/nvidia-examples/bert/data
mkdir download
cd download
mkdir google_pretrained_weights
cd google_pretrained_weights
wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
unzip uncased_L-12_H-768_A-12.zip
```
Download the SQuAD 1.1 dataset:
```
cd /workspace/nvidia-examples/bert/data/download
mkdir squad
cd squad
mkdir v1.1
cd v1.1
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
wget https://github.com/allenai/bi-att-flow/archive/master.zip
unzip master.zip
cd bi-att-flow-master/
cd squad
cp evaluate-v1.1.py /workspace/nvidia-examples/bert/data/download/squad/v1.1/
```
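After downloading, it can be worth checking that the JSON files are complete before starting a long training run. The sketch below counts articles, paragraphs and questions in a SQuAD-format file; `squad_summary` and `load_squad` are hypothetical helper names, and the only assumption about the format is the `data` / `paragraphs` / `qas` nesting used by SQuAD 1.1.

```python
import json


def squad_summary(dataset):
    """Count articles, paragraphs and questions in a SQuAD-format dict."""
    articles = dataset["data"]
    paragraphs = [p for a in articles for p in a["paragraphs"]]
    questions = [q for p in paragraphs for q in p["qas"]]
    return {
        "articles": len(articles),
        "paragraphs": len(paragraphs),
        "questions": len(questions),
    }


def load_squad(path):
    """Load a SQuAD JSON file; raises if the download was truncated."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `squad_summary(load_squad("train-v1.1.json"))` should report 87,599 questions and the dev set 10,570; a JSON parse error or a lower count suggests an interrupted download.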
## Fine-tuning
Edit the parameters:
```
cd /workspace/nvidia-examples/bert
vim scripts/run_squad.sh
```
Parameters that worked:
```shell=
batch_size=${1:-"8"}
learning_rate=${2:-"5e-6"}
precision=${3:-"fp16"}
use_xla=${4:-"true"}
num_gpu=${5:-"2"}
seq_length=${6:-"384"}
doc_stride=${7:-"128"}
bert_model=${8:-"base"}
squad_version=${9:-"1.1"}
init_checkpoint=${10:-"data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/bert_model.ckpt"}
epochs=${11:-"2.0"}
RESULTS_DIR=/workspace/nvidia-examples/bert/results/${TAG}_${DATESTAMP}
```
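To get a feel for how `batch_size`, `num_gpu` and `epochs` interact, the number of optimizer steps can be estimated in advance. This is a rough sketch that assumes the step count is computed as examples / (per-GPU batch × workers) × epochs, which is a common convention in Horovod-adapted `run_squad.py` scripts but has not been verified against the script in this container. `estimate_train_steps` is a hypothetical helper; 87,599 is the number of questions in `train-v1.1.json`.

```python
SQUAD_V11_TRAIN_QUESTIONS = 87599  # questions in train-v1.1.json


def estimate_train_steps(num_examples, batch_size_per_gpu, num_workers, epochs):
    """Estimate optimizer steps, assuming one global batch per step."""
    global_batch = batch_size_per_gpu * num_workers
    return int(num_examples / global_batch * epochs)


if __name__ == "__main__":
    # Defaults above: batch_size=8, num_gpu=2, epochs=2.0
    print(estimate_train_steps(SQUAD_V11_TRAIN_QUESTIONS, 8, 2, 2.0))  # 10949
    # Same settings spread over 4 workers
    print(estimate_train_steps(SQUAD_V11_TRAIN_QUESTIONS, 8, 4, 2.0))  # 5474
```

Under this assumption the two-worker estimate equals the step suffix of the `model.ckpt-10949` checkpoint used in the prediction step, which is consistent with, though does not prove, that formula.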
Multi-node run (launched from the host, outside the container; the container name must be the same on every node):
`mpirun -np 4 -f host -ppn 2 docker exec bert sh -c 'cd /workspace/nvidia-examples/bert && bash scripts/run_squad.sh'`
Result (4/11): `{"exact_match": 78.20245979186376, "f1": 86.34117843261045}`
## (Optional) Alternative method with Lambda Labs
Under `/workspace`:
```
mkdir lambdal
cd lambdal
git clone https://github.com/lambdal/bert
cd /workspace/lambdal/bert
vim run_squad.sh
```
```shell=
python3 run_squad_hvd.py \
--vocab_file=/workspace/nvidia-examples/bert/data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/vocab.txt \
--bert_config_file=/workspace/nvidia-examples/bert/data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/bert_config.json \
--init_checkpoint=/workspace/nvidia-examples/bert/data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/bert_model.ckpt \
--do_train=True \
--train_file=/workspace/nvidia-examples/bert/data/download/squad/v1.1/train-v1.1.json \
--do_predict=True \
--predict_file=/workspace/nvidia-examples/bert/data/download/squad/v1.1/dev-v1.1.json \
--train_batch_size=4 \
--learning_rate=3e-5 \
--num_train_epochs=2.0 \
--max_seq_length=384 \
--doc_stride=128 \
--output_dir=/workspace/nvidia-examples/bert/results/lambdal/squad1/squad_base/ \
--horovod=true
```
From the host (outside the container):
`mpirun -np 4 -f host -ppn 2 docker exec bert sh -c 'cd /workspace/lambdal/bert && bash run_squad.sh'`
$\to$ This run failed with an out-of-memory (OOM) error.
## Prediction
```
cd /workspace
git clone https://github.com/google-research/bert.git
cd bert
```
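Write the input (question) file. Pay attention to the `id` fields: predictions are keyed by `id`, so each question needs its own.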
```
vim test_input.json
```
```json=
{
"version": "v1.1",
"data": [
{
"title": "your_title",
"paragraphs": [
{
"qas": [
{
"question": "Who is current CEO?",
"id": "56ddde6b9a695914005b9628",
"is_impossible": ""
},
{
"question": "Who founded google?",
"id": "56ddde6b9a695914005b9629",
"is_impossible": ""
},
{
"question": "when did IPO take place?",
"id": "56ddde6b9a695914005b962a",
"is_impossible": ""
}
],
"context": "Google was founded in 1998 by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University in California. Together they own about 14 percent of its shares and control 56 percent of the stockholder voting power through supervoting stock. They incorporated Google as a privately held company on September 4, 1998. An initial public offering (IPO) took place on August 19, 2004, and Google moved to its headquarters in Mountain View, California, nicknamed the Googleplex. In August 2015, Google announced plans to reorganize its various interests as a conglomerate called Alphabet Inc. Google is Alphabet's leading subsidiary and will continue to be the umbrella company for Alphabet's Internet interests. Sundar Pichai was appointed CEO of Google, replacing Larry Page who became the CEO of Alphabet."
}
]
}
]
}
```
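Writing this file by hand makes it easy to reuse an `id` by accident. As an alternative, the file can be generated. The sketch below mirrors the hand-written layout above, including the empty `is_impossible` field, and assigns a random unique `id` to each question; `build_squad_input` and `duplicate_ids` are hypothetical helper names.

```python
import json
import uuid


def build_squad_input(title, context, questions):
    """Build a SQuAD-style dict with one paragraph and unique question ids."""
    qas = [{"question": q, "id": uuid.uuid4().hex, "is_impossible": ""}
           for q in questions]
    return {
        "version": "v1.1",
        "data": [{"title": title,
                  "paragraphs": [{"qas": qas, "context": context}]}],
    }


def duplicate_ids(dataset):
    """Return the sorted list of ids that occur more than once."""
    seen, dupes = set(), set()
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                if qa["id"] in seen:
                    dupes.add(qa["id"])
                seen.add(qa["id"])
    return sorted(dupes)


if __name__ == "__main__":
    data = build_squad_input(
        "your_title",
        "Google was founded in 1998 by Larry Page and Sergey Brin.",
        ["Who founded google?"],
    )
    with open("test_input.json", "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
```

`duplicate_ids` can also be run on a hand-written file after loading it with `json.load`; an empty list means every `id` is unique.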
Write the run script:
```
vim do_predict.sh
```
```shell=
# --init_checkpoint below points to the checkpoint created on 4/11.
# (A comment after a trailing backslash would break the line continuation.)
python3 run_squad.py \
--vocab_file=/workspace/nvidia-examples/bert/data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/vocab.txt \
--bert_config_file=/workspace/nvidia-examples/bert/data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/bert_config.json \
--init_checkpoint=/workspace/nvidia-examples/bert/results/Ding_Hao_Ran/model.ckpt-10949 \
--do_train=False \
--max_query_length=30 \
--do_predict=True \
--predict_file=test_input.json \
--predict_batch_size=16 \
--max_seq_length=384 \
--doc_stride=128 \
--output_dir=/workspace/nvidia-examples/bert/results/squad1/squad_test/
```
(The alternative method needs its own script; omitted here.)
Run it:
```
bash do_predict.sh
```
When it finishes, check the answers in:
`/workspace/nvidia-examples/bert/results/squad1/squad_test/predictions.json`
`/workspace/nvidia-examples/bert/results/squad1/squad_test/nbest_predictions.json`
(Both files are saved in `/workspace/nvidia-examples/bert/results/squad1/Ding_Hao_Ran_output`.)
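Because `predictions.json` contains only ids and answer strings, it is easier to read when joined back to the questions. The following sketch assumes `predictions.json` is a flat JSON object mapping each question `id` to its answer text, which is the layout written by Google's `run_squad.py`; `pair_answers` and `load_json` are hypothetical helper names.

```python
import json


def load_json(path):
    """Read a JSON file as UTF-8."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def pair_answers(dataset, predictions):
    """Return (question, answer) pairs by joining on the question id."""
    pairs = []
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                answer = predictions.get(qa["id"], "<no prediction>")
                pairs.append((qa["question"], answer))
    return pairs
```

A typical use would be `pair_answers(load_json("test_input.json"), load_json(".../squad_test/predictions.json"))`, printing each pair on its own line.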
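## Competition rules
* Only the BERT-Base, Uncased model may be used.
* The HuggingFace implementation is the official reference. Using it as-is or modifying it is allowed (?).
* Different optimizers, learning-rate decay and similar techniques are allowed, but changing the model architecture is not.
* The model's hyperparameters may not be changed and no layers may be added.
* The entire model must be fine-tuned; freezing layers is not allowed.
* A way to reproduce the output (scripts and methodology) must be provided.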
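## Required deliverables
* The commands used and their results (routine and command lines and output)
* Model evaluation results, with the sequence length set to 128 (tokens)
* The model checkpoint and inference (?) files
* Training scripts, methodology, commands and run logs
* The `predictions.json` and `nbest_predictions.json` produced by `run_squad.py`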
## Final scoring
The judges will score submissions with the SQuAD 1.1 `evaluate-v1.1.py` script,
for example: `{"exact_match": 81.01229895931883, "f1": 88.61239393038589}`,
or with `run_squad.py`.
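For reference, the two reported numbers are exact match (EM) and token-level F1. The sketch below is a simplified re-implementation of how these metrics are commonly computed for SQuAD 1.1 (lowercasing, stripping punctuation and the articles a/an/the, then comparing tokens). It handles a single reference answer only, whereas the official script takes the maximum over all reference answers and averages over the dataset as a percentage, so use `evaluate-v1.1.py` itself for any reported score.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """True if the normalized strings are identical."""
    return normalize_answer(prediction) == normalize_answer(truth)


def f1_score(prediction, truth):
    """Token-level F1 between a prediction and one reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)
```

This explains why F1 is always at least as high as EM: a partially correct span such as "Larry Page" against "Larry Page and Sergey Brin" scores 0 on EM but still earns partial F1 credit.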
## Aside: pushing Docker images
```
docker commit bert ncku_bert:v{version}
docker tag ncku_bert:v{version} {docker_username}/ncku_bert:v{version}
docker push {docker_username}/ncku_bert:v{version}
```