# 專題實驗記錄
Recorder: 蘇鈺琁
2024/02/22
---
- [簡報](https://docs.google.com/presentation/d/16r33PXiUrVem_XuhlraLCHa7l-kraC43/edit?usp=sharing&ouid=107829962859666644459&rtpof=true&sd=true)
- Improved the model input with the following three methods:
  1. Find sentences related to the title and generate five sentences in sequence as model input
  2. Generate all five sentences at once
  3. Find related sentences from the evidence
- Select the most suitable output claim by similarity
- Count: 418 entries
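The similarity-based selection above can be sketched as follows. This is a minimal stdlib-only illustration that scores candidates with cosine similarity over character-bigram counts as a stand-in for real sentence embeddings (in practice a SentenceBERT model would do the scoring); `pick_best_claim` and `char_bigrams` are hypothetical helper names, not the actual implementation.

```python
from collections import Counter
import math

def char_bigrams(text):
    """Character-bigram counts: a crude stand-in for sentence embeddings."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def pick_best_claim(reference, candidates):
    """Return the generated candidate most similar to the reference text."""
    ref = char_bigrams(reference)
    return max(candidates, key=lambda c: cosine(ref, char_bigrams(c)))
```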
- Discussion
- To-do
2024/01/31
---
- Generation results
  - input = title
  - Relatively complete, but some surnames are missing (characters with complex strokes)
```python=
'<unk> 瑞疫苗在美國,從美國進口到台灣的國'
'高端疫苗、疫苗等弊案共40件,前福部長***時中***、***食署***長***秀梅***都列為被告?'
'消防員撞清德總部,NCC應定電視不乍播?!?!'
"網傳***蔡英文***視導空軍部隊時,還跟部隊通話時被解放軍台?"
"「***國瑜***將市長和總統選的補助款全部捐出,共15000萬元?」"
"國民黨主席***王金*平**近日出书,要求國民黨下台、國民黨總統參選人向全民道歉。不过,國民黨統***馬英九***卻说,五是沒用書的圖片。"
```
- Could not crawl the full set of articles
  - Only pages up to 41 (of 48) could be crawled
  - https://tfc-taiwan.org.tw/taxonomy/term/473?page=3
  - Count: 418
```python=
Failed to establish a new connection: [WinError 10060] 連線嘗試失敗,因為連線對象有一段時 間並未正確回應,或是連線建立失敗,因為連線的主機無法回應。
```
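Connection timeouts like the one above can often be smoothed over by retrying with exponential backoff instead of failing on the first `WinError 10060`. A minimal sketch, assuming the crawler exposes some `fetch(url)` callable; the helper name and parameters are illustrative:

```python
import time

def with_retries(fetch, url, attempts=3, base_delay=1.0):
    """Retry a flaky fetch with exponential backoff; re-raise after the last attempt."""
    for i in range(attempts):
        try:
            return fetch(url)
        except ConnectionError:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** i))  # back off before the next try
```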
- Discussion
  - Methods
    - model input:
      - 1. Find sentences related to the title and generate five sentences in sequence as model input
      - 2. Generate all five sentences at once
    - Generation-count adjustments:
      - 3. Treat the claim as the fifth sentence
      - 4. Use the evidence instead of the title to find related sentences
  - Report adjustments
    - Presentation flow: start with a recap linking earlier work to the current progress, plus the motivation for the implementation
  - Model testing
    - Evaluate performance on the validation and test data
- To-do
  - [x] Apply the methods discussed
  - [x] Adjust the report presentation
2024/01/22
---
- Generation results
```python=
'<unk> 瑞疫苗在美國,從美國進口到台灣的國'
```
- Validation loss did not decrease
- Parameters
  - batch size = 8
  - lr = 1e-4
- Model: Langboat/mengzi-t5-base
- template
```python=
text='{"soft":"我想要產生"} {"placeholder":"text_a", "shortenable":"True"} {"soft":"的相關資訊,我應該要搜尋"} {"special": "<eos>"} {"mask"}',
text='{"soft":"我想要產生有關"} {"placeholder":"text_a", "shortenable":"True"} {"soft":"的呈述,此呈述為"} {"special": "<eos>"} {"mask"}'
```
- Trying other models and tokenizers
```python=
from openprompt.plms import load_plm

# load_plm takes the model family and the checkpoint name, and returns
# the model, tokenizer, model config, and wrapper class
plm, tokenizer, model_config, WrapperClass = load_plm("t5", "Langboat/mengzi-t5-base")
```
```python=
# 1. Switching models
model_name = 'uer/t5-v1_1-base-chinese-cluecorpussmall'
model_class = get_model_class(plm_type=model_name)

File "/usr/local/lib/python3.8/site-packages/openprompt/plms/__init__.py", line 76, in get_model_class
return _MODEL_CLASSES[plm_type]
KeyError: 'uer/t5-v1_1-base-chinese-cluecorpussmall'
```
```python=
# 2. Using a different package directly
from transformers import BertTokenizer, MT5ForConditionalGeneration
tokenizer = BertTokenizer.from_pretrained("uer/t5-v1_1-base-chinese-cluecorpussmall")
model = MT5ForConditionalGeneration.from_pretrained("uer/t5-v1_1-base-chinese-cluecorpussmall")
```
- Server memory
```python=
torch.cuda.OutOfMemoryError: CUDA out of memory.
Tried to allocate 20.00 MiB (GPU 0; 10.90 GiB total capacity;
2.93 GiB already allocated;
7.88 MiB free; 3.16 GiB reserved in total by PyTorch)
If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
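As the error message itself suggests, when reserved memory far exceeds allocated memory, setting `max_split_size_mb` can reduce allocator fragmentation. One way is to set the environment variable before torch makes its first CUDA allocation; the value 128 below is only an illustrative guess to be tuned per workload:

```python
import os

# Must be set before torch initializes CUDA; the MiB value is a tunable guess.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

Lowering `batch_size` (as done elsewhere in this log) remains the first thing to try; this setting only helps when fragmentation, not total usage, is the problem.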
- To-do
  - [ ] Change the model architecture: T5 PEGASUS, T0
2024/01/16
---
- Manual claim labeling
- A parse error occurred when running the following data processing on the server
```python=
train = 'data_train.json'
validaion = 'data_dev.json'
test = 'data_test.json'
datasets = DatasetDict.from_json({'train':train, 'validation':validaion, 'test':test})
```
```python=
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/datasets/packaged_modules/json/json.py", line 121, in _generate_tables
pa_table = paj.read_json(
File "pyarrow/_json.pyx", line 290, in pyarrow._json.read_json
File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: JSON parse error: Missing a closing quotation mark in string. in row 5
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/datasets/builder.py", line 1925, in _prepare_split_single
for _, table in generator:
File "/usr/local/lib/python3.8/site-packages/datasets/packaged_modules/json/json.py", line 144, in _generate_tables
dataset = json.load(f)
File "/usr/local/lib/python3.8/json/__init__.py", line 293, in load
return loads(fp.read(),
File "/usr/local/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe6 in position 82640895: unexpected end of data
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "main.py", line 8, in <module>
datasets = DatasetDict.from_json({'train':train, 'validation':validaion, 'test':test})
File "/usr/local/lib/python3.8/site-packages/datasets/dataset_dict.py", line 1450, in from_json
return JsonDatasetReader(
File "/usr/local/lib/python3.8/site-packages/datasets/io/json.py", line 59, in read
self.builder.download_and_prepare(
File "/usr/local/lib/python3.8/site-packages/datasets/builder.py", line 954, in download_and_prepare
self._download_and_prepare(
File "/usr/local/lib/python3.8/site-packages/datasets/builder.py", line 1049, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/usr/local/lib/python3.8/site-packages/datasets/builder.py", line 1813, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "/usr/local/lib/python3.8/site-packages/datasets/builder.py", line 1958, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
```
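The traceback above actually reports two distinct problems: a missing closing quotation mark in row 5, and a UTF-8 byte sequence cut off at the end of the file (often the sign of a truncated download or write). A small stdlib-only check like the following, with a hypothetical `diagnose_json` helper, can tell the two apart before handing the file to `datasets`:

```python
import json

def diagnose_json(path):
    """Report whether a JSON file is valid UTF-8 and well-formed JSON,
    and where it first breaks."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as e:
        return f"utf-8 error at byte {e.start}"  # e.g. a truncated file
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        return f"json error at line {e.lineno}, col {e.colno}: {e.msg}"
    return "ok"
```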
<!-- - Generation
```python=
_, output_sentence = self.model.generate(batch, **generation_arguments)
```
- AttributeError: 'PromptForGeneration' object has no attribute 'can_generate'
- Fixed by switching to transformers 4.19.0 -->
2024/01/09
---
- paper_related work
  - Length
  - Document retrieval
- Manual claim labeling
- Preprocessing
  - evidence: gold evidence (1~4) & evidence (4)
  - tgt_text: holds the claim
  - dataset:
- "evidence": [["Margot_Kidder", "15", "In 2005 , Kidder became a naturalized U.S. citizen ."]]
```python=
from openprompt.data_utils import InputExample

temp_label = set()
for evidences in data['evidence']:
    for evidence in evidences:
        temp_label.add(evidence[2])  # the third element is the evidence sentence
input_example = InputExample(text_a=data['claim'],
                             tgt_text=" [SEP] ".join(temp_label))
```
- meta: stores multiple pieces of evidence
- To-do
  - [ ] Paper revision
  - [ ] Document retrieval
  - [x] Train the model: tgt_text = claim
  - [ ] title
  - [x] meta: evidence
- [Reference tutorial](https://github.com/thunlp/OpenPrompt/blob/main/tutorial/6.1_chinese_dataset_uer_t5.py)
2024/01/02
---
- paper_related work
  - sentenceBERT part
- Manual claim labeling
- Method
- To-do
  - [x] Manual labeling: T5, generation
  - [ ] Paper modification: problem -> method, model
12/26
---
- paper_related work
  - sentenceBERT
  - Prompt-based learning
  - PEFT
  - claim verification
  - document retrieval
  - evidence retrieval
- Problems
  - Paragraphing (how to split sections)
  - Amount of content
  - Chinese vs. English
- To-do
  - [ ] Translation
  - [x] Citations
  - [ ] What contributions do p-tuning & PEFT make, and how do they relate to us
12/12
---
- datasets (sample entry)
```python=
"title": "【部分錯誤】網傳圖卡「桃園北景雲計畫獲得建築金石獎『優良建築施工品質類』獎項」?",
"source": "",
"publish_date": "2022-10-05",
"domain": "部分錯誤",
"claimWeb": "台湾事实查核中心",
"category": "事實查核報告",
"url": "https://tfc-taiwan.org.tw/articles/8249",
"editor": "",
"gold evidence": "['社群平台、通訊軟體自2022年9月27日開始流傳一張圖卡,內容為27屆「中華建築金石獎」得獎名單.....
```
- claim: manual labeling, REACT
  - evidence?
  - Extract a summary of each fact-check article
- paper
  - [link](https://www.overleaf.com/project/64f1a961d9218f7d0cbfef6b)
- retrieval
  - [link](https://docs.google.com/presentation/d/1VYD2poCowLBqV8xBw0Joe332gVlLx-qjTiMgnRkmFAw/edit#slide=id.g23f22e2422a_0_141)
- To-do
  - [ ] Front-end/back-end architecture diagram
  - [ ] paper: related work
  - [ ] Manually label queries
11/28
---
- Crawling data
  - Some data cannot be crawled
  - The full article content must be crawled
- Building the classifier
  - The model performed suspiciously well
    - The label words were wrong
    - Retrain
- Equipment usage issues
11/20
---
- Crawl the questions users posted on the Q&A board
  - Crawling articles by question
    - The counts are irregular
  - Crawling articles by content
    - Unlike the questions, the pages have no regular pattern for crawling each article
- To-do
  - [Crawl the last two years of data from the Taiwan FactCheck Center](https://tfc-taiwan.org.tw/taxonomy/term/473)
  - Build a classifier
11/14
---
- [related work](https://www.overleaf.com/project/64f1a961d9218f7d0cbfef6b)
  - claim verification
  - Chinese query generation (datasets)
    - Datasets labeled from a Q&A system
  - multi-keyword retrieval
- To-do
  - Crawl the questions users posted on the Q&A board
  - Related techniques
  - Methods
10/31
---
- Google Translate API
  - Demo video
- This week
  - Presentation practice
- Paper
  - document retrieval
  - sentence retrieval
  - verification
  - query generation (frontend) Web
10/24
---
- Google Translate API
  - Translation runs in the IDE but is not recognized once loaded into the browser
    - The browser does not support the package's `require()` syntax
      - Use ES6 modules in the browser instead
      - Rewrite the syntax with `import`
      - Add `<script type="module">` in the HTML
        - `import {require} from 'background.js';`
        - Web browsers cannot resolve bare imports by themselves
        - `import translate from '../node_modules/@google-cloud/translate';`
        - An unknown error occurred when fetching the script
  - The translated text changes the URL
    - Do the translation on the back end instead
- To-do
  - Hand the Python API to the back end
  - Demo video
10/17
---
- Writing related work
  - document retrieval
  - sentence retrieval
  - query generation
  - claim verification
  - Question: scope of reference articles (how many articles the write-up should cover)
- FastAPI
  - completion
- Google Chinese-English translation
  - Choosing between the Wiki and Google APIs
- To-do
  - [ ] Wiki API button
  - [ ] Google translate
10/2
---
- Competition slide discussion
9/26
---
- Implementation adjustments
  - Adjusted the template phrasing
    - template_text = '{"placeholder":"text_a"} {"soft":"according to the above content, the following description:"} {"placeholder":"text_b"} {"soft":"To access the outcome of claims and evidence as SUPPORTS, REFUTES, or NOT ENOUGH INFO?"} {"mask"}'
    - label_words = {0: ["SUPPORTS"], 1: ["REFUTES"], 2: ["NOT ENOUGH INFO"]}
  - Gave each dataset its own batch_size in PromptDataLoader
  - Added early stopping
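The early stopping mentioned above can be sketched as a small tracker on the validation loss. The class name and `patience` default are illustrative, not the exact implementation used here:

```python
class EarlyStopper:
    """Stop training when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, valid_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if valid_loss < self.best:
            self.best = valid_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```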
- Implementation results
| BERT-base | Recall | Precision | Micro F1 | Macro F1 |
| ------------------- | ------ |:---------:| -------- | -------- |
| Fine-Tune | 81.18% | 80.34% | 80.27% | 80.31% |
| Hard Prompt | 85.58% | 85.62% | 85.10% | 85.18% |
| P-Tuning v1 | 81.85% | 71.62% | 81.47% | 81.36% |
| P-Tuning v2(freeze) | 79.45% | 78.77% | 74.34% | 74.38% |
| P-Tuning v1(freeze) | 75.85% | 75.03% | 78.66% | 78.65% |
9/19
---
- Implementation of v1 (full)
  - max_length=128, batch_size=4, lr=5e-6
- Epoch 0, train_loss 0.9160319384500784
Epoch 0, valid_loss 0.9868071081722716
Epoch 0, valid f1 0.5189193461841252
Epoch 1, train_loss 0.8493666888424999
Epoch 1, valid_loss 0.918308158982994
Epoch 1, valid f1 0.5797136205129115
Epoch 2, train_loss 0.8335107564677446
Epoch 2, valid_loss 0.9729073854288588
Epoch 2, valid f1 0.5598199555673559
Epoch 3, train_loss 0.8247267499684228
Epoch 3, valid_loss 1.0184914831611622
Epoch 3, valid f1 0.5507444379556854
Epoch 4, train_loss 0.8189314238966166
Epoch 4, valid_loss 1.0359398625486242
Epoch 4, valid f1 0.5457238666726648
Epoch 5, train_loss 0.8148611394570011
Epoch 5, valid_loss 0.999382639446868
Epoch 5, valid f1 0.5602789179686796
Epoch 6, train_loss 0.8102320843921436
Epoch 6, valid_loss 1.0427383227427172
Epoch 6, valid f1 0.5496713413369875
Epoch 7, train_loss 0.8093603813985177
Epoch 7, valid_loss 1.0706756948502771
Epoch 7, valid f1 0.5463009922597993
Precision (micro): 55.72%
Recall (micro): 55.72%
F1 (micro): 55.72%
Precision (macro): 58.69%
Recall (macro): 55.42%
F1 (macro): 55.79%
- Adjustments
  - Dataset format
  - Template construction: found that claim and evidence were given the same sentence
    - template_text = '{"placeholder":"text_b"} {"soft":"according to the above content, the following description:"} {"placeholder":"text_a"} {"soft":"what is the relation?"} {"mask"}'
- Epoch 0, train_loss 0.8669554500385469
Epoch 0, valid_loss 0.936854774981408
Epoch 0, valid f1 0.6036477151100477
Epoch 1, train_loss 0.7679899839898798
Epoch 1, valid_loss 0.9043158296625182
Epoch 1, valid f1 0.6488751880100962
Epoch 2, train_loss 0.7352776492373218
Epoch 2, valid_loss 0.8486452470738095
Epoch 2, valid f1 0.6849687774682977
Epoch 3, train_loss 0.7180054512100116
Epoch 3, valid_loss 0.9267781534755016
Epoch 3, valid f1 0.681125223880603
Epoch 4, train_loss 0.7070338488409819
Epoch 4, valid_loss 0.9273080223714035
Epoch 4, valid f1 0.6956933198259877
Epoch 5, train_loss 0.6981460807688539
Epoch 5, valid_loss 1.0149452810545696
Epoch 5, valid f1 0.6877563660829588
Epoch 6, train_loss 0.6946998982536632
Epoch 6, valid_loss 0.9460171354394808
Epoch 6, valid f1 0.6981018571474289
Epoch 7, train_loss 0.6888278553751555
Epoch 7, valid_loss 0.9543328048022381
Epoch 7, valid f1 0.6977340308943255
Precision (micro): 68.31%
Recall (micro): 68.31%
F1 (micro): 68.31%
Precision (macro): 70.06%
Recall (macro): 69.19%
F1 (macro): 68.17%
- Data format
- paper
  - Added trainable-parameter counts and training time
    - v2: trainable params: 373,254 | all params: 108,683,526 || trainable%: 0.3434319935479458
    - v1: trainable params: 1,784,070 | all params: 110,094,342 || trainable%: 1.6204919958556998
9/12
---
- Implementation of v1 (full)
- Adjustments:
  - Added eps=1e-8
  - batch_size = 8 -> 4
  - max_seq_length = 128 -> 512
- paper [link](https://www.overleaf.com/project/64f730c637ecc799e0527905)
  - [x] Added to the template
  - [ ] Tables
  - [ ] Figures
  - [ ] Polish the wording
9/8
---
- p-tuning v1 performance improvement
  - num_epochs = 8
    lr = 3e-4
    batch_size = 4
  - Parameter freezing
- Coding
9/5
---
- Prompt-based learning summary [link](https://www.overleaf.com/read/mzsrtwqjpbcs)
- To-do
  - Write the paper
    - Deadline 9/19
  - Output the model's predictions
  - Improve the template
  - Dataset processing
8/31
---
- Server usage
  - image: a Python version that can install openprompt and the other libraries
  - container
    - -v path mapping: lets it run against folders stored on the local machine
  - Training only works with batch_size = 1
    - Only part of the capacity is used
    - import torch, gc
      gc.collect()
      torch.cuda.empty_cache()
    - with torch.no_grad()
8/24
---
- Using the lab's equipment [tutorial](https://hackmd.io/9mUz_n1jTje2pZ2QDvTacQ)
  - NCU VPN connection [link](https://ncu.edu.tw/VPN/info)
  - Using Remote-SSH
    - Then open the Remote Explorer and add a remote host
    - Install an SSH client [link](https://learn.microsoft.com/zh-tw/windows-server/administration/openssh/openssh_install_firstuse)
- Docker [tutorial](https://www.runoob.com/docker/docker-container-usage.html)
8/22
---
- progress [slide](https://docs.google.com/presentation/d/1L_ZBTb_MsJ6UBQtHoI-mF7U7nxzvMvus/edit?usp=drive_web&ouid=103103346235533747462&rtpof=true)
- Discussion
  - p-tuning v2
    - Precision (macro): 56.53%
      Recall (macro): 47.19%
      F1 (macro): 39.45%
  - GPU
    - AttributeError: module 'torch' has no attribute 'frombuffer'
      - arr = torch.frombuffer(v["data"], dtype=dtype).reshape(v["shape"])
      - Option 1: upgrade the version -> but then the GPU cannot be used
      - Option 2: change the syntax
    - Installed the latest PyTorch: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
      - The GPU now works
    - torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 2.00 GiB total capacity; 1.01 GiB already allocated; 41.91 MiB free; 1.12 GiB reserved in total by PyTorch)
      - The machine does not have enough memory
        - Lower the batch_size
        - Use the lab's equipment
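Besides lowering `batch_size`, one standard alternative (not the approach taken in this log) is gradient accumulation: keep the memory footprint of a small micro-batch while preserving the statistics of a larger effective batch. A framework-free sketch, where `grad_fn` stands in for computing gradients on one micro-batch and the append stands in for `optimizer.step()`:

```python
def train_with_accumulation(batches, grad_fn, accum_steps):
    """Average gradients over `accum_steps` micro-batches before each
    (simulated) optimizer step; returns the list of applied updates."""
    updates, buffer = [], 0.0
    for i, batch in enumerate(batches, start=1):
        buffer += grad_fn(batch) / accum_steps  # like loss.backward() accumulating grads
        if i % accum_steps == 0:
            updates.append(buffer)  # like optimizer.step(); effective batch = micro * accum
            buffer = 0.0
    return updates
```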
- To-do
  - Resolve torch.cuda.OutOfMemoryError: CUDA out of memory
  - Retrain the model to get better performance numbers
8/17
---
- hard prompt
  - learning rate: 1e-4 -> 1e-5
  - Accuracy: 85.10%
    Precision: 85.62%
    Recall: 85.58%
    F1 Score: 85.14%
- p-tuning v1
  - Go through the video and note which parts of the timeline are needed
    - Find where in the code p-tuning v1 is
    - Find places that can be improved
  - Fix the tokenizer map
- Reprocess the datasets
  - Follow the senior's preprocessing function to convert the raw data into the map format [code](https://github.com/MichelleHS777/PEFT-Chinese-Fact-Verification/blob/main/preprocess.py)
  - Preprocess each split individually first
  - Then pack the three splits into a single DatasetDict
- Parameters
  - lr_scheduler
    - Learning-rate scheduling [explanation](https://towardsdatascience.com/a-visual-guide-to-learning-rate-schedulers-in-pytorch-24bbb262c863)
  - checkpoint
    - Load previously saved model parameters to evaluate on the test data
    - torch.load: load the saved parameter dict so the model gets the trained weights
    - state_dict: apply the previously trained parameters to the model
- Precision (macro): 68.88%
Recall (macro): 63.23%
F1 (macro): 61.09%
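For reference, the linear warmup-then-decay schedule commonly used when fine-tuning transformers can be written out directly. The function and its arguments are an illustrative sketch, not the exact scheduler used in these runs:

```python
def warmup_linear_lr(step, total_steps, warmup_steps, base_lr):
    """Linearly ramp the learning rate up to base_lr over warmup_steps,
    then linearly decay it to zero by total_steps."""
    if step < warmup_steps:
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))
```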
- Problems
  - Re-running on the GPU
    - Opened a new account
    - Installing GPU support for PyCharm
      - [CUDA & cuDNN](https://www.qingtianseo.com/detail/949.html)
      - The PyTorch site has no build matching CUDA 11.2; since CUDA versions are backward compatible, the CUDA 11.1 build was chosen instead
      - [PyTorch installation](https://blog.csdn.net/wangmengmeng99/article/details/128318248)
  - GPU limit: hard prompt
  - System RAM: p-tuning v1
    - Kaggle
- To-do
  - Improve p-tuning v1 performance
  - Install PyTorch
  - Train the p-tuning v2 model
8/10
---
- progress
- [slide](https://docs.google.com/presentation/d/1j4BzenQif5Q6lvY91YgPwduhn24Z_giu/edit?usp=sharing&ouid=103103346235533747462&rtpof=true&sd=true)
- Discussion
  - Possible [reasons](https://juejin.cn/s/loss%E5%80%BC%E4%B8%8D%E4%B8%8B%E9%99%8D) the loss does not go down
    - Gradient issues
    - Learning-rate magnitude
  - Reference code [video](https://drive.google.com/file/d/1aZio3HdogqEvtc-AipAyDE4L4WQmNvA8/view)
- To-do
  - Improve performance
  - Be able to train the v1 model
8/8
---
- progress
- [slide](https://docs.google.com/presentation/d/1j4BzenQif5Q6lvY91YgPwduhn24Z_giu/edit?usp=sharing&ouid=103103346235533747462&rtpof=true&sd=true)
- Discussion
  - hard prompt performance
    - Accuracy: 0.29266163725918076
      Precision: 0.09755387908639358
      Recall: 0.3333333333333333
      F1 Score: 0.15093490249039374
  - P-tuning v1
    - load dataset: load the dataset through the API with its URL and version
    - metric
      - The evaluation metric used while training the model
      - It cannot be loaded for our own dataset
      - A custom one must be defined => do this later
    - map
      - Applies a function across the dataset
      - The function does the tokenization
      - Changed the example's tokenized field: sentence -> claim & evidence
      - label: string -> integer
      - Problems
        - Data types: for evidence we only want the third string element of the array
        - Inconsistent tokenized lengths: multiple claims were paired with the third element of the first claim's evidence
        - The example code sets no padding and no maximum truncation length
    - parameters
      - A package-installation prompt appears at run time
        - Unclear why it needs installing
        - It still appears after installing
        - It does not appear when running the example code
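The `map` step described above (pairing each claim with the third element of its evidence and converting string labels to integers) can be sketched as a plain function. The field names follow the FEVER-style records used elsewhere in this log, but `preprocess` itself is illustrative and leaves real tokenization to the tokenizer:

```python
# String labels mapped to the integer ids the model expects.
LABEL2ID = {"SUPPORTS": 0, "REFUTES": 1, "NOT ENOUGH INFO": 2}

def preprocess(example, max_length=128):
    """Pair the claim with its evidence sentence and map the label to an id."""
    evidence_sentence = example["evidence"][0][2]  # third element holds the sentence text
    return {
        "text": (example["claim"] + " [SEP] " + evidence_sentence)[:max_length],
        "label": LABEL2ID[example["label"]],
    }
```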
- To-do
  - Improve performance
  - Be able to train the v1 model
7/25
---
- progress
- [slide](https://docs.google.com/presentation/d/1Bhq41e3tVHeXusLa9jSKFMWfZiUrwBtL/edit?usp=drive_web&ouid=103103346235533747462&rtpof=true)
- Discussion
  - Implementation problems
    - New implementation environment
      - Too slow
      - CPU -> GPU
        - GPU installation
        - Out of memory -> crashes
- To-do
  - Install the GPU
  - Memory problem -> move the training data to the GPU
  - Fall back to Colab
- 改回colab
7/12
---
- Discussion
  - The dataset has the same claim with different labels
  - How to learn
    - Small programs + extensions
    - Prompt + LoRA???
  - Attitude
    - Explain things in enough detail that others understand
    - Make a distinct contribution
    - Completeness > novelty
7/11
---
- progress
- [slide](https://docs.google.com/presentation/d/1NDVn4LPbsbueE0j02pa5-xhTDM8jJ1po/edit?usp=drive_web&ouid=103103346235533747462&rtpof=true)
- Discussion
  - Implementation walkthrough
    - Hard prompt
    - P-tuning v1
      - PromptEncoder
- To-do
  - P-tuning v2
    - Prefix prompt
  - Design the evaluation metrics
6/27
---
- progress
- [slide](https://docs.google.com/presentation/d/1HYLzHqYIKnZKkcWcu7aOZ-GmOEgzuSgx/edit?usp=drive_link&ouid=103103346235533747462&rtpof=true&sd=true)
- Discussion
  - Introduced the FEVER dataset
  - Introduced hard prompts and soft prompts (P-tuning, Prefix tuning)
    - [v2 implementation](https://blog.csdn.net/as949179700/article/details/130900814)
- To-do
  - Obtain the dataset, tune the model
6/6
---
- Progress
- [slide](https://docs.google.com/presentation/d/1fpb1DCZpO8iOyLdguuE5r5Y8tV6ucEQZ/edit?usp=drive_link&ouid=103103346235533747462&rtpof=true&sd=true)
- Discussion
  - Introduced the task
  - Explained P-tuning, a new approach to model tuning
- To-do
  - Dataset: [FEVER](https://aclanthology.org/N18-1074/)
  - Model: [P-tuning]()
5/23
---
- Discussion
  - Presentation style
    - Explain each step of the pipeline in detail
    - Explain jargon with examples
  - Claim Verification for Fact Checking
- To-do
  - [X-FACT: A New Benchmark Dataset for Multilingual Fact Checking](https://aclanthology.org/2021.acl-short.86.pdf)
  - Implementation
4/18
---
- progress
- [slide](https://docs.google.com/presentation/d/1ZltWpbwZCcjNSzYRWCZDngIOCL0j2nGV/edit#slide=id.p1)
- Discussion
  - Suggestions for the project-expo talk and slides
    - Data description (references)
    - Multi-label proportions in the data
    - Details (how it was implemented; how the model architecture is presented)
    - Visual comparisons
    - Video delivery (eye contact, highlight the strengths)
    - Motivation
    - What was learned during the improvement process
- To-do
  - Poster
  - Revise the slides
  - Improve the model to boost performance
3/21
---
- progress
- [slide](https://docs.google.com/presentation/d/1YJbOBFONagMjS84SXZVjoPIjrQ_AkNfS/edit#slide=id.p10)
- Discussion
  - Learned from the senior how to use the API for the implementation
    - For our current needs there is no reason to stand up a server for the API (that is done for stability under long-term use); we only need it briefly
    - Suggestion: use the OpenAI API to generate label_words through dialogue
- To-do
  - Generate label_words with the OpenAI API
  - Adapt the model format
  - Build the model
  - Set the parameters
3/14
---
- progress
- [slide](https://docs.google.com/presentation/d/1PX5VZpAU56T9SIilCJyhirCpE99kx0f6/edit?usp=share_link&ouid=103103346235533747462&rtpof=true&sd=true)
- Discussion
  - Introduced KBs
  - Problems when generating from KBs in the implementation
  - API
- To-do
  - Fix the label words generated from the KBs
  - Adjust the label words to fit the model format
2/21
---
- progress
- [slide](https://docs.google.com/presentation/d/14dyHaW5SCITAglGERQOb4_7A9DqcT0Wd/edit?usp=share_link&ouid=103103346235533747462&rtpof=true&sd=true)
- Discussion
  - Introduced how KPT (knowledgeable prompt-tuning) is used in prompt-based learning
    - label words are generated by adding external KBs
- To-do
  - Implement KPT
  - Understand how KBs generate label words
1/17
---
- progress
- [slide](https://docs.google.com/presentation/d/1uj0kzs3-77QW0V9nVbds53kRFmgo0yCW/edit?usp=share_link&ouid=103103346235533747462&rtpof=true&sd=true)
- Discussion
  - Problems implementing with the kklab dataset
- To-do
  - Add external label words with the Knowledgeable Verbalizer
12/27
---
- progress
- [slide](https://docs.google.com/presentation/d/1K34QF32ljJUYCjPfx6m_nUYUEohSl1F0/edit?usp=share_link&ouid=103103346235533747462&rtpof=true&sd=true)
- Discussion
  - Introduced prompt-based learning
- To-do
  - Implement prompt-based learning
12/13
---
- progress
- [slide](https://docs.google.com/presentation/d/1Oc-f5AIa0Za9RT13ZQ9QSR7ne47xAMzE/edit?usp=share_link&ouid=103103346235533747462&rtpof=true&sd=true)
- Discussion
  - Competition progress: the data-preprocessing part
- To-do
  - Get a basic grasp of prompt-based learning; build the competition model
11/29
---
- progress
- [slide](https://docs.google.com/presentation/d/1aG3L8WkSU7VYeZGxLpq_vsNNzY2NZeo6/edit?usp=share_link&ouid=103103346235533747462&rtpof=true&sd=true)
- Discussion
  - Introduced span-level-model material for the dataset
- To-do
  - Process the dataset with a span-level model
  - Speed up the progress
- Paper link
  - [TPLinker: Single-stage Joint Extraction of Entities and Relations Through Token Pair Linking](https://arxiv.org/pdf/2010.13415.pdf)
  - [Span-Level Model for Relation Extraction](https://aclanthology.org/P19-1525.pdf)
11/01
---
- progress
- [Slide](https://docs.google.com/presentation/d/11t8IjwsXBrekMXK34tfoLSJFdV_j2pFK/edit?usp=share_link&ouid=103103346235533747462&rtpof=true&sd=true)
- Discussion
  - Questions about the paper's content
- To-do
  - Combine the model with the competition dataset
- Material
  - [Competition info](https://tbrain.trendmicro.com.tw/Competitions/Details/26)
10/18
---
- progress
- [Slide](https://docs.google.com/presentation/d/1nzcITCth-dhEyfcOQdo-ToqwwAckdTJK/edit?usp=sharing&ouid=103103346235533747462&rtpof=true&sd=true)
- Discussion
  - Discussed the paper's content
- To-do
  - Fully understand the math notation in the paper; understand the code and implement the concepts
- Extra
  - LDA
10/11
---
- Progress
- [Slide](https://docs.google.com/presentation/d/1nQ-QQdLmEsFWGifD4-3v_dKO6a81Htnj/edit?usp=sharing&ouid=103103346235533747462&rtpof=true&sd=true)
- Discussion
  - Confirmed the project direction and provided related materials
- To-do
  - Read the related articles and finalize the competition topic
  - [Multi-hop Reading Comprehension through Question Decomposition and Rescoring (ACL 2019)](https://aclanthology.org/P19-1613.pdf)