# Data Mining (Team 5)
+ [Github repo](https://github.com/tangerine1202/DataMining-Team5)
+ [Progress 1 report](https://www.icloud.com/keynote/0d450KCHSvLz1fSuLVXTm1GnA#DM-progress-report1)
+ [AI Competition](https://tbrain.trendmicro.com.tw/Competitions/Details/26)
- next week
- QA
- [x] train-val-test split (0.7, 0.2, 0.1)
- [x] Try other metrics: LCS, F1 (SQuAD), exact_match (SQuAD)
- [x] An HF model can easily be wrapped in a PyTorch model.
- [optional]
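- How to feed s in
- ==Are each q' and r' independent? Or do some (q', r') need to be paired together?==
- There are many (q', r') groups: how should they be unified, and do they need to be unified at all?
- How many unseen tokens are in the dataset (probably none)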
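
The split ratios and SQuAD-style metrics from the checklist above can be sketched in plain Python. This is a minimal sketch, not the team's actual code: the function names (`split_dataset`, `normalize`, `exact_match`, `f1_score`), the fixed seed, and whitespace tokenization are all assumptions. The normalization follows the commonly used SQuAD evaluation convention (lowercase, strip punctuation and English articles).

```python
import random
import re
import string
from collections import Counter


def split_dataset(items, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle a copy of items and cut it into train/val/test by the given ratios."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to test
    return train, val, test


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction, reference):
    """Token-level F1 between normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Fixing the seed keeps the three subsets stable across runs, so validation scores stay comparable while hyperparameters change.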
- next week
- Token Classification (洪偉豪)
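- LCS ([Result](https://docs.google.com/spreadsheets/d/1tJhIE_M-DWNVVNykEF0YOGspO6J80v6Fd4LBCmqKO5k/edit?usp=sharing))
- Filter the noise (#, [CLS]) out of predict_Q and predict_R
- Tune hyperparameters (currently training for only one epoch because suitable hyperparameters have not been found yet)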
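
The cleanup and LCS items above could look roughly like the following. This is a sketch under stated assumptions: it assumes the predictions are lists of BERT WordPiece tokens, so that `#` noise comes from `##` continuation markers, and the helper names (`clean_tokens`, `lcs_length`, `lcs_ratio`) are hypothetical. The normalization in `lcs_ratio` is one illustrative choice and is not necessarily the competition's official scoring formula.

```python
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]", "[UNK]"}


def clean_tokens(tokens):
    """Drop special tokens and merge WordPiece continuation pieces ("##ment") into words."""
    words = []
    for tok in tokens:
        if tok in SPECIAL_TOKENS:
            continue
        if tok.startswith("##") and words:
            words[-1] += tok[2:]  # glue the continuation onto the previous word
        else:
            word = tok.lstrip("#")  # stray "#" characters are treated as noise
            if word:
                words.append(word)
    return words


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_ratio(pred, gold):
    """Assumed normalization: LCS length divided by the longer sequence length."""
    if not pred and not gold:
        return 1.0
    return lcs_length(pred, gold) / max(len(pred), len(gold))
```

Cleaning before scoring matters because leftover `[CLS]` or `##` pieces never match the gold tokens and so lower any overlap-based score.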
---
- **Token Classification (NER)**
- [Custom Named Entity Recognition with BERT.ipynb](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/BERT/Custom_Named_Entity_Recognition_with_BERT_only_first_wordpiece.ipynb)
- [huggingface tasks -- token classification](https://huggingface.co/docs/transformers/tasks/token_classification)
- [Opinion mining](https://paperswithcode.com/task/opinion-mining)
- [LIAR-PLUS](https://github.com/Tariq60/LIAR-PLUS)
- contains statement and justification (statement excluded)
- from [Where is Your Evidence: Improving Fact-checking by Justification Modeling](https://paperswithcode.com/paper/where-is-your-evidence-improving-fact)
- Argument Mining
- **Argument component Identification**
- Select the parts of a general text that can belong to an argument (i.e. argument segments vs. non-argument segments).
- methods
- structured (e.g. RST)
- unstructured
- approaches
1. sentence-level classification
2. sequence-level argument discourse units (ADUs, argument propositions)
- Argument component Classification
- Determine the type of argument proposition. (e.g. premise, claim, conclusion, etc.)
- Argument connection detection
- none, support, attack, etc.
- some datasets
- [AIF-DB](https://corpora.aifdb.org)
- [UKP](https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/1997)
- [AMPERE++](https://zenodo.org/record/6362430#.Y2tt_i9CaAn) from [AURC](https://github.com/trtm/AURC)
- reference
- [Arguing with BERT](https://negedng.github.io/files/MSc_thesis.pdf)
- includes an introduction to well-known datasets
- [Argument Mining: A Survey -- 6. Identifying Argument Components](https://direct.mit.edu/coli/article/45/4/765/93362/Argument-Mining-A-Survey)
- [Stance Detection](https://paperswithcode.com/task/stance-detection)
- The extraction of a subject's reaction to a claim made by a primary actor. For example:
```
Source: "Apples are the most delicious fruit in existence"
Reply: "Obviously not, because that is a reuben from Katz's"
Stance: deny
```
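
Returning to the sequence-level approach listed under Argument component Identification: it is commonly framed as token classification with BIO tags, which ties it to the Token Classification (NER) links above. Below is a minimal sketch of converting labeled spans to per-token tags and back. The function names, the `(start, end, label)` span format with an exclusive end index, and the `claim` / `premise` label names are assumptions for illustration.

```python
def spans_to_bio(tokens, spans):
    """Convert labeled token spans into one BIO tag per token.

    spans: list of (start, end, label), token indices, end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"span out of range: {(start, end, label)}")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags):
    """Recover (start, end, label) spans from a BIO tag sequence."""
    spans, start, label = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" flushes the last span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.append((start, i, label))
            start, label = None, None
        # A "B-" tag always opens a span; a dangling "I-" tag opens one too.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


tokens = "We should ban cars because they pollute".split()
spans = [(0, 4, "claim"), (4, 7, "premise")]
tags = spans_to_bio(tokens, spans)
```

With tags in this form, argument component identification (segment vs. non-segment) and component classification (premise, claim, etc.) can be trained jointly as a single tagging task, while sentence-level classification would instead assign one label per sentence.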