## FEVEROUS: Fact Extraction and Verification over Structured and Unstructured Information
<div style="text-align: right">
<!-- Assignment 1 -->
<!-- Team Members: -->
<br> Keshav Bansal: 2019101019
<br> Sarthak Agrawal: 2019115003
</div>
# Problem:
The basic task of our project is to detect misinformation. Existing methods have focused only on unstructured information, i.e. textual data, and ignored the wealth of information present in structured form, i.e. tables. This paper introduces a novel dataset, FEVEROUS, to address this gap.
Each claim has associated evidence, which can be in the form of table cells, sentences, or both. Every claim also carries a label indicating whether the evidence supports the claim, refutes it, or does not provide enough information. We will also try to build a baseline that predicts the label from the claim and its evidence, and to reduce the bias of the dataset so that the label cannot be reliably predicted from the claim alone, without evidence.
# Scope:
An NLP framework for automated fact-checking consists of three stages: (i) claim detection, to identify claims that need to be verified; (ii) evidence retrieval, to find sources that support or contradict the claim; and (iii) claim verification, to judge the veracity of the claim based on the retrieved evidence. Claim detection is frequently handled separately, whereas factual verification refers to the combination of evidence retrieval and claim verification. Claim verification can itself be broken down into two tasks that can be carried out separately or jointly: verdict prediction, in which a veracity label is assigned to the claim, and justification production, in which an explanation for the verdict is generated.
FEVEROUS introduces a RoBERTa-based baseline for evidence retrieval and verdict prediction, highlighting the difficulty of evidence retrieval and the relative simplicity of the latter task.
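The three stages above can be pictured as a simple pipeline. The sketch below uses placeholder functions and data types of our own choosing, not the FEVEROUS reference implementation:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Verdict:
    label: str           # e.g. "SUPPORTS", "REFUTES", "NOT ENOUGH INFO"
    evidence: List[str]  # identifiers of retrieved sentences / table cells


def detect_claims(text: str) -> List[str]:
    """Stage (i): split a document into check-worthy claims (placeholder)."""
    return [s.strip() for s in text.split(".") if s.strip()]


def retrieve_evidence(claim: str) -> List[str]:
    """Stage (ii): return candidate sentences and table cells for the claim (placeholder)."""
    return []


def verify_claim(claim: str, evidence: List[str]) -> Verdict:
    """Stage (iii): judge the claim against the retrieved evidence (placeholder)."""
    label = "NOT ENOUGH INFO" if not evidence else "SUPPORTS"
    return Verdict(label=label, evidence=evidence)


def fact_check(text: str) -> List[Verdict]:
    """End-to-end factual verification: detection -> retrieval -> verification."""
    return [verify_claim(c, retrieve_evidence(c)) for c in detect_claims(text)]
```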

# Datasets:
## FEVEROUS
A claim may require a single table cell, a single sentence, or a combination of multiple sentences and cells from different articles as evidence for verification. FEVEROUS contains 87,026 claims, manually constructed and verified by trained annotators.
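For illustration, a single instance pairs a claim with a verdict label and one or more evidence sets mixing sentence and table-cell identifiers. The record below is a schematic, simplified rendering of our own, not the exact released JSON schema:

```python
# Illustrative (not verbatim) shape of a FEVEROUS instance.
example = {
    "claim": "Example claim about a person described in a Wikipedia article.",
    "label": "SUPPORTS",  # one of SUPPORTS / REFUTES / NOT ENOUGH INFO
    "evidence": [
        {
            "content": [
                "Some Article_sentence_0",  # a sentence from the page "Some Article"
                "Some Article_cell_0_1_1",  # a cell of table 0 on the same page
            ]
        }
    ],
}
```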
## FEVER
A large-scale dataset of 185,445 claims constructed by annotators based on Wikipedia articles.
## TabFact
TabFact contains artificial claims to be verified on the basis of Wikipedia tables.
## InfoTABS
InfoTABS contains claims to be verified on the basis of infoboxes.
## SEM-TAB-FACTS
SEM-TAB-FACTS requires verification on the basis of tables from scientific articles.

## Metrics for evaluation
The evaluation considers the correct prediction of the verdict as well as the correct retrieval of evidence.

Let $\hat{y}$ and $\hat{E}$ be the predicted label and the retrieved evidence, respectively, and let $\mathbf{E}$ be the collection of gold evidence sets. A prediction is scored 1 iff at least one complete gold evidence set $E \in \mathbf{E}$ is a subset of $\hat{E}$ and the predicted label is correct, and 0 otherwise.
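Written as a formula (our notation; the rule itself follows the description above):

$$
\mathrm{score}\big(\hat{y}, \hat{E}\big) \;=\; \mathbf{1}\left[\, \hat{y} = y \;\wedge\; \exists\, E \in \mathbf{E} : E \subseteq \hat{E} \,\right]
$$

The dataset-level FEVEROUS score is then the mean of this indicator over all claims.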
- We will also measure the pointwise mutual information (PMI) between verdict labels and the words in the claims throughout the entire annotation process (a sketch follows this list).
- Low co-occurrence between the verdict and the words in the claim indicates that little claim-only bias is present in the dataset.
- Finally, we will also report label accuracy.
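A small sketch of how the claim-only bias measurement could be implemented: PMI between claim tokens and verdict labels over a data split, with our own tokenisation and counting choices:

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def claim_label_pmi(claims: List[str], labels: List[str]) -> Dict[Tuple[str, str], float]:
    """PMI(word, label) = log( p(word, label) / (p(word) * p(label)) ).

    High positive values mean a word co-occurs with a label far more often
    than chance, i.e. a potential claim-only bias.
    """
    word_counts: Counter = Counter()
    label_counts: Counter = Counter()
    joint_counts: Counter = Counter()
    n = len(claims)

    for claim, label in zip(claims, labels):
        words = set(claim.lower().split())  # count each word at most once per claim
        label_counts[label] += 1
        for w in words:
            word_counts[w] += 1
            joint_counts[(w, label)] += 1

    pmi = {}
    for (w, y), c_joint in joint_counts.items():
        p_joint = c_joint / n
        p_word = word_counts[w] / n
        p_label = label_counts[y] / n
        pmi[(w, y)] = math.log(p_joint / (p_word * p_label))
    return pmi


# Example usage: words with the highest PMI for a label are the most label-revealing.
scores = claim_label_pmi(
    ["The film was released in 2001.", "The film was never released."],
    ["SUPPORTS", "REFUTES"],
)
print(sorted(scores.items(), key=lambda kv: -kv[1])[:5])
```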
## Expected Findings:
We will compare our full baseline to a sentence-only and a table-only baseline. All baselines use our TF-IDF retriever, with the sentence-only and table-only baselines extracting only sentences and only tables, respectively. The sentence-only model predicts the verdict label using only the extracted sentences, while the table-only baseline extracts cells from the retrieved tables with our cell-extractor model and predicts the verdict by linearising the selected cells and their context. All models use our verdict predictor for classification. We expect the baseline that combines both tables and sentences to achieve substantially higher scores than either the sentence-only or the table-only variant.
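To make the table-only setting concrete, here is a minimal sketch of cell linearisation: selected cells are rendered with enough surrounding context (page, table caption, column header) to be read in isolation and concatenated into one sequence for the verdict predictor. The exact template is our own illustration, not necessarily the one used in the official baseline:

```python
from typing import List, Tuple

# A selected cell with the minimal context needed to read it in isolation:
# (page title, table caption, column header, cell value) -- our own illustration.
Cell = Tuple[str, str, str, str]


def linearise_cells(cells: List[Cell]) -> str:
    """Turn selected table cells plus their context into one flat string for a transformer."""
    parts = []
    for page, caption, header, value in cells:
        parts.append(f"[{page}] {caption} ; {header} : {value}")
    return " [SEP] ".join(parts)


# Example: two cells from a hypothetical "1994 FIFA World Cup" page.
cells = [
    ("1994 FIFA World Cup", "Tournament summary", "Host country", "United States"),
    ("1994 FIFA World Cup", "Tournament summary", "Teams", "24"),
]
print(linearise_cells(cells))
```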

## Baseline:
- We will train a claim-only baseline that uses the claim as input and predicts the verdict label. We will fine-tune a pre-trained BERT model with a linear layer on top and measure its accuracy using 5-fold cross-validation (a sketch follows this list).
- Finally, we will train a claim-only evidence-type model to predict whether a claim requires sentences, cells, or a combination of both evidence types. The model and experimental setup will be identical to those used to assess claim-only bias.
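A minimal sketch of the claim-only model, assuming the HuggingFace `transformers` API. The fine-tuning loop and 5-fold cross-validation would wrap around this and are not shown; the model name and hyper-parameters are illustrative:

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizerFast

LABELS = ["SUPPORTS", "REFUTES", "NOT ENOUGH INFO"]

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
# BERT encoder with a linear classification head over the three verdict labels.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)
)


def predict_claim_only(claim: str) -> str:
    """Predict a verdict from the claim alone (no evidence) -- used to probe dataset bias."""
    inputs = tokenizer(claim, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]


print(predict_claim_only("The Eiffel Tower is located in Berlin."))
```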
### Retriever:
- Our baseline retriever module combines entity matching with TF-IDF retrieval using DrQA. Combining the two has previously been shown to work well, particularly for retrieving tables.
- We first extract the top k pages by matching entities extracted from the claim against Wikipedia articles. If fewer than k pages are identified this way, the remaining pages are selected by TF-IDF matching between the introductory sentence of an article and the claim. The top l sentences and q tables of the selected pages are then scored separately using TF-IDF. We set k = 5, l = 5 and q = 3 (see the sketch below).
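A rough sketch of this retrieval cascade under those settings. We use scikit-learn's `TfidfVectorizer` as a stand-in for the DrQA TF-IDF machinery and a trivial substring entity matcher; both are placeholders, not the actual baseline components:

```python
from typing import Dict, List

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

K_PAGES, L_SENTENCES, Q_TABLES = 5, 5, 3  # k, l, q from the settings above


def tfidf_top(query: str, candidates: List[str], top_n: int) -> List[int]:
    """Indices of the top_n candidates by TF-IDF cosine similarity to the query."""
    if not candidates or top_n <= 0:
        return []
    vec = TfidfVectorizer().fit(candidates + [query])
    sims = cosine_similarity(vec.transform([query]), vec.transform(candidates))[0]
    return sims.argsort()[::-1][:top_n].tolist()


def retrieve(claim: str, pages: Dict[str, dict]) -> dict:
    """pages maps a title to {"intro": str, "sentences": List[str], "tables": List[str]},
    where each table has already been linearised into a single string."""
    # 1) Entity matching: keep pages whose title occurs verbatim in the claim (placeholder matcher).
    selected = [title for title in pages if title.lower() in claim.lower()][:K_PAGES]
    # 2) If fewer than k pages were found, fill up via TF-IDF between the claim and page intros.
    remaining = [title for title in pages if title not in selected]
    intros = [pages[t]["intro"] for t in remaining]
    for i in tfidf_top(claim, intros, K_PAGES - len(selected)):
        selected.append(remaining[i])
    # 3) Score sentences and tables of the selected pages separately with TF-IDF.
    sentences = [s for t in selected for s in pages[t]["sentences"]]
    tables = [tab for t in selected for tab in pages[t]["tables"]]
    return {
        "sentences": [sentences[i] for i in tfidf_top(claim, sentences, L_SENTENCES)],
        "tables": [tables[i] for i in tfidf_top(claim, tables, Q_TABLES)],
    }
```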
## Literature Review
- https://github.com/Cartus/Automated-Fact-Checking-Resources#artificial-claims
- Misinfo Reaction Frames: Reasoning about Readers’ Reactions to News Headlines (Gabriel et al., 2022) [Paper] [Dataset] ACL 2022
- DialFact: A Benchmark for Fact-Checking in Dialogue (Gupta et al., 2022) [Paper] [Dataset] ACL 2022
- FAVIQ: FAct Verification from Information-seeking Questions (Park et al., 2022) [Paper] [Dataset] ACL 2022
- FEVEROUS: Fact Extraction and VERification Over Unstructured and Structured information (Aly et al., 2021) [Paper] [Dataset] [Code] NeurIPS 2021
- InfoSurgeon: Cross-Media Fine-grained Information Consistency Checking for Fake News Detection (Fung et al., 2021) [Paper] [Dataset] ACL 2021
- Statement Verification and Evidence Finding with Tables (SEM-TAB-FACT) (Wang et al., 2021) [Dataset]
- Get Your Vitamin C! Robust Fact Verification with Contrastive Evidence (Schuster et al., 2021) [Paper] [Dataset] NAACL 2021
- ParsFEVER: a Dataset for Farsi Fact Extraction and Verification (Zarharan et al., 2021) [Paper] [Dataset]
- DanFEVER: claim verification dataset for Danish (Nørregaard and Derczynski, 2021) [Paper] [Dataset] NoDaLiDa 2021
## Timeline
| Date | Task |
| :---: | :---: |
| 12th October | Finding relevant GitHub repositories |
| 15th October | Literature review complete |
| 18th October | Experiments to compare baseline with other models |
| 22nd October | Interim Submission |
| 30th October | Finalising the code |
| 5th November | Running experiments and testing |
| 16th November | Final Submission |