# ChrF: character n-gram F-score for automatic MT evaluation (Paper Walkthrough)
### Prerequisite: the F-score from statistics
$F_\beta = (1+\beta^2)\Large{precision\times recall \over \beta^2 \cdot precision + recall}$
$\beta$ is an arbitrary parameter. The default F-score usually sets $\beta$ to 1, which weights precision and recall equally:
$F_1 = 2\Large{precision\times recall \over precision + recall}$
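Precision: of the items predicted to be positive, the proportion that are actually positive.
$Precision = \large{TP \over TP+FP}$
Recall: of the items that are actually positive, the proportion that are predicted to be positive.
$Recall = \large{TP \over TP+FN}$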
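The definitions above can be sketched in a few lines of Python. The helper name `f_beta` and the counts in the usage lines are made up for illustration, and returning 0 when a denominator is 0 is an assumption of this sketch rather than part of the definition.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from confusion-matrix counts (hypothetical helper)."""
    # Precision: share of predicted positives that are truly positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of true positives that were predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta**2 * precision + recall
    if denom == 0:
        return 0.0  # assumption: define the score as 0 when nothing matches
    return (1 + beta**2) * precision * recall / denom


# Made-up counts: 8 true positives, 2 false positives, 4 false negatives
print(f_beta(8, 2, 4))            # beta = 1: precision and recall weighted equally
print(f_beta(8, 2, 4, beta=3.0))  # beta = 3: the score moves toward recall
```

With $\beta > 1$ the score is pulled toward recall, and with $\beta < 1$ toward precision.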
Reference: [Wikipedia](https://en.wikipedia.org/wiki/F-score)
### ChrF score
$ChrF\beta = (1+\beta^2)\Large{chrP\times chrR \over \beta^2 \cdot chrP +chrR}$
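- $chrP$
The percentage of $n$-grams in the hypothesis (predicted sentence) that also appear in the reference (correct sentence).
- $chrR$
The percentage of $n$-grams in the reference (correct sentence) that also appear in the hypothesis (predicted sentence).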
**The biggest difference between ChrF and BLEU is that ChrF uses character-level n-grams, whereas BLEU uses word-level n-grams.**

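The paper notes that the best $n$ for word-level metrics is $4$. After considering other metrics, such as MTERATER with $n = 10$ and BEER with $n = 6$, and checking against the average word length of the sentences in the WMT12/13/14 (Workshop on Machine Translation) datasets, the authors found that $n = 6$ correlates best with human rankings, so they adopt character $6$-grams.
**In addition, experiments showed that including the spaces between words in the n-grams does not improve the correlation, so the spaces are dropped.**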
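As a minimal sketch of what "character n-grams without spaces" means in practice, the helper below strips whitespace and slides a window of length `n` over the remaining characters. The name `char_ngrams` is hypothetical, and the sketch does no other preprocessing (no lowercasing, no punctuation handling).

```python
def char_ngrams(sentence: str, n: int = 6) -> list[str]:
    """Character n-grams of a sentence with whitespace removed (hypothetical helper)."""
    chars = "".join(sentence.split())  # drop all whitespace between words
    # Slide a window of length n over the characters; empty if the text is too short
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]


print(char_ngrams("the cat", n=3))  # ['the', 'hec', 'eca', 'cat']
```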
### Example
**Reference Sentence: The cat is running there.
Hypothesis Sentence: A cat is running.**
With $n=6$ (spaces and the final period removed):
**All 6-grams of the reference:**
"Thecat", "hecati", "ecatis", "==catisr==", "==atisru==", "==tisrun==", "==isrunn==", "==srunni==", "==runnin==", "==unning==", "nningt", "ningth", "ingthe", "ngther","gthere"
**All 6-grams of the hypothesis:**
"Acatis", "==catisr==", "==atisru==", "==tisrun==", "==isrunn==", "==srunni==", "==runnin==", "==unning=="
The ==highlighted== n-grams appear in both the reference and the hypothesis, so we can compute:
$chrP = \Large{7 \over 8}$
$chrR = \Large{7 \over 15}$
Set $\beta=1$:
$ChrF1 = 2\times\Large{{7 \over 8}\times {7 \over 15} \over {7 \over 8}+{7 \over 15}}\small\approx 0.6087$ (60.87%)
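The hand calculation can be checked with a short script. This is a simplified sketch that, like the example, looks only at 6-grams. The function names are hypothetical, and counting repeated n-grams with clipped counts (`Counter` intersection) is an assumption that makes no difference here, because every 6-gram in the example is unique.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace (hypothetical helper)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf_single_order(hypothesis: str, reference: str, n: int = 6, beta: float = 1.0):
    """Return (chrP, chrR, ChrF-beta) using n-grams of one order only."""
    hyp = char_ngrams(hypothesis, n)
    ref = char_ngrams(reference, n)
    matches = sum((hyp & ref).values())  # n-grams shared by both, counts clipped
    chr_p = matches / sum(hyp.values()) if hyp else 0.0
    chr_r = matches / sum(ref.values()) if ref else 0.0
    denom = beta**2 * chr_p + chr_r
    score = (1 + beta**2) * chr_p * chr_r / denom if denom else 0.0
    return chr_p, chr_r, score


# The final period is left out, as in the n-gram lists above
chr_p, chr_r, f1 = chrf_single_order("A cat is running", "The cat is running there")
print(chr_p, chr_r, round(100 * f1, 2))  # chrP = 7/8, chrR = 7/15, ChrF1 ≈ 60.87
```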
The paper also reports that with $\beta=3$, $ChrF$ correlates best with human rankings, also relative to other metrics, so $ChrF$ nowadays often refers to $ChrF3$.
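For comparison, plugging the example's $chrP = {7 \over 8}$ and $chrR = {7 \over 15}$ into the same formula with $\beta=3$ gives a lower score. A larger $\beta$ puts more weight on recall, and recall is the weaker of the two values in this example:
$ChrF3 = 10\times\Large{{7 \over 8}\times {7 \over 15} \over 9\times{7 \over 8}+{7 \over 15}}\small = {490 \over 1001} \approx 0.4895$ (48.95%)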
**ChrF sample Python code:**
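```python!
# Signature from the torchmetrics docs, shown for reference only (not executed):
# torchmetrics.CHRFScore(n_char_order=6, n_word_order=2, beta=2.0, lowercase=False,
#                        whitespace=False, return_sentence_level_score=False, **kwargs)
from torchmetrics.text import CHRFScore

preds = ['A cat is running']
# One list of reference translations per prediction
target = [['The cat is running there']]
# Defaults: beta=2.0 and n_word_order=2 (word n-grams are included as well)
chrf = CHRFScore()
print(chrf(preds, target))
```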
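The library score is not expected to equal the 60.87% computed by hand, because the hand calculation uses 6-grams only and $\beta=1$. In the paper, $chrP$ and $chrR$ are averaged over all n-gram orders from 1 up to $n$. The sketch below illustrates that averaging under simplifying assumptions (whitespace removed, clipped counts, a plain arithmetic mean over orders, hypothetical function names). It is not the torchmetrics implementation and need not reproduce its output.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace (hypothetical helper)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf_averaged(hypothesis: str, reference: str, max_n: int = 6, beta: float = 3.0) -> float:
    """ChrF-beta with chrP and chrR averaged over n-gram orders 1..max_n (sketch)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        matches = sum((hyp & ref).values())  # shared n-grams, counts clipped
        precisions.append(matches / sum(hyp.values()) if hyp else 0.0)
        recalls.append(matches / sum(ref.values()) if ref else 0.0)
    chr_p = sum(precisions) / max_n  # arithmetic mean over all orders
    chr_r = sum(recalls) / max_n
    denom = beta**2 * chr_p + chr_r
    return (1 + beta**2) * chr_p * chr_r / denom if denom else 0.0


print(chrf_averaged("A cat is running", "The cat is running there"))
```

Short n-grams match more easily than 6-grams, so averaging over orders generally gives a higher score than the 6-gram-only calculation for the same sentence pair.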
Reference: [CHRF: character n-gram F-score for automatic MT evaluation (Popović, 2015)](https://aclanthology.org/W15-3049.pdf), [CHRF SCORE](https://torchmetrics.readthedocs.io/en/stable/text/chrf_score.html)