<h1> Formal Analysis of Bilingual Dictionary Prediction Metrics </h1>

Let $l(e)$ be the left vertex (word) of an edge (translation) $e$, and let $r(e)$ be the right vertex. Let $v(S)$ be the set of vertices in an edge set $S$.

*Note that "left" and "right" do not matter for a set, as the graph is bidirectional.*

*The set $Test$ represents the original data, i.e. the data in the existing Apertium bilingual dictionary.*

<b>An equation to understand</b>

$N(e \in Test) + N((e \notin Test) \cap (l(e) \in v(Test)) \cap (r(e) \in v(Test))) + N((l(e) \in v(Test)) \cap (r(e) \notin v(Test))) + N((r(e) \in v(Test)) \cap (l(e) \notin v(Test))) + N((l(e) \notin v(Test)) \cap (r(e) \notin v(Test))) = |Pred|$

Here $N(\cdot)$ counts the predicted translations $e \in Pred$ that satisfy the stated condition. Notice that summands 3, 4, and 5 automatically imply $e \notin Test$: if either vertex is missing from $v(Test)$, the edge cannot be in $Test$.

<h3> Graph 1 </h3>

The first graph contains two bars for each category.

The blue bar ("Both-Vertex-Precision") shows precision when both words of the predicted translation $e$ are in the Apertium bilingual dictionary for the language pair ($Test$). Formally:

$\frac{N(e \in Test)}{N(e \in Test) + N((e \notin Test) \cap (l(e), r(e) \in v(Test)))}$

The red bar ("Both-Vertex-Recall") shows recall when both words of an original Apertium translation $o$ are in the input data used (the 10 other language pairs), so that the translation could in principle have been predicted. Formally:

$\frac{N(o \in Pred)}{N(o \in Pred) + N((o \notin Pred) \cap (l(o), r(o) \in v(Input)))}$

<h3> Graph 2 </h3>

The second graph contains two bars for each category, each bar split into five parts.

The first bar compares predicted translations to the original/$Test$ (Apertium bidix) translations. The lowest part (in orange) shows the percentage of predicted translations that were in the original data. It is annotated "overall precision", which may be misleading, as the Apertium data can be incomplete (refer to Graph 1 for more realistic numbers). The predicted translations that are not in the original data are further classified into four categories based on the original data.

*Note: iPnOcO stands for In Predicted, Not in Original, Classified by Original.*

* Category iPnOcO 0: Neither word of the predicted translation is in the original data.
* Category iPnOcO 1: The Language 1 word (English in the "en-es" pair) of the translation is in the original data.
* Category iPnOcO 2: The Language 2 word (Spanish in the "en-es" pair) of the translation is in the original data.
* Category iPnOcO 3: Both words of the predicted translation are in the original data.

Formally, let $e$ be a predicted translation ($e \in Pred$):

Overall Precision - $\frac{N(e \in Test)}{|Pred|}$

Category iPnOcO 0 - $\frac{N((e \notin Test) \cap (l(e), r(e) \notin v(Test)))}{|Pred|}$

Category iPnOcO 1 - $\frac{N((e \notin Test) \cap (l(e) \in v(Test)) \cap (r(e) \notin v(Test)))}{|Pred|}$

Category iPnOcO 2 - $\frac{N((e \notin Test) \cap (l(e) \notin v(Test)) \cap (r(e) \in v(Test)))}{|Pred|}$

Category iPnOcO 3 - $\frac{N((e \notin Test) \cap (l(e), r(e) \in v(Test)))}{|Pred|}$
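To make these quantities concrete, here is a minimal Python sketch of the five-way partition and the Graph 1 precision. It assumes translations are stored as `(language-1 word, language-2 word)` tuples and that `pred` and `test` are Python sets of such tuples; the function names and the toy data are illustrative, not taken from any existing codebase.

```python
from collections import Counter

def vertices(edges):
    """v(S): every word that appears on either side of an edge in S."""
    return {word for edge in edges for word in edge}

def partition_pred(pred, test):
    """Split the predicted edges into the five summands of the identity above."""
    vt = vertices(test)
    counts = Counter()
    for l, r in pred:
        if (l, r) in test:
            counts["e_in_test"] += 1  # summand 1, the "overall precision" part
        else:
            # 1 if the left word is known to Test, plus 2 if the right word is:
            # this is exactly the iPnOcO category number (0, 1, 2 or 3).
            counts[f"iPnOcO_{(l in vt) + 2 * (r in vt)}"] += 1
    assert sum(counts.values()) == len(pred)  # the five summands total |Pred|
    return counts

def both_vertex_precision(pred, test):
    """Blue bar of Graph 1: precision over predictions whose words Test knows."""
    c = partition_pred(pred, test)
    denom = c["e_in_test"] + c["iPnOcO_3"]
    return c["e_in_test"] / denom if denom else 0.0

# Toy check (hypothetical data): "house"-"coche" is a wrong prediction, but
# both words occur in test, so it lands in iPnOcO 3 and lowers the metric.
pred = {("dog", "perro"), ("cat", "gato"), ("house", "coche")}
test = {("dog", "perro"), ("cat", "gato"), ("house", "casa"), ("car", "coche")}
print(partition_pred(pred, test))         # Counter({'e_in_test': 2, 'iPnOcO_3': 1})
print(both_vertex_precision(pred, test))  # 0.666...
```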
The second bar compares original translations to predicted translations. The lowest part (in yellow) shows the percentage of original translations that were in the predicted data; it is annotated "overall recall". The recall shown in Graph 1 is slightly different, in that it ignores translations that could not possibly have been inferred because one or both of their words do not exist in the input data (the other 10 language pairs). The original translations that are not in the predicted data are classified into four categories based on the input data.

*Note: iOnPcI stands for In Original, Not in Predicted, Classified by Input.*

* Category iOnPcI 0: Neither word of the original translation is in the input data.
* Category iOnPcI 1: The Language 1 word (English in the "en-es" pair) of the translation is in the input data.
* Category iOnPcI 2: The Language 2 word (Spanish in the "en-es" pair) of the translation is in the input data.
* Category iOnPcI 3: Both words of the original translation are in the input data.

*Intuition on the numbering: let '1' stand for $l(e) \in S$ and '2' for $r(e) \in S$; if both hold, $1 + 2 = 3$. If neither holds, 0.*

Formally, let $o$ be an original translation ($o \in Test$):

Overall Recall - $\frac{N(o \in Pred)}{|Test|}$

Category iOnPcI 0 - $\frac{N((o \notin Pred) \cap (l(o), r(o) \notin v(Input)))}{|Test|}$

Category iOnPcI 1 - $\frac{N((o \notin Pred) \cap (l(o) \in v(Input)) \cap (r(o) \notin v(Input)))}{|Test|}$

Category iOnPcI 2 - $\frac{N((o \notin Pred) \cap (l(o) \notin v(Input)) \cap (r(o) \in v(Input)))}{|Test|}$

Category iOnPcI 3 - $\frac{N((o \notin Pred) \cap (l(o), r(o) \in v(Input)))}{|Test|}$
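The recall-side quantities follow the same pattern; the one subtlety is that Both-Vertex-Recall mixes two sets, testing edge membership against $Pred$ but word reachability against $Input$. A sketch under the same assumptions as the snippet above (reusing its `vertices()` helper):

```python
from collections import Counter  # vertices() is reused from the sketch above

def split_test(test, pred, input_edges):
    """Second bar of Graph 2: overall recall plus the four iOnPcI shares,
    each expressed as a fraction of |Test|."""
    vi = vertices(input_edges)
    counts = Counter()
    for l, r in test:
        if (l, r) in pred:
            counts["o_in_pred"] += 1  # the "overall recall" part
        else:
            counts[f"iOnPcI_{(l in vi) + 2 * (r in vi)}"] += 1
    return {name: n / len(test) for name, n in counts.items()}

def both_vertex_recall(test, pred, input_edges):
    """Red bar of Graph 1: recall over the originals whose two words the
    input language pairs could in principle have connected."""
    vi = vertices(input_edges)
    found = sum(1 for o in test if o in pred)
    missed_reachable = sum(
        1 for (l, r) in test
        if (l, r) not in pred and l in vi and r in vi
    )
    denom = found + missed_reachable
    return found / denom if denom else 0.0
```

Note that the numerator needs no extra reachability check: a predicted edge can only have been built from the input graphs, so every $o \in Pred$ automatically has both words in $v(Input)$.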