# Empirical ELO Calculations
There are a few open problems at ZeroEntropy related to ELO. ELO is an [MLE](https://hackmd.io/7PwDRMtXS1CIutUel3XDAg)-based model for converting an observed $N \times N$ weight matrix $W$ into a length-$N$ elo vector $\vec{e}$.
First we'll do some setup.
Let's say we have a corpus $\mathcal{C}$ of documents, and an array of documents $d_1, ..., d_N \in \mathcal{C}$ that we want to calculate ELO scores for.
Our goal is to create an $N \times N$ weight matrix $W$ such that
$$W_{ij} = \mathbf{P}(d_i \text{ beats } d_j)$$
We do this by modeling each document $d_i$ as having a latent "elo score" $\vec{e}_i$, such that
$$\mathbf{P}(d_i \text{ beats } d_j) = \sigma(\vec{e}_i-\vec{e}_j)$$
Where $\sigma(x) = \frac{1}{1+e^{-x}}$ is the sigmoid function. This modeling presumption takes us from $O(N^2)$ free parameters in $W$, to merely $O(N)$ free parameters in $\vec{e}$.
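For concreteness, here's a minimal sketch of this model in Python with numpy (the language and function names here are illustrative choices, not prescribed by anything above): given an elo vector, it builds the modeled win matrix $W$.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    # sigma(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

def modeled_win_matrix(e: np.ndarray) -> np.ndarray:
    """W[i, j] = sigma(e_i - e_j), the modeled P(d_i beats d_j).

    `e` is a length-N elo vector; the result is N x N.
    """
    return sigmoid(e[:, None] - e[None, :])
```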
Of course, we can't truly know $\mathbf{P}(d_i \text{ beats } d_j)$; instead, we sample games and use those samples to deduce the most likely elo vector. The process of doing so is described below:
## Empirical ELO
Let's say that a set of games $G = \{g_1, ..., g_{|G|}\}$ has been played, where each game $g_x$ is represented by a 2-tuple of indices $(i, j)$, indicating that $d_i$ beat $d_j$ in that game. Let's define the empirical $\tilde{W}$ such that
$$\tilde{W}_{ij} = |\{g \in G : g = (i,j)\}|$$
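As a quick sketch (again in numpy, with a hypothetical helper name), counting $\tilde{W}$ from a list of games:

```python
import numpy as np

def empirical_win_counts(games: list[tuple[int, int]], n: int) -> np.ndarray:
    """W_tilde[i, j] = |{g in G : g = (i, j)}|, i.e. # of times d_i beat d_j."""
    w_tilde = np.zeros((n, n))
    for i, j in games:
        w_tilde[i, j] += 1.0
    return w_tilde

# e.g. games = [(0, 1), (0, 1), (2, 0)]: d_0 beat d_1 twice, d_2 beat d_0 once
```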
Then, let's say we have a proposed elo vector $\hat{\vec{e}}$. Under this proposal, we have that
$$\mathbf{P}(d_i \text{ beats } d_j) = \sigma(\hat{\vec{e}}_i-\hat{\vec{e}}_j)$$
Then, we can calculate the probability of empirically observing $\tilde{W}$ under the presumption that $\hat{\vec{e}}$ is the underlying ELO (up to a multinomial coefficient that doesn't depend on $\hat{\vec{e}}$, and so doesn't affect the maximization below),
$$
\mathbf{P}(\tilde{W} \mid \hat{\vec{e}}) = \prod_{i,j} \left[ \mathbf{P}(d_i \text{ beats } d_j) \right]^{\tilde{W}_{ij}} = \prod_{i,j} \left[ \sigma(\hat{\vec{e}}_i - \hat{\vec{e}}_j) \right]^{\tilde{W}_{ij}}
$$
Taking the natural logarithm, we obtain the log-likelihood of the observed results:
$$
\log \mathbf{P}(\tilde{W} \mid \hat{\vec{e}}) = \sum_{i,j} \tilde{W}_{ij} \log \sigma(\hat{\vec{e}}_i - \hat{\vec{e}}_j)
$$
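Numerically, note that $\log \sigma(x) = -\log(1 + e^{-x})$, which can be computed stably with `np.logaddexp`. A minimal sketch of this log-likelihood, under the same numpy assumptions as above:

```python
import numpy as np

def log_likelihood(e_hat: np.ndarray, w_tilde: np.ndarray) -> float:
    """sum_{i,j} W_tilde[i, j] * log sigma(e_hat[i] - e_hat[j])."""
    diffs = e_hat[:, None] - e_hat[None, :]
    # log sigma(x) = -log(1 + e^{-x}) = -logaddexp(0, -x); stable for large |x|
    log_sigma = -np.logaddexp(0.0, -diffs)
    return float(np.sum(w_tilde * log_sigma))
```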
Then, using MLE, we declare that the best-fit $\hat{\vec{e}}$ is such that the log-likelihood is maximized,
$$
\hat{\vec{e}}_\text{best} = \underset{\hat{\vec{e}}}{\arg\max}\;\mathbf{P}(\tilde{W} \mid \hat{\vec{e}}) = \underset{\hat{\vec{e}}}{\arg\max} \log \mathbf{P}(\tilde{W} \mid \hat{\vec{e}}) = \underset{\hat{\vec{e}}}{\arg\max} \sum_{i,j} \tilde{W}_{ij} \log \sigma(\hat{\vec{e}}_i - \hat{\vec{e}}_j)
$$
This can then be solved with gradient descent, or with [this optimized algorithm for ELO](https://hackmd.io/x3_EkXGKRdeq-rNHo_RpZA). Note that the log-likelihood is invariant under adding a constant to every entry of $\hat{\vec{e}}$, so the solution is only unique up to a shift; a common convention is to center the elo vector at mean zero.
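As a sketch of the gradient route (plain full-batch gradient ascent on the log-likelihood; the learning rate and step count are arbitrary illustrative choices, not tuned values): differentiating the log-likelihood gives $\frac{\partial}{\partial \hat{\vec{e}}_k} = \sum_j \tilde{W}_{kj}\,(1 - \sigma(\hat{\vec{e}}_k - \hat{\vec{e}}_j)) - \sum_i \tilde{W}_{ik}\,(1 - \sigma(\hat{\vec{e}}_i - \hat{\vec{e}}_k))$, which the code below implements.

```python
import numpy as np

def fit_elo(w_tilde: np.ndarray, lr: float = 0.1, n_steps: int = 2000) -> np.ndarray:
    """Maximize sum_{i,j} W_tilde[i, j] * log sigma(e_i - e_j) by gradient ascent."""
    n = w_tilde.shape[0]
    e = np.zeros(n)
    for _ in range(n_steps):
        s = 1.0 / (1.0 + np.exp(-(e[:, None] - e[None, :])))  # s[i, j] = sigma(e_i - e_j)
        g = w_tilde * (1.0 - s)  # each game (i, j) pushes e_i up and e_j down by this much
        grad = g.sum(axis=1) - g.sum(axis=0)  # dL/de_k: wins as i, minus losses as j
        e += lr * grad
        e -= e.mean()  # fix the shift degeneracy by centering at mean zero
    return e
```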