# Empirical ELO Calculations
There are a few open problems at ZeroEntropy related to ELO. ELO is an [MLE](https://hackmd.io/7PwDRMtXS1CIutUel3XDAg)-based model for converting an observed $N \times N$ weight matrix $W$ into a length-$N$ ELO vector $\vec{e}$.
First we'll do some setup.
Let's say we have a corpus $\mathcal{C}$ of documents, and an array of documents $d_1, ..., d_N \in \mathcal{C}$ that we want to calculate ELO scores for.
Our goal is to create an $N \times N$ weight matrix $W$ such that
$$W_{ij} = \mathbf{P}(d_i \text{ beats } d_j)$$
We do this by presuming that every document $d_i$ can be represented by an underlying "elo score" $\vec{e}_i$, such that
$$\mathbf{P}(d_i \text{ beats } d_j) = \sigma(\vec{e}_i-\vec{e}_j)$$
where $\sigma(x) = \frac{1}{1+e^{-x}}$ is the sigmoid function.
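As a quick sanity check of this model, here is a minimal sketch in Python (numpy assumed; the elo values below are illustrative, not from real data):

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

def elo_win_matrix(e):
    # W(e)[i, j] = sigma(e_i - e_j), the modeled P(d_i beats d_j)
    e = np.asarray(e, dtype=float)
    return sigmoid(e[:, None] - e[None, :])

# Illustrative elo scores for three documents:
W = elo_win_matrix([1.0, 0.0, -1.0])
```

Note that the model automatically satisfies $W_{ij} + W_{ji} = 1$ and $W_{ii} = 0.5$.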
Of course, we can't truly know $\mathbf{P}(d_i \text{ beats } d_j)$ without sampling. But sampling comes with its own issues (empirical win rates are no longer necessarily transitive, their standard deviation is unknown, etc.). So, we create the following mathematical model:
## Empirical ELO
Let's say that a multiset of games $G = \{g_1, ..., g_{|G|}\}$ has been played, where each game $g_x$ is represented by a 2-tuple of indices $(i, j)$, indicating that $d_i$ beat $d_j$ in that game. Let's define the empirical matrix $\tilde{W}$ such that
$$\tilde{W}_{ij} = |\{g \in G : g = (i,j)\}|$$
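Accumulating $\tilde{W}$ from a game log is a simple counting pass; a sketch (numpy assumed, and the game log below is hypothetical):

```python
import numpy as np

def empirical_matrix(games, n):
    # W_tilde[i, j] = number of games in G equal to (i, j), i.e. games i won against j
    W_tilde = np.zeros((n, n))
    for winner, loser in games:
        W_tilde[winner, loser] += 1
    return W_tilde

# Hypothetical game log: each tuple is (winner index, loser index)
games = [(0, 1), (0, 1), (1, 2), (2, 0)]
W_tilde = empirical_matrix(games, n=3)
```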
Then, suppose we have a proposed ELO vector $\hat{\vec{e}}$. We can define the ELO-induced matrix $W(\hat{\vec{e}})$ such that,
$$W(\hat{\vec{e}})_{ij} = \sigma(\hat{\vec{e}}_i-\hat{\vec{e}}_j)$$
Then, we can calculate the probability of empirically observing $\tilde{W}$ under the assumption that $\hat{\vec{e}}$ is the underlying ELO vector,
$$
\mathbf{P}(\tilde{W} \mid \hat{\vec{e}}) = \prod_{i,j} \left[ W(\hat{\vec{e}})_{ij} \right]^{\tilde{W}_{ij}} = \prod_{i,j} \left[ \sigma(\hat{\vec{e}}_i - \hat{\vec{e}}_j) \right]^{\tilde{W}_{ij}}
$$
Taking the natural logarithm, we obtain the log-likelihood:
$$
\log \mathbf{P}(\tilde{W} \mid \hat{\vec{e}}) = \sum_{i,j} \tilde{W}_{ij} \log \sigma(\hat{\vec{e}}_i - \hat{\vec{e}}_j)
$$
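This log-likelihood can be computed directly. A sketch (numpy assumed; `np.logaddexp` gives a numerically stable $\log \sigma$, since $\log \sigma(x) = -\log(1 + e^{-x})$):

```python
import numpy as np

def log_likelihood(e_hat, W_tilde):
    # sum_{i,j} W_tilde[i, j] * log(sigma(e_i - e_j))
    e_hat = np.asarray(e_hat, dtype=float)
    diff = e_hat[:, None] - e_hat[None, :]
    # log(sigma(x)) = -log(1 + e^{-x}), computed stably via logaddexp
    log_sigma = -np.logaddexp(0.0, -diff)
    return float(np.sum(W_tilde * log_sigma))
```

For example, with a single game $(0, 1)$ and $\hat{\vec{e}} = \vec{0}$, every matchup is a coin flip and the log-likelihood is $\log 0.5$.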
Then, following MLE, we declare the best-fit $\hat{\vec{e}}$ to be the one that maximizes the log-likelihood:
$$
\hat{\vec{e}} = \underset{\hat{\vec{e}}}{\arg\max}\;\mathbf{P}(\tilde{W} \mid \hat{\vec{e}}) = \underset{\hat{\vec{e}}}{\arg\max} \log \mathbf{P}(\tilde{W} \mid \hat{\vec{e}}) = \underset{\hat{\vec{e}}}{\arg\max} \sum_{i,j} \tilde{W}_{ij} \log \sigma(\hat{\vec{e}}_i - \hat{\vec{e}}_j)
$$
This can then be solved with gradient ascent (equivalently, gradient descent on the negative log-likelihood), or with [this optimized algorithm for ELO](https://hackmd.io/x3_EkXGKRdeq-rNHo_RpZA).
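As a sketch, the maximization can be done with plain gradient ascent on the log-likelihood (numpy assumed; the win counts below are made up for illustration). Using $\frac{d}{dx}\log\sigma(x) = 1 - \sigma(x)$, the gradient with respect to $\hat{\vec{e}}_k$ works out to $\sum_j \tilde{W}_{kj}\,(1-\sigma(\hat{\vec{e}}_k-\hat{\vec{e}}_j)) - \sum_i \tilde{W}_{ik}\,(1-\sigma(\hat{\vec{e}}_i-\hat{\vec{e}}_k))$:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_elo(W_tilde, steps=2000, lr=0.1):
    # Maximize sum_{i,j} W_tilde[i,j] * log(sigma(e_i - e_j)) by gradient ascent.
    n = W_tilde.shape[0]
    e = np.zeros(n)
    for _ in range(steps):
        diff = e[:, None] - e[None, :]
        # d/d(e_i) of log(sigma(e_i - e_j)) is 1 - sigma(e_i - e_j)
        A = W_tilde * (1.0 - sigmoid(diff))
        grad = A.sum(axis=1) - A.sum(axis=0)
        e += lr * grad
        e -= e.mean()  # the likelihood is shift-invariant, so pin the mean to 0
    return e

# Hypothetical win counts between three documents:
W_tilde = np.array([[0., 3., 2.],
                    [1., 0., 3.],
                    [1., 1., 0.]])
e = fit_elo(W_tilde)
```

The mean-subtraction step reflects the fact that the log-likelihood only depends on differences $\hat{\vec{e}}_i - \hat{\vec{e}}_j$, so the ELO vector is only identifiable up to an additive constant.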