# Empirical ELO Calculations

There are a few open problems at ZeroEntropy related to ELO. ELO is an [MLE](https://hackmd.io/7PwDRMtXS1CIutUel3XDAg)-based model for converting an observed $N \times N$ weight matrix $W$ into a length-$N$ elo vector $\vec{e}$.

First, some setup. Suppose we have a corpus $\mathcal{C}$ of documents, and an array of documents $d_1, \ldots, d_N \in \mathcal{C}$ that we want to calculate ELO scores for.

Our goal is to create an $N \times N$ weight matrix $W$ such that

$$W_{ij} = \mathbf{P}(d_i \text{ beats } d_j)$$

We do this by presuming that every document $d_i$ can be represented by an underlying "elo score" $\vec{e}_i$, such that

$$\mathbf{P}(d_i \text{ beats } d_j) = \sigma(\vec{e}_i-\vec{e}_j)$$

where $\sigma(x) = \frac{1}{1+e^{-x}}$ is the sigmoid function.

Of course, we can't truly know $\mathbf{P}(d_i \text{ beats } d_j)$ without sampling. But sampling comes with issues: the observed win rates are no longer necessarily transitive, their standard deviation is unknown, etc. So, we create the following mathematical model:

## Empirical ELO

Let's say a set of games $G = \{g_1, \ldots, g_{|G|}\}$ has been played, where each game $g_x$ is represented by a 2-tuple of indices $(i, j)$, indicating that $d_i$ beat $d_j$ in that game. Let's define the empirical matrix $\tilde{W}$ such that

$$\tilde{W}_{ij} = |\{g \in G : g = (i,j)\}|$$

Then, suppose we had a proposed elo vector $\hat{\vec{e}}$. We can define the elo-induced matrix $W(\hat{\vec{e}})$ such that

$$W(\hat{\vec{e}})_{ij} = \sigma(\hat{\vec{e}}_i-\hat{\vec{e}}_j)$$

Then, we can calculate the probability of empirically observing $\tilde{W}$ under the presumption that $\hat{\vec{e}}$ is the underlying ELO:

$$
\mathbf{P}(\tilde{W} \mid \hat{\vec{e}}) = \prod_{i,j} \left[ W(\hat{\vec{e}})_{ij} \right]^{\tilde{W}_{ij}} = \prod_{i,j} \left[ \sigma(\hat{\vec{e}}_i - \hat{\vec{e}}_j) \right]^{\tilde{W}_{ij}}
$$

Taking the natural logarithm, we obtain the log-likelihood:

$$
\log \mathbf{P}(\tilde{W} \mid \hat{\vec{e}}) = \sum_{i,j} \tilde{W}_{ij} \log \sigma(\hat{\vec{e}}_i - \hat{\vec{e}}_j)
$$

Then, using MLE, we declare that the best-fit $\hat{\vec{e}}$ is the one that maximizes the log-likelihood:

$$
\hat{\vec{e}} = \underset{\hat{\vec{e}}}{\arg\max}\;\mathbf{P}(\tilde{W} \mid \hat{\vec{e}}) = \underset{\hat{\vec{e}}}{\arg\max} \log \mathbf{P}(\tilde{W} \mid \hat{\vec{e}}) = \underset{\hat{\vec{e}}}{\arg\max} \sum_{i,j} \tilde{W}_{ij} \log \sigma(\hat{\vec{e}}_i - \hat{\vec{e}}_j)
$$

Note that the log-likelihood is invariant under shifting every entry of $\hat{\vec{e}}$ by the same constant, so the maximizer is only unique up to translation; a common convention is to normalize the elos to have mean zero.

This can then be solved with gradient descent, or with [this optimized algorithm for ELO](https://hackmd.io/x3_EkXGKRdeq-rNHo_RpZA).
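To make the gradient route concrete, differentiating the log-likelihood with respect to a single elo $\hat{\vec{e}}_k$ (using $\frac{d}{dx}\log\sigma(x) = \sigma(-x)$) gives

$$
\frac{\partial}{\partial \hat{\vec{e}}_k} \log \mathbf{P}(\tilde{W} \mid \hat{\vec{e}}) = \sum_j \tilde{W}_{kj}\,\sigma(\hat{\vec{e}}_j - \hat{\vec{e}}_k) - \sum_i \tilde{W}_{ik}\,\sigma(\hat{\vec{e}}_k - \hat{\vec{e}}_i)
$$

Below is a minimal numpy sketch that ascends this gradient directly. The function names (`empirical_matrix`, `fit_elos`), the learning rate, and the step count are illustrative assumptions, not ZeroEntropy's implementation or the optimized algorithm linked above.

```python
import numpy as np

def empirical_matrix(games: list[tuple[int, int]], n: int) -> np.ndarray:
    """Build W~ from a list of (i, j) pairs, each meaning d_i beat d_j."""
    W = np.zeros((n, n))
    for i, j in games:
        W[i, j] += 1.0
    return W

def fit_elos(W_tilde: np.ndarray, lr: float = 1.0, steps: int = 5000) -> np.ndarray:
    """Maximize sum_ij W~_ij * log(sigma(e_i - e_j)) by plain gradient ascent."""
    n = W_tilde.shape[0]
    e = np.zeros(n)
    total = W_tilde.sum()  # total number of games, used to scale the step size
    for _ in range(steps):
        diff = e[:, None] - e[None, :]     # diff[i, j] = e_i - e_j
        P = 1.0 / (1.0 + np.exp(-diff))    # P[i, j] = sigma(e_i - e_j)
        # dL/de_k = sum_j W~_kj * sigma(e_j - e_k) - sum_i W~_ik * sigma(e_k - e_i)
        grad = (W_tilde * (1.0 - P)).sum(axis=1) - (W_tilde.T * P).sum(axis=1)
        e += lr * grad / total
        e -= e.mean()                      # pin the mean to zero (gauge fixing)
    return e
```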
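As a quick usage example on made-up game data (chosen so every document both wins and loses at least once, which keeps the MLE finite):

```python
games = [(0, 1), (0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (1, 2), (2, 1)]
W_tilde = empirical_matrix(games, n=3)
elos = fit_elos(W_tilde)
print(elos)  # expect roughly e_0 > e_1 > e_2, with mean zero
```

Re-centering each step costs nothing, since the model only depends on elo differences. One practical caveat: if some document wins (or loses) every game it appears in, the unregularized MLE diverges, so implementations often add a small prior or L2 penalty on $\hat{\vec{e}}$.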