# Connection between denoising and EBM
**Raphael Shu, 2020/3/5**
Setting: let the oracle sequence (discrete tokens or continuous variables) be $x^*$. A noisy datapoint is then
$$\tilde x = x^* + \epsilon$$
where $\epsilon$ is random noise. In the discrete case, the noise can be injected through a corruption process (e.g., randomly replacing or dropping tokens).
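The continuous case can be sketched in a few lines of NumPy; the concrete values of $x^*$ and the noise scale `sigma` here are illustrative assumptions, not from the note:

```python
import numpy as np

rng = np.random.default_rng(0)

# Oracle sequence x* as a vector of continuous variables (illustrative values).
x_star = np.array([0.5, -1.2, 2.0])

# Inject Gaussian noise: x~ = x* + eps, with eps ~ N(0, sigma^2 I).
sigma = 0.1
x_tilde = x_star + sigma * rng.normal(size=x_star.shape)
```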
Consider a denoising model:
$$\min_S ||S(\tilde x) - x^*||_2$$
The same objective can be written as predicting the residual (the two forms are equivalent up to a reparameterization of the model output):
$$\min_S ||S(\tilde x) - (x^* - \tilde x)||_2$$
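The equivalence is a pointwise identity: if a model $S$ predicts $x^*$ directly, then $R(\tilde x) = S(\tilde x) - \tilde x$ predicts the residual, and the two losses take the same value. A minimal numerical check, with an arbitrary stand-in vector for the model output:

```python
import numpy as np

rng = np.random.default_rng(1)
x_star = rng.normal(size=5)
x_tilde = x_star + 0.1 * rng.normal(size=5)

# Stand-in for the denoiser output S(x~); any vector satisfies the identity.
S_out = rng.normal(size=5)

# Direct form: ||S(x~) - x*||_2
loss_direct = np.linalg.norm(S_out - x_star)

# Residual form with R(x~) = S(x~) - x~: ||R(x~) - (x* - x~)||_2
R_out = S_out - x_tilde
loss_residual = np.linalg.norm(R_out - (x_star - x_tilde))
```

Subtracting $\tilde x$ from both terms inside the norm leaves the loss unchanged, which is all the reparameterization does.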
Let $p$ be the true probabilistic model that evaluates the probability of a sequence. Under Gaussian noise, the residual is proportional to the score (gradient of the log-probability) of the noised distribution at $\tilde x$:
$$ \nabla_{\tilde x} \log p(\tilde x) \propto x^* - \tilde x $$
This is the gradient at the point $\tilde x$; by Tweedie's formula the proportionality constant is $1/\sigma^2$ for noise variance $\sigma^2$. Up to this constant, the loss function becomes
$$ \min_S ||S(\tilde x) - \nabla_{\tilde x} \log p(\tilde x)||_2 $$
Here, the output of $S(\cdot)$ is a set of vectors that matches the gradient. Writing $p(\tilde x) \propto \exp(-E(\tilde x))$ for an energy function $E$, this loss is equivalent to
$$ \min_E ||{-\nabla_{\tilde x} E(\tilde x)} - \nabla_{\tilde x} \log p(\tilde x)||_2 $$
where the computation happens in the gradient domain. This objective is known as the *score matching loss* for energy-based models (more precisely, *denoising* score matching, since the score is matched at noisy points $\tilde x$).
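In a toy 1-D Gaussian setting everything above is closed-form, so the residual-score relation can be verified exactly. This is a sketch assuming $x^* \sim \mathcal{N}(0, s^2)$ with Gaussian noise of variance $\sigma^2$; the variances `s2` and `sigma2` are illustrative choices:

```python
import numpy as np

# x* ~ N(0, s^2), eps ~ N(0, sigma^2), hence x~ ~ N(0, s^2 + sigma^2).
s2, sigma2 = 1.0, 0.25
x_tilde = np.linspace(-3, 3, 7)

# Posterior-mean denoiser (optimal under the L2 loss): E[x* | x~].
denoised = (s2 / (s2 + sigma2)) * x_tilde
residual = denoised - x_tilde  # what the residual form of S predicts

# Score of the noised marginal: d/dx log N(0, s^2 + sigma^2) = -x / (s^2 + sigma^2).
score = -x_tilde / (s2 + sigma2)
```

The residual equals $\sigma^2$ times the score, matching the proportionality constant from Tweedie's formula.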
To summarize, we show that training a denoising model is equivalent to training an energy-based model with (denoising) score matching, and that updating a sequence with the denoiser is equivalent to updating it with the gradient from the EBM.
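The update-equivalence can be sketched for the simplest EBM, a standard Gaussian with $E(x) = \|x\|^2/2$ (so $-\nabla E(x) = -x$ is exactly the score); the step size is an illustrative assumption:

```python
import numpy as np

# Standard-Gaussian EBM: E(x) = ||x||^2 / 2, so -grad E(x) = -x = score of N(0, I).
def neg_grad_energy(x):
    return -x

x_tilde = np.array([2.0, -1.5])
step = 0.1

# One refinement step following the EBM gradient (ascent on log p),
# analogous to applying a (scaled) denoiser residual.
x_new = x_tilde + step * neg_grad_energy(x_tilde)
```

Each such step shrinks the point toward the mode, just as a denoiser would move a noisy sequence toward the oracle.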