###### tags: `one-offs` `diffusions` `sampling` `generative models`
# Denoising-Centric Diffusions
**Overview**: In this note, I log some basic observations about diffusion-based generative models.
## Denoising and Legitimacy
Like many others, I have been motivated by developments in the last couple of years to spend time thinking about diffusion-based generative models. One observation which helped me feel much more comfortable with some aspects of these approaches was to remember that the main training task is to learn a denoiser. This is of course 'right there' in the title of one of the main papers in the area; my excuse is that I had been distracted by other things.
Anyway, the perspective is certainly not that fitting a denoiser to a { prior / data set } is trivial, but rather that it is (to a lesser extent) a task in the same category as linear algebra, in the sense that people have been doing it for many years, and they basically know what they're doing (for some value of 'know what they're doing').
I guess that if you were interested in something like texture synthesis, you could basically proceed by applying a { local / covariant } denoiser to some initial Gaussian noise. By using a local denoiser, you work with a function on a much lower-dimensional space, and this is plausibly not as difficult, in terms of mitigating the curse of dimensionality, etc. Note that this is not necessarily what I think is happening in today's Diffusion Models. Separately, this connection reminds me of the summer I spent helping with a project on { RED / Plug and Play } methods for inverse problems, where I first encountered [this lovely paper](https://link.springer.com/article/10.1007/BF00375127).
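The texture-synthesis idea above can be sketched in a few lines. The sketch below is a toy, not a claim about how diffusion models work: it stands in for a learned local denoiser with a simple translation-covariant moving average, and just iterates it on Gaussian noise to produce spatially correlated output. The filter width and iteration count are arbitrary choices for illustration.

```python
import numpy as np

def local_denoiser(img, width=3):
    # Toy stand-in for a learned local denoiser: a translation-covariant
    # moving average over a (width x width) neighbourhood, with periodic
    # boundary conditions via np.roll.
    out = np.zeros_like(img)
    r = width // 2
    for dx in range(-r, r + 1):
        for dy in range(-r, r + 1):
            out += np.roll(np.roll(img, dx, axis=0), dy, axis=1)
    return out / width**2

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64))   # initial Gaussian white noise
for _ in range(10):
    x = local_denoiser(x)           # repeated local 'denoising'
# x now has spatially correlated structure rather than white noise
```

The point of the locality is that the map being learned (here, hard-coded) only ever looks at a small neighbourhood, so the function lives on a low-dimensional space regardless of the image size.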
Another neat observation in this area (of a different character) is that while the score can be expressed in terms of predicting the noiseless state, it can also be expressed in terms of predicting the intermediate states, e.g. under additive Brownian motion, you get that for $0 < s < t$,
\begin{align}
\nabla_{x_{t}}\log p_{t}\left(x_{t}\right)=\frac{\mathbf{E}\left[X_{s}\mid X_{t}=x_{t}\right]-x_{t}}{t-s},
\end{align}
and surely an analogous formula holds for more general noising processes and state spaces. In a sense, this is just a consequence of dynamic programming recursions for controlled stochastic processes.
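In the Gaussian case the identity above can be verified in closed form, which makes for a quick sanity check. The numbers below ($\sigma_0^2$, $s$, $t$, $x_t$) are arbitrary: take $X_0 \sim N(0, \sigma_0^2)$ and $X_t = X_0 + B_t$, so that $X_t \sim N(0, \sigma_0^2 + t)$ and $(X_s, X_t)$ are jointly Gaussian with $\operatorname{Cov}(X_s, X_t) = \sigma_0^2 + s$.

```python
import numpy as np

# Check the identity
#   grad log p_t(x_t) = (E[X_s | X_t = x_t] - x_t) / (t - s)
# in the tractable Gaussian case: X_0 ~ N(0, sigma0_sq), X_t = X_0 + B_t.
sigma0_sq = 0.7
s, t = 0.3, 1.2
x_t = 1.5

# Exact score of the marginal N(0, sigma0_sq + t) at x_t.
score = -x_t / (sigma0_sq + t)

# (X_s, X_t) jointly Gaussian with Cov(X_s, X_t) = sigma0_sq + s, so the
# conditional mean is a linear shrinkage of x_t.
cond_mean = (sigma0_sq + s) / (sigma0_sq + t) * x_t

assert np.isclose((cond_mean - x_t) / (t - s), score)
```

Note that the right-hand side is independent of $s$, as it must be: the $(t - s)$ in the denominator exactly cancels the $s$-dependence of the conditional mean.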

Published on HackMD
