###### tags: `one-offs` `diffusions` `sampling` `generative models`
# Denoising-Centric Diffusions
**Overview**: In this note, I log some basic observations about diffusion-based generative models.
## Denoising and Legitimacy
Like many, developments in the last couple of years have motivated me to spend time thinking about diffusion-based generative models. One observation which helped me to feel much more comfortable with some aspects of these approaches was to remember that the main training task is to learn a denoiser. This is of course 'right there' in the title of one of the main papers in the area; my excuse is that I had been distracted by other things.
Anyways, the perspective is certainly not that fitting a denoiser to a { prior / data set } is trivial, but more that it is (to a lesser extent) a task in the same category as linear algebra, in the sense that people have been doing this for many years, and they basically know what they're doing (for some value of 'know what they're doing').
I guess that if you were interested in something like texture synthesis, you could basically procede by applying a { local / covariant } denoiser to some initial Gaussian noise. By using a local denoiser, you work with a function on a much lower-dimensional space, and this is plausibly is not as difficult, in terms of mitigating the curse of dimensionality, etc. Note that this is not necessarily what I think is happening in today's Diffusion Models. Separately, this connection reminds me of the summer I spent helping with a project on { RED / Plug and Play } methods for inverse problems, where I first encountered [this lovely paper](https://link.springer.com/article/10.1007/BF00375127).
Another neat observation in this area (of a different character) is that while the score can be expressed in terms of predicting the noiseless state, it can also be expressed in terms of predicting the intermediate states, e.g. under additive Brownian motion, you get that for $0 < s < t$,
\begin{align}
\nabla_{x_{t}}\log p_{t}\left(x_{t}\right)=\frac{\mathbf{E}\left[X_{s}\mid X_{t}=x_{t}\right]-x_{t}}{t-s},
\end{align}
and surely an analogous formula holds for more general noising processes and state spaces. In a sense, this is just a consequence of dynamic programming recursions for controlled stochastic processes.

Overview: In this note, I describe some aspects of hierarchical structure in MCMC algorithms, and how they can be of theoretical and practical relevance.

8/9/2023Overview: In this note, I discuss a recurrent question which can be used to generate research questions about methods of all sorts. I then discuss a specific instance of how this question has proved fruitful in the theory of optimisation algorithms. Methods and Approximations A nice story is that when Brad Efron derived the bootstrap, it was done in service of the question “What is the jackknife an approximation to?”. I can't help but agree that there's something quite exciting about research questions which have this same character of ''What is (this existing thing) an approximation to?''. One bonus tilt on this which I appreciate is that there can be multiple levels of approximation, and hence many answers to the same question. One well-known example is gradient descent, which can be viewed as an approximation to the proximal point method, which can then itself be viewed as an approximation to a gradient flow. There are probably even more stops along the way here. In this case, there is even the perspective that from the perspective of mathematical theory, there may be at least as much to be gained by stopping off at the proximal point interpretation, as there is from the gradient flow perspective. My experience is that generalist applied mathematicians get to grips with the gradient flow quickly, but optimisation theorists can squeeze more out of the PPM formulation. There is thus some hint that using this 'intermediate' approximation can be particularly insightful in its own right. It would be interesting to collect more examples with this character.

5/22/2023Overview: In this note, I prove Hoeffding's inequality from the perspectives of martingales and convex ordering. The Basic Construction Let $-\infty<a<x<b<\infty$, and define a random variable $M$ with law $M\left(x;a,b\right)$ by \begin{align} M=\begin{cases} a & \text{w.p. }\frac{b-x}{b-a}\ b & \text{w.p. }\frac{x-a}{b-a}. \end{cases}

5/22/2023Overview: In this note, I give a quick proof of a nice convexity result. From Moment-Generating Functions to Uniform Continuity Constants A well-known fact in the context of exponential family theory (and / or statistical mechanics) is that for suitable functions $F$, the cumulant-generating function \begin{align} A:\theta\mapsto\log\left(\int\mu\left(\mathrm{d}x\right)\cdot\exp\left(\left\langle \theta,F\left(x\right)\right\rangle \right)\right) \end{align} is convex, modulo a few standard conditions. Less well-known (but perhaps indirectly related?) is that the logarithm of the $C^{\alpha}$ seminorm of a function is also convex! That is, fix a function $F$ and define

5/22/2023
Published on ** HackMD**