###### tags: `one-offs` `markov chains` `expository` `monte carlo` `sampling`
# Nested Structure in MCMC Algorithms
**Overview**: In this note, I describe some aspects of hierarchical structure in MCMC algorithms, and how this structure can be of theoretical and practical relevance.
## The Spectral Telescope and Convergence Theory
I recently enjoyed reading a paper which presented the idea of the 'spectral telescope' for the $\mathrm{L}^{2}$ convergence analysis of Markov chains. The construction applies to Markov chains $P$ which can be embedded into a sequence $\left\{ P_{k}\right\} _{0\leqslant k\leqslant K}$ of Markov chains (not necessarily practically implementable!) with certain properties. In particular, assume that:
1. All of the $\left\{ P_{k}\right\}$ admit the same invariant measure.
2. $P_{0}=P$, and $P_{K}$ corresponds to independent sampling from the invariant measure.
3. For $0<k\leqslant K$, $P_{k-1}$ can be viewed as an approximation to $P_{k}$.
In practice, some more refined and specific conditions are needed, but this is close to the essence. The idea is then roughly that if for each $k$, one iteration of $P_{k-1}$ is 'as good as' $\varepsilon_{k}$ iterations of $P_{k}$, then it follows that $P = P_{0}$ is 'as good as' $\varepsilon_{1:K} := \varepsilon_{1} \cdot \cdots \cdot \varepsilon_{K}$ iterations of $P_{K}$ (this is the 'telescope' alluded to in the title), and so $P$ should decorrelate in roughly $\varepsilon_{1:K}^{-1}$ steps. These statements can be made rigorous without too much pain, and this elegance is one of the great triumphs of the paper.
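In symbols (schematically, suppressing the refined conditions, and writing $\mathrm{Gap}\left(\cdot\right)$ for the $\mathrm{L}^{2}$ spectral gap — my notation, not necessarily the paper's), the telescoping step reads
\begin{align}
\mathrm{Gap}\left(P_{k-1}\right)\geqslant\varepsilon_{k}\,\mathrm{Gap}\left(P_{k}\right)\text{ for }1\leqslant k\leqslant K\implies\mathrm{Gap}\left(P_{0}\right)\geqslant\varepsilon_{1:K}\,\mathrm{Gap}\left(P_{K}\right)=\varepsilon_{1:K},
\end{align}
using that $\mathrm{Gap}\left(P_{K}\right)=1$ for independent sampling; the relaxation time of $P$ is then at most $\varepsilon_{1:K}^{-1}$.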
In terms of applications, the paper shows that this precise structure is exhibited by the random-scan Gibbs sampler, for which the interpolating kernels correspond to random-scan *block* Gibbs samplers of varying size. That is, using blocks of size $(k - 1)$ is a randomised approximation to using blocks of size $k$, and so on.
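To see this family of interpolating kernels numerically, here is a minimal sketch (my own toy construction, not taken from the paper; all names are illustrative): it assembles the random-scan block Gibbs transition matrices for a toy target on $\{0,1\}^{3}$ and computes their $\mathrm{L}^{2}\left(\pi\right)$ spectral gaps. Block size $3$ recovers independent sampling (gap $1$), and one expects smaller gaps for smaller blocks.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy target: an arbitrary positive distribution on {0,1}^3 (8 states).
states = list(itertools.product([0, 1], repeat=3))
weights = rng.uniform(0.5, 1.5, size=len(states))
pi = weights / weights.sum()

def block_gibbs_kernel(k):
    """Transition matrix of the random-scan block Gibbs sampler which
    picks a size-k block of coordinates uniformly at random and
    resamples it from its conditional distribution under pi."""
    n = len(states)
    P = np.zeros((n, n))
    blocks = list(itertools.combinations(range(3), k))
    for i, x in enumerate(states):
        for block in blocks:
            # States reachable by resampling `block`: those agreeing
            # with x on the coordinates outside the block.
            idx = [j for j, y in enumerate(states)
                   if all(y[c] == x[c] for c in range(3) if c not in block)]
            cond = pi[idx] / pi[idx].sum()
            P[i, idx] += cond / len(blocks)
    return P

def l2_gap(P):
    """Spectral gap in L^2(pi) of a pi-reversible kernel: one minus the
    second-largest eigenvalue of the symmetrised kernel."""
    d = np.sqrt(pi)
    S = P * d[:, None] / d[None, :]  # similar to P; symmetric by reversibility
    return 1.0 - np.linalg.eigvalsh(S)[-2]

for k in (1, 2, 3):
    print(f"block size {k}: gap = {l2_gap(block_gibbs_kernel(k)):.3f}")
```

This only witnesses the nesting of the kernels; the actual telescope argument is about relating the gaps of consecutive levels quantitatively.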
One major curiosity of mine going forward is whether a similar nested / recursive structure is readily available in other MCMC algorithms of interest. Some cursory guesses:
1. The Multigrid Monte Carlo algorithm of Goodman-Sokal (discussed [here](https://hackmd.io/@sp-monte-carlo/SyTOBh9gt)) has an explicit recursive structure, whilst still making global moves.
2. The Multilevel Delayed Acceptance algorithms pioneered by Christen-Fox, refined by Teckentrup et al., and subsequently by Dodwell et al., also have a recursive structure, which behaves slightly differently to that of the MGMC algorithm (more Monte Carlo-themed than Multigrid-themed, let's say).
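As a hedged sketch of the recursive structure in the second point, here is a minimal two-stage delayed-acceptance Metropolis-Hastings step in the Christen-Fox style (the symmetric-proposal simplification, the Gaussian example target, and all names are my own illustrative choices, not taken from the cited papers); deeper recursions apply the same screening idea level by level.

```python
import math
import random

def delayed_acceptance_step(x, log_pi, log_pi_cheap, propose):
    """One step of two-stage delayed-acceptance Metropolis-Hastings,
    assuming a *symmetric* proposal. A cheap surrogate log-density
    screens proposals; a second stage with the expensive target then
    corrects the surrogate's error exactly."""
    y = propose(x)
    # Stage 1: Metropolis test against the cheap surrogate.
    log_a1 = min(0.0, log_pi_cheap(y) - log_pi_cheap(x))
    if math.log(random.random()) >= log_a1:
        return x  # rejected without ever touching the expensive target
    # Stage 2: surrogate-corrected ratio, keeping the true target invariant.
    log_a2 = min(0.0, (log_pi(y) - log_pi(x))
                 - (log_pi_cheap(y) - log_pi_cheap(x)))
    return y if math.log(random.random()) < log_a2 else x

# Usage sketch: exact N(0, 1) target, deliberately biased surrogate.
random.seed(1)
log_pi = lambda z: -0.5 * z * z
log_pi_cheap = lambda z: -0.5 * (z - 0.1) ** 2  # crude approximation
propose = lambda z: z + random.gauss(0.0, 1.0)  # symmetric random walk

chain = [0.0]
for _ in range(20000):
    chain.append(delayed_acceptance_step(chain[-1], log_pi, log_pi_cheap, propose))
print(sum(chain) / len(chain))  # sample mean; should be near 0
```

The point of the nesting is that most rejections happen at stage 1, where only the cheap surrogate is evaluated.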
I'm sure that there will be other examples as well, and I look forward to seeing them understood better and better.
8/14/2023

**Overview**: In this note, I discuss a recurrent question which can be used to generate research questions about methods of all sorts. I then discuss a specific instance of how this question has proved fruitful in the theory of optimisation algorithms.

# Methods and Approximations

A nice story is that when Brad Efron derived the bootstrap, it was done in service of the question "What is the jackknife an approximation to?". I can't help but agree that there's something quite exciting about research questions which have this same character of "What is (this existing thing) an approximation to?". One bonus tilt on this which I appreciate is that there can be multiple levels of approximation, and hence many answers to the same question.

One well-known example is gradient descent, which can be viewed as an approximation to the proximal point method, which can itself be viewed as an approximation to a gradient flow. There are probably even more stops along the way here. In this case, there is even an argument that, from the perspective of mathematical theory, there may be at least as much to be gained by stopping off at the proximal point interpretation as there is from the gradient flow perspective. My experience is that generalist applied mathematicians get to grips with the gradient flow quickly, but optimisation theorists can squeeze more out of the PPM formulation. There is thus some hint that using this 'intermediate' approximation can be particularly insightful in its own right. It would be interesting to collect more examples with this character.
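The chain of approximations for gradient descent can be written out explicitly (a standard sketch, with $f$ smooth and $\eta>0$ a step size):
\begin{align}
x_{n+1} & =x_{n}-\eta\nabla f\left(x_{n}\right) && \text{(gradient descent)}\\
x_{n+1} & =\arg\min_{y}\left\{ f\left(y\right)+\tfrac{1}{2\eta}\left\Vert y-x_{n}\right\Vert ^{2}\right\} && \text{(proximal point)}\\
\dot{x}\left(t\right) & =-\nabla f\left(x\left(t\right)\right) && \text{(gradient flow)}
\end{align}
Gradient descent solves the proximal subproblem with $f$ replaced by its linearisation at $x_{n}$, and the proximal point method is the implicit Euler discretisation of the flow, recovered as $\eta\to0$.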

Published on **HackMD**
