###### tags: `one-offs` `monte carlo` `sampling` `gibbs sampling`
# Scans in Gibbs Sampling
**Overview**: In this note, I examine some aspects of the notion of a 'scan' in the context of Gibbs sampling MCMC algorithms.
## Discussion
In Gibbs sampling, a 'scan' is usually viewed as a sequence of updates which touches each coordinate exactly once. There is clear motivation for demarcation of segments in which each coordinate is touched at least once (e.g. ergodicity), but is the 'exactly' needed?
In fact, updating all of the coordinates in equal proportion need not be optimal. That is, in under an 'at least once each' scan protocol, it can be the case that certain coordinates 'should' be updated several times (though not in a row, of course). This is easiest to see when some of the coordinates are essentially independent of all other coordinates; roughly speaking, coordinates which are subject to greater dependence require greater effort to move substantially. This should be true for both deterministic and randomised schedules.
Note that one particularly useful way of thinking about randomised schedules is to make an analogy with a continuous-time jump process with Gibbs-type transitions, where each coordinate can have its own jump rate. This abstraction allows one to step away from the combinatorially-flavoured problem of schedule selection, to the more continuous and analytic problem of rate tuning.
One reason for not talking about this distinction so much is it is hard to find good heuristics for practically setting heterogeneous coordinate rates. Another reason is that for sufficiently symmetric models (e.g. Gibbs sampling as applied to spin systems), the homogeneous formulation is probably optimal, in some sense or another. It bears mentioning that there are few concrete examples of problems to which Gibbs sampling applies for which a sharp theoretical analysis is available, and many of these rely heavily on symmetry assumptions to make progress. This represents something of a barrier to understanding the heterogeneous case well.
A point of contrast is coordinate descent algorithms for minimisation problems, where inhomogeneous scans are commonplace, and even quite well-understood. If this analogy is explored well, perhaps there will be room for making useful recommendations about inhomogeneous scans for Gibbs sampling.
All of this is to say, in the absence of a well-justified alternative to balanced scans, it is easy to see why they have become the norm. Still, it's a bit interesting to re-think what a 'scan' ought to mean, and how the notion might be adapted and expanded.

Overview: In this note, I log some basic observations about diffusion-based generative models.

8/14/2023Overview: In this note, I describe some aspects of hierarchical structure in MCMC algorithms, and how they can be of theoretical and practical relevance.

8/9/2023Overview: In this note, I discuss a recurrent question which can be used to generate research questions about methods of all sorts. I then discuss a specific instance of how this question has proved fruitful in the theory of optimisation algorithms. Methods and Approximations A nice story is that when Brad Efron derived the bootstrap, it was done in service of the question “What is the jackknife an approximation to?”. I can't help but agree that there's something quite exciting about research questions which have this same character of ''What is (this existing thing) an approximation to?''. One bonus tilt on this which I appreciate is that there can be multiple levels of approximation, and hence many answers to the same question. One well-known example is gradient descent, which can be viewed as an approximation to the proximal point method, which can then itself be viewed as an approximation to a gradient flow. There are probably even more stops along the way here. In this case, there is even the perspective that from the perspective of mathematical theory, there may be at least as much to be gained by stopping off at the proximal point interpretation, as there is from the gradient flow perspective. My experience is that generalist applied mathematicians get to grips with the gradient flow quickly, but optimisation theorists can squeeze more out of the PPM formulation. There is thus some hint that using this 'intermediate' approximation can be particularly insightful in its own right. It would be interesting to collect more examples with this character.

5/22/2023Overview: In this note, I prove Hoeffding's inequality from the perspectives of martingales and convex ordering. The Basic Construction Let $-\infty<a<x<b<\infty$, and define a random variable $M$ with law $M\left(x;a,b\right)$ by \begin{align} M=\begin{cases} a & \text{w.p. }\frac{b-x}{b-a}\ b & \text{w.p. }\frac{x-a}{b-a}. \end{cases}

5/22/2023
Published on ** HackMD**