###### tags: `one-offs` `monte carlo` `markov chains`
# Some Quiet Assumptions in MCMC
**Overview**: In this note, I describe a few properties which are prevalent in the design and analysis of Markov Chain Monte Carlo (MCMC) algorithms, and discuss the extent to which these properties are necessary, in various regards.
## Common Properties of MCMC Algorithms
One way to motivate MCMC methods is to accept that exact simulation from a probability measure may be difficult, but that constructing a convergent sequence of approximations is feasible. This is perhaps a core principle of analysis. Towards the practical side of this endeavour, one must then narrow the search space of possible convergent sequences, so as to identify sequences which are efficiently implementable and rapidly convergent.
The usual restrictions which lead to MCMC are to construct sequences
1. which are based on iterative constructions,
2. which are near-memoryless (Markovian), and
3. which have homogeneous-in-time dynamics.
One often also imposes that the dynamics be invariant under time-reversal ('detailed balance').
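All four properties can be seen at once in a minimal random-walk Metropolis sampler. The sketch below (assuming a one-dimensional target supplied as a log-density; the standard-normal example is mine, not from the text) is iterative, Markovian (the proposal depends only on the current state), time-homogeneous (the same kernel at every step), and reversible with respect to the target:

```python
import numpy as np

def rwm(log_target, x0, n_steps, step=1.0, rng=None):
    """Random-walk Metropolis: iterative, Markovian, time-homogeneous,
    and satisfying detailed balance with respect to exp(log_target)."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0
    chain = [x]
    for _ in range(n_steps):
        y = x + step * rng.standard_normal()  # proposal depends only on current x
        if np.log(rng.uniform()) < log_target(y) - log_target(x):
            x = y                             # accept; otherwise stay put
        chain.append(x)
    return np.array(chain)

# e.g. targeting a standard normal
samples = rwm(lambda x: -0.5 * x**2, x0=0.0, n_steps=10_000)
```

Each of the later sections can be read as asking what happens when one of these four ingredients is removed from a sampler of this shape.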
## Do we need these restrictions?
So, which of these restrictions are 'essential'?
### Invariance and Time-Reversal Invariance
To begin with, invariance under time-reversal is certainly not essential: it is well-appreciated that this condition is primarily a tractable condition for ensuring that the dynamics leave the target invariant, i.e. that if we reach the target measure, then we stay there. One must also mention that when constructed appropriately, chains which do not satisfy this condition can have improved convergence properties.
From a human perspective, a key benefit of this property is that it can be verified by examining only local properties of the chain; note that leaving the target invariant is a priori a global property, whose verification requires the computation of integrals. It also furnishes the theorist with many useful tools.
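The local-versus-global distinction is easiest to see on a finite state space: detailed balance is an entrywise check on the transition matrix, whereas invariance requires a sum over the whole space. A small numerical sketch (the three-state target here is an arbitrary choice of mine), using a Metropolis kernel with a uniform proposal:

```python
import numpy as np

pi = np.array([0.2, 0.3, 0.5])  # target on {0, 1, 2}
n = len(pi)

Q = np.full((n, n), 1.0 / n)                     # uniform proposal
A = np.minimum(1.0, pi[None, :] / pi[:, None])   # Metropolis acceptance probs
P = Q * A
np.fill_diagonal(P, 0.0)
np.fill_diagonal(P, 1.0 - P.sum(axis=1))         # rejection mass stays put

# Local check: detailed balance, pi_i P_ij == pi_j P_ji, entry by entry
assert np.allclose(pi[:, None] * P, (pi[:, None] * P).T)

# Global check: invariance, pi P == pi, a sum over the whole space
assert np.allclose(pi @ P, pi)
```

The first assertion involves only pairs of states; the second is the 'global' integral (here, a matrix-vector product) that detailed balance lets us avoid computing when verifying a sampler by hand.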
(As a brief remark: to some extent, it can even be acceptable to relax the condition of leaving the target measure invariant, provided that certain other operations are tractable or affordable. I may provide a proper treatment of this scenario elsewhere.)
### The Markov Property
How about the Markov property? Well, this appears to be fairly essential - there are precious few examples of such algorithms which are not at least trying to be Markovian, or Markovian on some extended state space.
The greatest deviations from the Markov property are perhaps adaptive MCMC algorithms, which are often attempting to emulate an ideal MCMC algorithm which possesses additional information about the target measure.
On the other hand, finite-memory Markov chains can be cast as regular Markov chains on a modified state space, and so they tend not to cause many technical difficulties. There is a case that such approaches tend not to bring substantial gains (though to an extent, this depends on which chains one chooses to include in this category).
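The lifting construction is mechanical: a chain whose next move depends on its last two states becomes Markovian once the state is taken to be the pair of recent values. A sketch (the order-2 update rule below, proposing along the direction of recent travel, is purely illustrative and not a recommended sampler):

```python
import numpy as np

def step_order2(x_prev, x_curr, rng):
    """An update that looks two states back (illustrative only)."""
    return x_curr + 0.5 * (x_curr - x_prev) + rng.standard_normal()

def lifted_step(state, rng):
    """The same dynamics as a bona fide Markov kernel on pairs:
    state = (x_{t-1}, x_t)  ->  (x_t, x_{t+1})."""
    x_prev, x_curr = state
    return (x_curr, step_order2(x_prev, x_curr, rng))

rng = np.random.default_rng()
state = (0.0, 0.0)
for _ in range(100):
    state = lifted_step(state, rng)
```

On the product space, `lifted_step` depends only on the current (pair-valued) state, so the standard Markov-chain toolbox applies unchanged.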
Being "fundamentally" non-Markovian (whatever this might mean) seems hard, and not obviously worthwhile. The intuition of repelling dynamics away from past states is popular in principle, but appears to be hard to instantiate without modifying the target measure. The difficulty of 'repulsion' is also observed in interacting particle methods, where 'propagation of chaos' results often hold, implying that finite collections of particles will behave 'almost independently' in an appropriate limit, i.e. repulsion necessarily becomes a lower-order effect.
### Time-Homogeneity
This leads nicely on to the third restriction: time-homogeneity. Arguably, this restriction is not needed, though relaxing it leads to a 'genuinely' different class of methods, where one gradually transports one probability measure to another. This wears many faces, and demands other tools. SMC samplers are a good example of this paradigm. Methodologies based on the construction of deterministic transport maps are another (and indeed, these two can be fused together). Conventional MCMC methods can often be used as 'engines' of such approaches to improve performance.
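A minimal instance of the transport paradigm is an SMC sampler with tempering: particles start at a tractable reference measure and are moved through a sequence of intermediate targets via reweighting, resampling, and MCMC moves. The sketch below is my own simplification (one random-walk Metropolis move per temperature, multinomial resampling at every step, a standard-normal reference); practical implementations would use adaptive temperatures and resample only when needed:

```python
import numpy as np

def smc_tempering(log_target, n_particles=1000,
                  betas=np.linspace(0.0, 1.0, 21), step=0.5, rng=None):
    """SMC sampler: transport N(0,1) samples to exp(log_target) through
    tempered intermediates pi_beta ∝ N(0,1)^(1-beta) * target^beta."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal(n_particles)          # samples from the reference
    log_ref = lambda z: -0.5 * z**2
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # reweight for the change of temperature
        logw = (b - b_prev) * (log_target(x) - log_ref(x))
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # multinomial resampling
        x = x[rng.choice(n_particles, n_particles, p=w)]
        # one random-walk Metropolis move targeting pi_b (the MCMC 'engine')
        log_pi = lambda z: (1.0 - b) * log_ref(z) + b * log_target(z)
        y = x + step * rng.standard_normal(n_particles)
        accept = np.log(rng.uniform(size=n_particles)) < log_pi(y) - log_pi(x)
        x = np.where(accept, y, x)
    return x

# e.g. transporting N(0,1) to N(3,1)
particles = smc_tempering(lambda z: -0.5 * (z - 3.0) ** 2)
```

Note the role reversal relative to plain MCMC: the time-inhomogeneous outer loop does the transport, and the homogeneous Metropolis kernel appears only as a rejuvenation step inside it.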
## Some Conclusions
In essence, then, one could conclude that
1. Invariance under time-reversal is inessential (though often theoretically convenient),
2. The Markov property is convenient, and seriously departing from it does not seem to help much (with the possible exception of adaptive-type schemes), and
3. Time-homogeneity is not essential, and much can be gained by departing from it (though extra work is often required).
There may be descriptions of certain algorithms which seem at odds with these claims; I suspect that many of the discrepancies will come down to interpretation.
Published on **HackMD**