###### tags: `one-offs` `markov chains`
# Coupling Markov Chains to Avoid Each Other
**Overview**: In this note, I will discuss a simple result about a certain class of Markov chains on finite state spaces. The proof will be sketched briefly, and then I will cover some motivations and potential applications.
## The Result
Let $G = (V, E)$ be a finite graph, and let $P$ be the transition matrix of a $V$-valued Markov chain which has the uniform distribution on $V$ as its invariant measure. Now, consider running copies of Markov chains driven by the kernel $P$, initialised at all nodes $i \in V$. The chains can have some dependence amongst themselves, but each chain is marginally a standard $P$-driven Markov chain.
The result is that it is possible to couple all of these $|V|$ chains so that they *never* collide with one another, i.e. for all times $t \geqslant 0$, there is always exactly one particle occupying each vertex on the graph.
## The Proof
$P$ is a transition matrix corresponding to a Markov kernel, and so the entries of each row sum to $1$. Additionally, it has the uniform distribution as its invariant measure, and so the entries of each column also sum to $1$. This makes $P$ a doubly-stochastic matrix, and hence lies in the so-called [Birkhoff Polytope](https://en.wikipedia.org/wiki/Birkhoff_polytope), which is a convex polytope in the space of $|V| \times |V|$ matrices. The extreme points of this polytope are permutation matrices, and it thus follows that one can find a probability vector $\alpha \in \Delta^{\mathrm{Sym} (V) - 1}$ such that
\begin{align}
P = \sum_{\sigma \in \mathrm{Sym} (V)} \alpha_\sigma P_\sigma,
\end{align}
where $P_\sigma$ is the transition matrix corresponding to the deterministic dynamics $X_{t-1} = i \implies X_t = \sigma(i)$. Hence the kernel $P$ can be viewed as a mixture of deterministic kernels, and we are nearly done.
The construction thus follows:
- At $t = 0$, for $i \in V$, initialise $X_0^{(i)} = i$,,
- At each time $t > 0$,
* Sample $\sigma_t \sim \mathrm{Categorical} ( \alpha )$.
* For $i \in V$, set $X_t^{(i)} = \sigma_t ( X_{t-1}^{(i)})$.
By the mixture representation, it holds for each $i$ that the chain $X^{(i)}_\circ$ has the law of $P$ marginally. By the permutation representation, it is clear that any two chains will never collide, and the result follows.
## Motivation
Given my interests, it is perhaps unsurprising that my motivations for studying this construction stem from Monte Carlo methods, where one can imagine using such an approach for a type of 'uniform-in-time stratification' of the chains on path space.
However, the practical potential of this approach is not currently clear. If $|V|$ is of polynomial size, and one can write down $P$, then, there is a greedy algorithm which runs in polynomial time and returns a probability vector $\alpha$ such that the mixture representation holds. The $\alpha$ so obtained will only have polynomially-many nonzero entries. However, in practice, many interesting problems have much larger domains, and so it is not clear how to implement this program efficiently.
## Relaxations and Practicalities
A relaxation of this approach is to consider practical decompositions of $P$ into nearly-deterministic components. This will probably not come with the same theoretical elegance, but should be much more robustly implementable in general. In particular, this idea still makes sense for non-uniform invariant distributions, and on non-discrete state spaces.
For example, one can imagine feeding the same Brownian motion to several diffusions, the same momenta to (randomised) Hamiltonian dynamics simulations, and so on. The literature on [iterated random function systems](https://epubs.siam.org/doi/10.1137/S0036144598338446) is rife with similar potential.
Frustratingly, many natural approaches to this will actually encourage the separate chains to remain distinct, but move closely to one another. It will perhaps take more effort to keep them apart well, e.g. feeding *anticorrelated* Brownian motions to different particles. This has been explored [a little bit](https://www.ma.imperial.ac.uk/~pavl/NuskenPavliotis2018.pdf), and has links to [(Randomised) Quasi-Monte Carlo](http://www-labs.iro.umontreal.ca/~lecuyer/myftp/papers/mcqmc08-array.pdf), but has not yet seen widespread adoption.

Overview: In this note, I log some basic observations about diffusion-based generative models.

8/14/2023Overview: In this note, I describe some aspects of hierarchical structure in MCMC algorithms, and how they can be of theoretical and practical relevance.

8/9/2023Overview: In this note, I discuss a recurrent question which can be used to generate research questions about methods of all sorts. I then discuss a specific instance of how this question has proved fruitful in the theory of optimisation algorithms. Methods and Approximations A nice story is that when Brad Efron derived the bootstrap, it was done in service of the question “What is the jackknife an approximation to?”. I can't help but agree that there's something quite exciting about research questions which have this same character of ''What is (this existing thing) an approximation to?''. One bonus tilt on this which I appreciate is that there can be multiple levels of approximation, and hence many answers to the same question. One well-known example is gradient descent, which can be viewed as an approximation to the proximal point method, which can then itself be viewed as an approximation to a gradient flow. There are probably even more stops along the way here. In this case, there is even the perspective that from the perspective of mathematical theory, there may be at least as much to be gained by stopping off at the proximal point interpretation, as there is from the gradient flow perspective. My experience is that generalist applied mathematicians get to grips with the gradient flow quickly, but optimisation theorists can squeeze more out of the PPM formulation. There is thus some hint that using this 'intermediate' approximation can be particularly insightful in its own right. It would be interesting to collect more examples with this character.

5/22/2023Overview: In this note, I prove Hoeffding's inequality from the perspectives of martingales and convex ordering. The Basic Construction Let $-\infty<a<x<b<\infty$, and define a random variable $M$ with law $M\left(x;a,b\right)$ by \begin{align} M=\begin{cases} a & \text{w.p. }\frac{b-x}{b-a}\ b & \text{w.p. }\frac{x-a}{b-a}. \end{cases}

5/22/2023
Published on ** HackMD**