Coupling Markov Chains to Avoid Each Other

###### tags: `one-offs` `markov chains` # Coupling Markov Chains to Avoid Each Other **Overview**: In this note, I will discuss a simple result about a certain class of Markov chains on finite state spaces. The proof will be sketched briefly, and then I will cover some motivations and potential applications. ## The Result Let $G = (V, E)$ be a finite graph, and let $P$ be the transition matrix of a $V$-valued Markov chain which has the uniform distribution on $V$ as its invariant measure. Now, consider running copies of Markov chains driven by the kernel $P$, initialised at all nodes $i \in V$. The chains can have some dependence amongst themselves, but each chain is marginally a standard $P$-driven Markov chain. The result is that it is possible to couple all of these $|V|$ chains so that they *never* collide with one another, i.e. for all times $t \geqslant 0$, there is always exactly one particle occupying each vertex on the graph. ## The Proof $P$ is a transition matrix corresponding to a Markov kernel, and so the entries of each row sum to $1$. Additionally, it has the uniform distribution as its invariant measure, and so the entries of each column also sum to $1$. This makes $P$ a doubly-stochastic matrix, and hence lies in the so-called [Birkhoff Polytope](https://en.wikipedia.org/wiki/Birkhoff_polytope), which is a convex polytope in the space of $|V| \times |V|$ matrices. The extreme points of this polytope are permutation matrices, and it thus follows that one can find a probability vector $\alpha \in \Delta^{\mathrm{Sym} (V) - 1}$ such that \begin{align} P = \sum_{\sigma \in \mathrm{Sym} (V)} \alpha_\sigma P_\sigma, \end{align} where $P_\sigma$ is the transition matrix corresponding to the deterministic dynamics $X_{t-1} = i \implies X_t = \sigma(i)$. Hence the kernel $P$ can be viewed as a mixture of deterministic kernels, and we are nearly done. The construction thus follows: - At $t = 0$, for $i \in V$, initialise $X_0^{(i)} = i$,, - At each time $t > 0$, * Sample $\sigma_t \sim \mathrm{Categorical} ( \alpha )$. * For $i \in V$, set $X_t^{(i)} = \sigma_t ( X_{t-1}^{(i)})$. By the mixture representation, it holds for each $i$ that the chain $X^{(i)}_\circ$ has the law of $P$ marginally. By the permutation representation, it is clear that any two chains will never collide, and the result follows. ## Motivation Given my interests, it is perhaps unsurprising that my motivations for studying this construction stem from Monte Carlo methods, where one can imagine using such an approach for a type of 'uniform-in-time stratification' of the chains on path space. However, the practical potential of this approach is not currently clear. If $|V|$ is of polynomial size, and one can write down $P$, then, there is a greedy algorithm which runs in polynomial time and returns a probability vector $\alpha$ such that the mixture representation holds. The $\alpha$ so obtained will only have polynomially-many nonzero entries. However, in practice, many interesting problems have much larger domains, and so it is not clear how to implement this program efficiently. ## Relaxations and Practicalities A relaxation of this approach is to consider practical decompositions of $P$ into nearly-deterministic components. This will probably not come with the same theoretical elegance, but should be much more robustly implementable in general. In particular, this idea still makes sense for non-uniform invariant distributions, and on non-discrete state spaces. For example, one can imagine feeding the same Brownian motion to several diffusions, the same momenta to (randomised) Hamiltonian dynamics simulations, and so on. The literature on [iterated random function systems](https://epubs.siam.org/doi/10.1137/S0036144598338446) is rife with similar potential. Frustratingly, many natural approaches to this will actually encourage the separate chains to remain distinct, but move closely to one another. It will perhaps take more effort to keep them apart well, e.g. feeding *anticorrelated* Brownian motions to different particles. This has been explored [a little bit](https://www.ma.imperial.ac.uk/~pavl/NuskenPavliotis2018.pdf), and has links to [(Randomised) Quasi-Monte Carlo](http://www-labs.iro.umontreal.ca/~lecuyer/myftp/papers/mcqmc08-array.pdf), but has not yet seen widespread adoption.