On the Benefits of Marginalisation for Langevin Diffusions

###### tags: `one-offs` `diffusions` `sampling`

**Overview**: In this note, I compare the convergence behaviour of Langevin diffusions as applied to joint and marginal simulation. I give a quick proof that, when it is possible, simulating the marginal process leads to convergence which is at least as rapid as that of the joint process.

## Prelude and Problem Setting

A basic thought is that at different times, there have been innovations in Monte Carlo which are centered around both "add in extra variables to make sampling easier" and "marginalise out variables to make sampling easier". Of course, there is no contradiction per se, since "easier" means different things in each case.

A small related exercise: fix a nice density $p\left(x,y\right)$, with $x$ and $y$ each of arbitrary (and not necessarily equal) dimension. Consider using the standard overdamped Langevin diffusion to sample from

1. the joint distribution $p\left(x,y\right)$, and
2. the marginal distribution $p\left(x\right)$.

Prove that for an appropriate measure of convergence, the marginal sampler will converge at least as fast as the joint sampler.

## Solution

Since any overdamped Langevin diffusion is reversible with respect to its invariant measure, its $L^{2}$ convergence rate (i.e. its spectral gap) can be written in terms of the Rayleigh quotient as
\begin{align}
\gamma_{\mathrm{Joint}} &:=\inf\left\{ \frac{\int p\left(x,y\right)\cdot\left|\nabla_{x,y}f\left(x,y\right)\right|^{2}\,\mathrm{d}x\,\mathrm{d}y}{\int p\left(x,y\right)\cdot\left|f\left(x,y\right)\right|^{2}\,\mathrm{d}x\,\mathrm{d}y}:p\left(f\right)=0,p\left(f^{2}\right)<\infty\right\} \\
\gamma_{\mathrm{Marginal}} &:=\inf\left\{ \frac{\int p\left(x\right)\cdot\left|\nabla_{x}f\left(x\right)\right|^{2}\,\mathrm{d}x}{\int p\left(x\right)\cdot\left|f\left(x\right)\right|^{2}\,\mathrm{d}x}:p\left(f\right)=0,p\left(f^{2}\right)<\infty\right\} .
\end{align}

Consider the infimum which defines $\gamma_{\mathrm{Joint}}$, but taken only over functions $f$ which depend on $x$ alone. Since this infimum is taken over a smaller set, it can be no smaller than $\gamma_{\mathrm{Joint}}$. Moreover, for such $f$ we have $\nabla_{x,y}f=\left(\nabla_{x}f,0\right)$, and integrating out $y$ replaces $p\left(x,y\right)$ by $p\left(x\right)$ in both the numerator and the denominator (and in the constraints). It thus follows that
\begin{align}
\gamma_{\mathrm{Joint}} &\leqslant \inf\left\{ \frac{\int p\left(x,y\right)\cdot\left|\nabla_{x}f\left(x\right)\right|^{2}\,\mathrm{d}x\,\mathrm{d}y}{\int p\left(x,y\right)\cdot\left|f\left(x\right)\right|^{2}\,\mathrm{d}x\,\mathrm{d}y}:p\left(f\right)=0,p\left(f^{2}\right)<\infty\right\} \\
&=\inf\left\{ \frac{\int p\left(x\right)\cdot\left|\nabla_{x}f\left(x\right)\right|^{2}\,\mathrm{d}x}{\int p\left(x\right)\cdot\left|f\left(x\right)\right|^{2}\,\mathrm{d}x}:p\left(f\right)=0,p\left(f^{2}\right)<\infty\right\} \\
&=\gamma_{\mathrm{Marginal}}.
\end{align}

The spectral gap of the marginal process is thus lower-bounded by that of the joint process, establishing that the marginal sampler converges at least as fast as the joint sampler.
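
As a sanity check on the inequality $\gamma_{\mathrm{Joint}}\leqslant\gamma_{\mathrm{Marginal}}$, here is a minimal sketch for a case which is not treated in the note itself: a centred bivariate Gaussian target with scalar $x$ and $y$. It relies on the standard fact that for a Gaussian $\mathcal{N}\left(0,\Sigma\right)$, the Rayleigh quotient above has infimum $1/\lambda_{\max}\left(\Sigma\right)$, attained by a linear function. The function names and the parameters `var_x`, `var_y`, `rho` are my own, chosen purely for illustration.

```python
import math


def largest_eigenvalue_2x2(a, b, d):
    """Largest eigenvalue of the symmetric matrix [[a, b], [b, d]]."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean + radius


def gaussian_gaps(var_x, var_y, rho):
    """Spectral gaps of the joint and x-marginal Langevin diffusions.

    Assumption (illustrative, not from the note): the target is a centred
    bivariate Gaussian with variances var_x, var_y and correlation rho,
    for which the gap equals 1 / (largest eigenvalue of the covariance).
    """
    cov_xy = rho * math.sqrt(var_x * var_y)
    gamma_joint = 1.0 / largest_eigenvalue_2x2(var_x, cov_xy, var_y)
    # The x-marginal is N(0, var_x), whose covariance "matrix" is 1x1.
    gamma_marginal = 1.0 / var_x
    return gamma_joint, gamma_marginal


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9, 0.99):
        gamma_joint, gamma_marginal = gaussian_gaps(1.0, 1.0, rho)
        print(f"rho={rho:4.2f}  joint={gamma_joint:.4f}  marginal={gamma_marginal:.4f}")
```

With unit variances, the joint gap in this example is $1/\left(1+\left|\rho\right|\right)$, while the marginal gap stays at $1$, so the two samplers agree when $\rho=0$ and the joint sampler is slower by up to a factor of two as $\left|\rho\right|\to1$. Making `var_y` large shows a more dramatic separation, since the joint gap is then limited by the slow $y$ direction.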