###### tags: `chasing constants` `mcmc` `monte carlo` `expository` `theory`
# Chasing Constants: Contractivity in a Mixed Distance
**Overview**: In this note, I will describe a strategy for establishing contractivity estimates for Markov chains with 'mixed' contraction styles. As per usual, the focus will be on understanding the proof strategy, and putting some effort into optimising the constants obtained.
## Markov Chains with Mixed Contraction Styles
In most simple examples of Markov chain contractivity, contraction tends to occur in one way at a time. For processes evolving in continuous space and time, it is common to show that one can couple two copies of the process to move closer to one another spatially, i.e. transport-type contractivity. In discrete settings, it is sometimes more common to exhibit couplings under which the coupled processes can collide exactly, which has more of a total variation (TV) flavour to it.
Often, it is difficult to establish these conditions globally. Typically, transport contraction is relatively easier to show when the processes are far away from one another, and TV contraction is relatively easier once the processes have gotten quite close already. It is thus natural to seek contractivity results which are able to blend these two phases together.
## Some Assumptions
In this note, we work with a Markov chain taking values in a metric space $\left( \mathcal{X}, d \right)$, which, for now, we assume to have finite diameter $D$. Our key assumption comes in two parts: we assume the existence of a 'critical radius' $r$ such that
- For some $\alpha\in\left(0,1\right)$, whenever $d\left(x,y\right)\leqslant r$, it holds that $\mathrm{TV} \left(\delta_{x} P,\delta_{y} P \right) \leqslant 1-\alpha$.
- For some $\kappa\in\left(0,1\right)$, whenever $d\left(x,y\right)>r$, it holds that $W_{d}\left(\delta_{x}P,\delta_{y}P\right) \leqslant \left(1-\kappa\right)\cdot d \left(x,y\right)$.
This assumption formalises the earlier intuition, i.e. once two chains get close, it is easy to make them meet exactly, and if they are much further away than that, then they can be made to move spatially closer to one another, on average.
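To make these conditions concrete, here is a minimal numerical sketch of the two regimes. Everything in it (the AR(1) toy kernel, the choice of couplings, and the constants) is a hypothetical illustration rather than part of the argument; note also that this toy chain lives on an unbounded space, so it only fits the relaxed form of the assumptions discussed in the conclusion.
```python
import numpy as np

def maximal_coupling(sample_p, logp, sample_q, logq, rng):
    """Thorisson's rejection construction of a maximal coupling of p and q.

    The log-densities may omit their normalising constants only when those
    constants are equal, as is the case for the two Gaussian kernels below."""
    x = sample_p(rng)
    if np.log(rng.uniform()) + logp(x) <= logq(x):
        return x, x  # the two chains meet exactly
    while True:
        y = sample_q(rng)
        if np.log(rng.uniform()) + logq(y) > logp(y):
            return x, y

def coupled_step(x, y, rng, rho=0.9, r=0.5):
    """One coupled step of the toy AR(1) chain x' = rho * x + N(0, 1)."""
    if abs(x - y) > r:
        # Distant regime: synchronous coupling contracts |x - y| by the factor rho.
        z = rng.normal()
        return rho * x + z, rho * y + z
    # Close regime: maximal coupling of N(rho * x, 1) and N(rho * y, 1),
    # under which the chains meet with probability 1 - TV.
    mx, my = rho * x, rho * y
    return maximal_coupling(
        lambda g: g.normal(mx, 1.0), lambda t: -0.5 * (t - mx) ** 2,
        lambda g: g.normal(my, 1.0), lambda t: -0.5 * (t - my) ** 2,
        rng,
    )

rng = np.random.default_rng(0)
r = 0.5

# Close regime: estimate the meeting probability at distance r (the worst case
# for this particular kernel, so it serves as an estimate of alpha).
meets = sum(xp == yp for xp, yp in (coupled_step(0.0, r, rng) for _ in range(20_000)))
print("estimated meeting probability:", meets / 20_000)

# Distant regime: one-step contraction of the distance, starting at distance 2r.
ratios = [abs(xp - yp) / (2 * r) for xp, yp in (coupled_step(0.0, 2 * r, rng) for _ in range(20_000))]
print("estimated contraction factor (1 - kappa):", float(np.mean(ratios)))
```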
## Constructing a Hybrid Metric
To harness this assumption, we will introduce a new metric which linearly combines the structure of the ambient metric with the discrete metric. In particular, for $\beta > 0$, we define the metric
\begin{align}
d_{\beta}\left(x,y\right)=d\left(x,y\right)+\beta\cdot\mathbf{1}\left[x\neq y\right].
\end{align}
Considering the evolution of $d_\beta$ along a pair of coupled trajectories, one sees that taking $\beta \approx 0$ will have good contractivity when the chains are far apart, whereas taking $\beta \to + \infty$ will have good contractivity once the chains are sufficiently close to have a chance of meeting exactly. We will thus have to trade off these regimes judiciously in order to obtain an optimal estimate.
## Contractivity Estimates
As in previous examples with this structure, the proof of contractivity breaks up neatly into two components, which we then combine at the end.
### Close Coupling
Assuming that $d\left(x,y\right)\leqslant r$, one can construct a coupling of the kernels such that
\begin{align}
\mathbf{E}\left[d_{\beta}\left(x^{'},y^{'}\right)\vert x,y\right]&\leqslant D+\beta\cdot\left(1-\alpha\right).
\end{align}
One can see this by examining our 'close coupling' assumption, which means that with probability $\geqslant \alpha$, we can make the chains meet exactly, in which case the modified metric drops to $0$. Otherwise, their modified metric will pay a price of $\leqslant D$, for being at some distance to one another, as well as an additional price of $\beta$, for not having met exactly, so that the expectation above is at most $\left(1-\alpha\right)\cdot\left(D+\beta\right)\leqslant D+\beta\cdot\left(1-\alpha\right)$.
One can thus compute that
\begin{align}
&\quad\sup\left\{ \frac{W_{d_{\beta}}\left(\delta_{x}P,\delta_{y}P\right)}{d_{\beta}\left(x,y\right)}:x,y\in\mathcal{X},x\neq y,d\left(x,y\right)\leqslant r\right\} \\
&\leqslant\sup\left\{ \frac{D+\beta\cdot\left(1-\alpha\right)}{\beta}:x,y\in\mathcal{X},x\neq y,d\left(x,y\right)\leqslant r\right\} \\
&=D\cdot\beta^{-1}+\left(1-\alpha\right),
\end{align}
where we have also used that $d_{\beta}\left(x,y\right)\geqslant\beta$ whenever $x\neq y$. The contraction rate of the Markov chain with respect to the modified metric is thus bounded above by $D\cdot\beta^{-1}+\left(1-\alpha\right)$ for close pairs of points $\left( x, y \right)$.
### Distant Drift
If, on the other hand, it holds that $d\left(x,y\right)>r$, then one easily obtains the bound
\begin{align}
\mathbf{E}\left[d_{\beta}\left(x^{'},y^{'}\right)\vert x,y\right]&\leqslant\left(1-\kappa\right)\cdot d\left(x,y\right)+\beta.
\end{align}
This is feasible, as the assumption allows us to construct a coupling such that $\mathbf{E}\left[d \left(x^{'},y^{'}\right)\vert x,y\right] \leqslant \left(1-\kappa\right)\cdot d\left(x,y\right)$, and we simply assume the worst about coalescence, bounding $\mathbf{1}\left[x\neq y\right] \leqslant 1$.
Proceeding as before, we compute that
\begin{align}
&\quad\sup\left\{ \frac{W_{d_{\beta}}\left(\delta_{x}P,\delta_{y}P\right)}{d_{\beta}\left(x,y\right)}:x,y\in\mathcal{X},x\neq y,d\left(x,y\right)>r\right\} \\
&\leqslant\sup\left\{ \frac{\left(1-\kappa\right)\cdot d\left(x,y\right)+\beta}{d\left(x,y\right)+\beta}:x,y\in\mathcal{X},x\neq y,d\left(x,y\right)>r\right\} \\
&\leqslant\frac{\left(1-\kappa\right)\cdot r+\beta}{r+\beta},
\end{align}
where the last step uses the fact that $t\mapsto\frac{\left(1-\kappa\right)\cdot t+\beta}{t+\beta}$ is decreasing in $t$. This gives us a contraction bound for distant pairs $\left( x, y \right)$.
## Optimising the Contractivity
Combining the two estimates, one sees that for arbitrary $x, y \in \mathcal{X}$, and $\beta > 0$, one can write
\begin{align}
W_{d_{\beta}} \left(\delta_{x}P,\delta_{y}P\right) \leqslant \lambda_* \left( \beta \right) \cdot d_{\beta} \left( x, y \right)
\end{align}
with
\begin{align}
\lambda_* \left( \beta \right) = \max \left\{ D\cdot\beta^{-1}+\left(1-\alpha\right), \frac{\left(1-\kappa\right)\cdot r+\beta}{r+\beta} \right\}.
\end{align}
As anticipated, the first term decays with $\beta$, while the second grows. We will optimise this bound by examining where the two curves cross, minimising their maximum.
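For a concrete feel for this trade-off, here is a small Python sketch which evaluates $\lambda_*\left(\beta\right)$ on a grid and locates the approximately optimal $\beta$; the values of $\alpha$, $\kappa$, $r$, $D$ are placeholders chosen purely for illustration, not taken from anything above.
```python
import numpy as np

# Placeholder constants, chosen purely for illustration.
alpha, kappa, r, D = 0.5, 0.05, 0.1, 1.0

def lam_star(beta):
    """Pointwise maximum of the close-regime and distant-regime bounds."""
    close = D / beta + (1.0 - alpha)
    distant = ((1.0 - kappa) * r + beta) / (r + beta)
    return np.maximum(close, distant)

betas = np.geomspace(1e-2, 1e3, 200_000)
i = int(np.argmin(lam_star(betas)))
print(f"approximately optimal beta: {betas[i]:.4g}")
print(f"corresponding contraction rate: {lam_star(betas[i]):.6f}")
```
With these particular numbers, the minimiser sits where the two branches cross, as the discussion above suggests.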
### Some Reparametrisation
Introduce the variables $u$ and $\psi$ as
\begin{align}
r &= D\cdot\psi^{-1} \\
\beta &= D\cdot u^{-1}
\end{align}
so that $\lambda_{*}\left(\beta\right)=\tilde{\lambda}_{*}\left(u\right)$, where
\begin{align}
\tilde{\lambda}_* \left( u \right) =\max\left\{ u+\left(1-\alpha\right),\frac{\left(1-\kappa\right)\cdot u+\psi}{u+\psi}\right\}.
\end{align}
Note that this corresponds to working with the metric
\begin{align}
d^{u}\left(x,y\right)=u\cdot d\left(x,y\right)+D\cdot\mathbf{1}\left[x\neq y\right].
\end{align}
Since $d^{u}=u\cdot d_{\beta}$, and rescaling a metric by a constant does not change contraction ratios, this is an equivalent point of view: the jump penalty is now fixed at the diameter $D$ (so that it carries the same units as $d$), and it is instead the dimensionless weight $u$ on the ambient metric which varies.
Now, let's check where these two curves cross. For simplicity, substitute $\lambda = u + \left( 1 - \alpha \right)$, i.e. $u = \lambda - \left( 1 - \alpha \right)$, so that the crossing condition writes as
\begin{align}
\lambda =\frac{\left(1-\kappa\right)\cdot\left(\lambda-\left(1-\alpha\right)\right)+\psi}{\lambda-\left(1-\alpha\right)+\psi},
\end{align}
and rearrange to see that
\begin{align}
\lambda^{2}+\left(\psi-\left(1-\alpha\right)-\left(1-\kappa\right)\right)\cdot\lambda-\left(\psi-\left(1-\alpha\right)\cdot\left(1-\kappa\right)\right)=0.
\end{align}
One can compute explicitly that this quadratic polynomial has discriminant
\begin{align}
\Delta&=\left(\psi+\alpha+\kappa\right)^{2}-4\cdot\alpha\cdot\kappa \geqslant 0,
\end{align}
and thus the relevant root, namely the one for which $u=\lambda-\left(1-\alpha\right)>0$, is given by
\begin{align}
\lambda&=\frac{1}{2}\cdot\left\{ 2-\left(\psi+\alpha+\kappa\right)+\sqrt{\left(\psi+\alpha+\kappa\right)^{2}-4\cdot\alpha\cdot\kappa}\right\} .
\end{align}
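As a quick sanity check on the algebra (not needed for the argument itself), both the rearrangement into this quadratic and the discriminant identity can be verified symbolically; a throwaway sympy sketch:
```python
import sympy as sp

lam, psi, alpha, kappa = sp.symbols("lambda psi alpha kappa", positive=True)
u = lam - (1 - alpha)

# The crossing condition, cleared of its denominator.
cleared = sp.expand(lam * (u + psi) - ((1 - kappa) * u + psi))

# The quadratic in lambda claimed above.
quadratic = lam**2 + (psi - (1 - alpha) - (1 - kappa)) * lam - (psi - (1 - alpha) * (1 - kappa))

print(sp.simplify(cleared - quadratic))  # prints 0
print(sp.simplify(sp.discriminant(quadratic, lam)
                  - ((psi + alpha + kappa) ** 2 - 4 * alpha * kappa)))  # prints 0
```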
Writing $\xi = \psi + \alpha + \kappa > 1$, $\eta = \frac{4 \cdot \alpha \cdot \kappa}{\left( \psi +\alpha + \kappa \right)^{2}} \in \left( 0,1 \right)$, $\upsilon = 1 - \sqrt{1 - \eta} \in \left( 0,1 \right)$, it thus holds that
\begin{align}
\lambda_{*}=1-\frac{1}{2}\cdot\xi\cdot\upsilon.
\end{align}
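To double-check this closed form, one can compare it numerically against a brute-force minimisation of $\tilde{\lambda}_{*}$ over a grid in $u$; the constants below are again placeholders, matching those of the earlier sketch (so that $\psi = D \cdot r^{-1} = 10$), and the output should also agree with the optimum found there.
```python
import numpy as np

# Placeholder constants, for illustration only.
alpha, kappa, psi = 0.5, 0.05, 10.0

# Closed-form optimum, as derived above.
xi = psi + alpha + kappa
eta = 4 * alpha * kappa / xi**2
upsilon = 1 - np.sqrt(1 - eta)
lam_closed = 1 - 0.5 * xi * upsilon

# Brute-force minimisation of the reparametrised bound over u > 0.
u = np.geomspace(1e-4, 1e2, 500_000)
lam_brute = np.min(np.maximum(u + (1 - alpha), ((1 - kappa) * u + psi) / (u + psi)))

print(lam_closed, lam_brute)  # the two values should agree closely
```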
For reversible chains, this also supplies a bound on the spectral gap as
\begin{align}
\gamma \geqslant \gamma_{*} := \frac{1}{2} \cdot \xi \cdot \upsilon.
\end{align}
Note also that by Bernoulli's inequality, $\upsilon \geqslant \frac{1}{2} \cdot \eta$, and so one can also deduce the cruder inequality
\begin{align}
\gamma_{*} \geqslant \frac{\alpha\cdot\kappa}{\psi+\alpha+\kappa},
\end{align}
which may be simpler to work with in certain scenarios.
## Conclusion
This note demonstrates that under suitable 'close coupling and distant drift' conditions on a Markov chain, one can exhibit strict contractivity for the process in an appropriate hybrid transport-TV distance.
One weakness of the bound provided here is the poor dependence on the 'effective diameter' of the state space relative to the Markov chain, $\psi = D \cdot r^{-1}$. In fact, examining the proof reveals that we could have defined $D$ much more locally. That is, suppose that whenever $d\left(x,y\right)\leqslant r$, the kernels $\delta_x P$, $\delta_y P$ can be coupled such that
- With probability $\alpha$, the chains meet exactly, and
- If they do not meet, then their expected distance after taking a step is bounded above by $D$.
The rest of the proof does not need to change at all, and we thus immediately have a version of the theorem which applies on unbounded state spaces.
Relative to other proof techniques of this sort, one can sometimes expect decent quantitative estimates. To obtain good values of $\alpha$, one typically needs to take $r$ quite small (roughly inverse-polynomial in the dimension, in many practical examples), and the technical difficulty arises in showing that the transport contraction continues to take place at such relatively small distances. There are clear benefits relative to drift and minorisation approaches, as the transition kernels only need to have substantial overlap when they are close to one another, rather than uniformly over some compact set.
One can further hybridise this approach with other techniques, e.g. applying a concave function to the metric first, involving Lyapunov functions, etc., and these tricks can further improve the quality of the resulting bounds. Such extensions are pursued in the detailed work of [Eberle-Majka](https://arxiv.org/abs/1808.07033).
## Acknowledgements
I first came across this particular result through lecture notes of Aaron Smith, who has kindly given his permission for me to share the example.