###### tags: `chasing constants` `mcmc` `monte carlo` `expository` `theory`
# Chasing Constants: Eberle's Reflection Coupling Theorem
**Overview**: In this note, I will follow Eberle's proof of contractivity results for diffusions (which may be found [here](https://wt.iam.uni-bonn.de/fileadmin/WT/Inhalt/people/Andreas_Eberle/Preprints/ReflectionCouplingsAndContractionRatesForDiffusions.pdf)), leaving various constants free in order to optimise the final results slightly. As in previous instalments of this series, I emphasise that the proof structure is entirely borrowed.
## Contractivity of Markov Processes
As established in the prelude to this series of posts, establishing that a given Markov process is strictly contractive with respect to any transport metric is a great way to establish further analytical properties of the process, such as the possession of a spectral gap. This note covers another example of a clever technique for proving such a contractivity result for certain classes of diffusion.
I learned these results from papers of Andreas Eberle, but I have since learned that the proof technique originates (at least to some extent) from the work of Mu Fa Chen and Feng Yu Wang, studying PDEs in the 1990s. I will present a restricted version of the main theorem, which actually has a few more tunable knobs than I will allude to. Additionally, I will not really comment on the applications to systems of interacting particles, which is very interesting, but would require quite a bit more care on my part.
## Effortless Contractivity
A good starting point for framing these results is to consider diffusions which can satisfy almost-sure contractivity with a simple synchronous coupling. Consider the diffusion process
\begin{align}
{\rm d}X=b\left(X\right)\,{\rm d}t+{\rm d}W,
\end{align}
and assume that for some $\kappa>0$, one has the contractivity property
\begin{align}
\forall x,\:y,\quad\left\langle b\left(x\right)-b\left(y\right),x-y\right\rangle \leqslant-\kappa\cdot\frac{\left\Vert x-y\right\Vert ^{2}}{2}.
\end{align}
It is then relatively straightforward to show that if one couples two copies of the diffusion to be driven by the same Brownian motion, then the paths of the diffusions contract towards one another at an exponential rate proportional to $\kappa$, i.e. if
\begin{align}
{\rm d}X &= b\left(X\right)\,{\rm d}t+{\rm d}W \\
{\rm d}Y &= b\left(Y\right)\,{\rm d}t+{\rm d}W
\end{align}
then
\begin{align}
\left\Vert X_{t}-Y_{t}\right\Vert ^{2}\leqslant\left\Vert X_{0}-Y_{0}\right\Vert ^{2}\cdot\exp\left(-\kappa\cdot t\right)
\end{align}
almost surely, and hence one can deduce many strong results about the diffusion process and its invariant measure.
Perhaps upsettingly, in the absence of this contractivity assumption on $b$, life is not so easy any more; certainly for the almost-sure contraction result, this assumption appears to be necessary. Nevertheless, we might hope that if the diffusion is contractive like this on 'most' of the state space, and 'not-too-repulsive' on the rest of the space, then a qualitatively similar situation should arise.
The results of this work essentially serve to rigorously confirm this intuition. The key ingredients are that
1. One should consider other ways of coupling the diffusion paths, beyond the synchronous coupling,
2. One should consider that the contractivity might take place in a different metric than the standard Euclidean metric, and
3. One should consider that the diffusions might contract only on average, rather than almost surely.
Combining these insights, we will be able to show that a much broader class of diffusions are indeed contractive in a very useful sense.
## Reflection Coupling
Speaking to the first point, how else might we couple our diffusions? As a first example, let us consider a one-dimensional example, in which $b \equiv 0$ (imposing e.g. periodic boundary conditions to ensure ergodicity). In this case, the synchronous coupling fails disastrously; our processes remain at a fixed distance from one another for all times. So, casting this coupling to the side, we might consider the opposite approach: if our two processes are yet to meet, then we feed them the same driving Brownian motion, but oriented in the opposite direction! This way, at least 'half of the time', the processes will be pushing closer to one another, and there is some hope of the situation improving. Indeed, in this scenario, the difference between the two processes is again a Brownian motion, and we are simply waiting for it to hit $0$, which we know will happen eventually.
The generalisation of this idea beyond one dimension is the 'reflection coupling', and is defined as follows:
\begin{align}
{\rm d}X &= b\left(X\right)\,{\rm d}t+{\rm d}W \\
{\rm d}Y &= b\left(Y\right)\,{\rm d}t+R_{X,Y}{\rm d}W \\
\text{where }\quad R_{X,Y} &= \left({\rm Id}-2\cdot e_{X,Y}\cdot e_{X,Y}^{\top}\right) \\
e_{X,Y} &= {\bf I}\left[X\neq Y\right]\cdot\frac{X-Y}{\left\Vert X-Y\right\Vert }.
\end{align}
The two diffusions are then driven by essentially the same Brownian motion, except in the subspace spanned by the displacement between the two processes, i.e. along the vector $X-Y$. In this direction, we flip the directions of the Brownian motion, hoping to 'get lucky' and improve things half of the time, at least in this particular direction.
I have so far been quite imprecise about demonstrating that the reflection coupling is capable of achieving more than the synchronous coupling in terms of contractivity. Indeed, for this observation to be made rigorous, one needs another ingredient: that of modifying the metric.
## Modifying the Metric
When one speaks of working with a modified metric, one often means a change of basis or similar, e.g. preconditioning. Here, we will not do this. Instead, we will change the metric in a slightly strange way. One way of phrasing the construction is that upon changing the metric, all of the balls in the space will look the same, but they may have a different idea of what their radius is.
To this end, we will construct a new metric $d$ by passing the base Euclidean metric through a nonlinear mapping $f$, defining $d_{f}\left(x,y\right)=f\left(\left\Vert x-y\right\Vert \right)$ for some appropriate $f$. A moment's thought dictates that if this is to be a metric, then $f$ ought to take the value $0$ at $0$, be strictly positive for strictly positive inputs, and be increasing (though not necessarily strictly). We will also impose that $f$ is concave, which is sufficient (but perhaps not necessary) to confirm that $d$ is indeed a metric; this is worth verifying on one's own.
With this in mind, our goal will be to show that the mapping $t \mapsto {\bf E} \left[ d_{f} \left(X_{t},Y_{t} \right) \right]$ is bounded above by $d\left(X_{0},Y_{0}\right)\cdot\exp\left(-c\cdot t\right)$ for some $c>0$. If we are particularly interested in characterising the rate of decay for the original quantity ${\bf E}\left[\left\Vert X_{t}-Y_{t}\right\Vert \right]$, then it will be useful if $f$ is also approximately linear, i.e. if $f\left(r\right)\in\left[c_{-}\cdot r,c_{+}\cdot r\right]$ for some $c_{+}>c_{-}>0$, i.e. that our new metric is a bi-Lipschitz warping of the original metric. One then has that
\begin{align}
{\bf E}\left[\left\Vert X_{t}-Y_{t}\right\Vert \right] &\leqslant\frac{1}{c_{-}}\cdot{\bf E}\left[d_{f}\left(X_{t},Y_{t}\right)\right] \\
&\leqslant\frac{1}{c_{-}}\cdot d_{f}\left(X_{0},Y_{0}\right)\cdot\exp\left(-c\cdot t\right) \\
&\leqslant\frac{c_{+}}{c_{-}}\cdot\left\Vert X_{0}-Y_{0}\right\Vert \cdot\exp\left(-c\cdot t\right).
\end{align}
In particular, for $t > c^{-1} \cdot \log \left( \frac{c_{+}}{c_{-}} \right)$, one sees that ${\bf E}\left[\left\Vert X_{t}-Y_{t}\right\Vert \right] \leqslant \rho \cdot \left\Vert X_{0}-Y_{0}\right\Vert$ for some $\rho \in (0, 1)$, which may be easier to use in some cases.
For certain applications, it may be useful to waive the desideratum that $f$ be sandwiched between two affine functions, but for the main results here, it is not too demanding an ansatz.
## Supermartingales
The final ingredient is to relax from almost-sure contractivity to contractivity on average. Thus, instead of arguing that $d_{f}\left(X_{t},Y_{t}\right)$ decays deterministically along the trajectories of the coupled diffusions, we will instead show that the quantity $t\mapsto\exp\left(c\cdot t\right)\cdot d_{f}\left(X_{t},Y_{t}\right)$ is a supermartingale, i.e. decreases on average. We will approach this via Ito calculus, as follows:
Firstly, one can use Ito's formula to calculate that
\begin{align}
{\rm d}\left\Vert X_{t}-Y_{t}\right\Vert ^{2} &=4\cdot\left\Vert X_{t}-Y_{t}\right\Vert \cdot{\rm d}W_{t}+\left(2\cdot\left\langle b\left(X_{t}\right)-b\left(Y_{t}\right),X_{t}-Y_{t}\right\rangle +4\right)\cdot{\rm d}t \\
{\rm d}\left\Vert X_{t}-Y_{t}\right\Vert &=2\cdot{\rm d}W_{t}+\left\langle b\left(X_{t}\right)-b\left(Y_{t}\right),\frac{X_{t}-Y_{t}}{\left\Vert X_{t}-Y_{t}\right\Vert }\right\rangle \cdot{\rm d}t.
\end{align}
Writing $R_{t}=\left\Vert X_{t}-Y_{t}\right\Vert$ , we now consider the evolution of $f\left(R_{t}\right)$, again applying Ito's formula to obtain that
\begin{align}
{\rm d}f\left(R_{t}\right)=2\cdot f'\left(R_{t}\right)\cdot{\rm d}W+\left\{ -\frac{1}{2}\cdot f'\left(R_{t}\right)\cdot\left\langle b\left(X_{t}\right)-b\left(Y_{t}\right),\frac{X_{t}-Y_{t}}{\left\Vert X_{t}-Y_{t}\right\Vert }\right\rangle +2\cdot f''\left(R_{t}\right)\right\} {\rm d}t.
\end{align}
Recalling the concavity of $f$, we note that it induces some extra downward drift in the evolution of $f\left(R_{t}\right)$. We will thus try to strategically introduce extra concavity in $f$ to account for values of $R$ which $\left\langle b\left(X_{t}\right)-b\left(Y_{t}\right),\frac{X_{t}-Y_{t}}{\left\Vert X_{t}-Y_{t}\right\Vert }\right\rangle >0$, i.e. to compensate for the parts of the space in which contractivity does not come so easily.
## Outlining some Assumptions
To proceed further from here, we must make some quantitative assumptions on b which characterise how contractive or anti-contractive the dynamics are at different scales. To this end, we assume the 'weak contractivity' condition
\begin{align}
\left\Vert x-y\right\Vert =r\quad\implies\quad\left\langle b\left(x\right)-b\left(y\right),x-y\right\rangle \leqslant-\kappa\left(r\right)\cdot\frac{\left\Vert x-y\right\Vert ^{2}}{2},
\end{align}
where $\kappa:\left[0,\infty\right)\to\left(-\infty,\infty\right)$ is continuous and satisfies $\lim\inf_{r\to\infty}\kappa\left(r\right)>0$ and $\left|\int_{0}^{1}r\cdot\min\left\{ 0,\kappa\left(r\right)\right\} \,{\rm d}r\right|<\infty$. I use the modifier 'weak' here to highlight that $\kappa$ need not be positive, and hence this need not imply contractivity in the standard sense. Note that this is not quite a 'local' condition, in that the definition of $\kappa$ necessarily involves considering the value of $b$ at inputs which are at a fixed, non-vanishing distance from one another.
The typical application to have in mind is the overdamped Langevin diffusion, as applied to exploring a potential which is 'strongly convex at infinity', while allowing local non-convexity. Roughly speaking, the standing assumptions on $\kappa$ serve to enforce that i) two points which are sufficiently far away will, on average, move closer together at an exponential rate, and ii) two points which are close to one another will not, on average, diverge from one another too aggressively.
For our contractivity result, we will seek to show that ${\bf E}\left[f\left(R_{t}\right)\right]\leqslant f\left(R_{0}\right)\cdot\exp\left(-c\cdot t\right)$ for some $c>0$. We will establish this by stipulating that $t\mapsto\exp\left(c\cdot t\right)\cdot f\left(R_{t}\right)$ be a supermartingale. Recalling that
\begin{align}
{\rm d}f\left(R_{t}\right)=2\cdot f'\left(R_{t}\right)\cdot{\rm d}W+\left\{ -\frac{1}{2}\cdot f'\left(R_{t}\right)\cdot\left\langle b\left(X_{t}\right)-b\left(Y_{t}\right),\frac{X_{t}-Y_{t}}{\left\Vert X_{t}-Y_{t}\right\Vert }\right\rangle +2\cdot f''\left(R_{t}\right)\right\} {\rm d}t,
\end{align}
and that
\begin{align}
\left\langle b\left(X_{t}\right)-b\left(Y_{t}\right),\frac{X_{t}-Y_{t}}{\left\Vert X_{t}-Y_{t}\right\Vert }\right\rangle &\leqslant-\kappa\left(\left\Vert X_{t}-Y_{t}\right\Vert \right)\cdot\frac{\left\Vert X_{t}-Y_{t}\right\Vert }{2} \\
&=-\frac{1}{2}\cdot R_{t}\cdot\kappa\left(R_{t}\right),
\end{align}
one can compute that the supermartingale condition is implied by the differential inequality
\begin{align}
-\frac{1}{2}\cdot R\cdot\kappa\left(R\right)\cdot f'\left(R\right)+2\cdot f''\left(R\right)+c\cdot f\left(R\right)\leqslant0.
\end{align}
One way of understanding this condition is to see that it ensures that the drift of $t\mapsto\exp\left(c\cdot t\right)\cdot f\left(R_{t}\right)$ is negative.
Writing $K\left(r\right):=\frac{1}{4}\cdot r\cdot\kappa\left(r\right)$, $\lambda=\frac{1}{2}\cdot c$, it is then sufficient to demand that for all $r>0$, it holds that $f''\left(r\right)-K\left(r\right)\cdot f'\left(r\right)+\lambda\cdot f\left(r\right)\leqslant0$.The remainder of our efforts will be devoted to constructing an appropriate $f$ such that this inequality holds for the largest $\lambda$ possible.
## Warm-Up: Non-Expansivity
As a warm-up, let's first consider the case of $\lambda=0$, i.e. non-expansivity of the dynamics. It is then sufficient that
\begin{align}
f''\left(r\right)-K\left(r\right)\cdot f'\left(r\right) &\leqslant0 \\
\leadsto\quad f''\left(r\right) &\leqslant K\left(r\right)\cdot f'\left(r\right).
\end{align}
Recalling that $f$ must be concave, one can strengthen the constraint to
\begin{align}
f''\left(r\right)\leqslant\min\left\{ 0,K\left(r\right)\right\} \cdot f'\left(r\right),
\end{align}
whose equality case has solution $f=\Phi$, with
\begin{align}
\Phi\left(r\right) &:= \int_{0}^{r}\varphi\left(s\right)\:{\rm d}s \\
\varphi\left(r\right)&:= \exp\left(\int_{0}^{r}\min\left\{ 0,K\left(s\right)\right\} \,{\rm d}s\right).
\end{align}
For future developments, it will be useful to remember that $\varphi'\leqslant\varphi\cdot K$. Note also that since $\kappa$ is eventually positive, so too is $K$, and hence $\varphi$ is eventually constant, and $\Phi$ is eventually affine.
Anyways, as an early result, we have shown that at the very least, one can always construct a metric under which the dynamics are non-expansive, i.e. there exists a coupling such that $\Phi\left(R_{t}\right)$ is a supermartingale. This is arguably already a bit surprising. We have shown that under relatively weak conditions, a large class of diffusions are non-expansive on average with respect to an appropriate metric.
## Main Results: Strict Contractivity
For strict contractivity, we will require some extra details and assumptions to proceed. To this end, let $H>0$, and let $\left(R_{0},R_{1}\right)$ be minimal such that
• For $r\geqslant R_{0}$, it holds that $K\left(r\right)\geqslant0$.
• For $r\geqslant R_{1}=R_{1}\left(H\right)$, it holds that $K\left(r\right)\geqslant2\cdot H\cdot R_{1}^{-1}\cdot\left(R_{1}-R_{0}\right)^{-1}\cdot r$.
These correspond to the region where the dynamics are non-expansive in the un-modified metric, and a region in which the dynamics are 'sufficiently strictly contractive' in a quantitative sense, again in the un-modified metric. Additionally, define
\begin{align}
I\left(H\right)=\int_{0}^{R_{1}\left(H\right)}\frac{\Phi\left(r\right)}{\varphi\left(r\right)}\,{\rm d}r.
\end{align}
Now, let $\lambda \in \left( 0, I \left( H \right)^{-1}\right)$. We will show the existence of an $f=f_H$ and $\hat {\lambda} \in\left(0,\lambda\right)$ such that
\begin{align}
t\mapsto\exp\left(2 \cdot \hat {\lambda} \cdot t\right){\bf E}\left[f_H \left(R_{t}\right)\right]
\end{align}
is a supermartingale. For this, we need that
\begin{align}
f''\left(r\right)-K\left(r\right)\cdot f'\left(r\right) + \hat{\lambda}\cdot f\left(r\right)\leqslant0.
\end{align}
Our strategy will essentially be to enforce this differential inequality with $\hat{\lambda} = \lambda$ for $r\leqslant R_{1}$, and then check that the inequality does not degrade too badly for $r>R_{1}$.
### Part 1: Small $r$
As an ansatz, we will try imposing that
\begin{align}
f'\left(r\right)=\varphi\left(r\right)\cdot\psi\left(r\right),
\end{align}
with $\psi\left(0\right)=1$, $\psi$ decreasing to $\psi\left(R_{1}\right)>0$, and then constant on $\left[R_{1},\infty\right)$. This ensures that $f$ is still increasing and concave, and comparable to the original $\Phi$ function, in the sense that
\begin{align}
\psi\left(R_{1}\right)\cdot\Phi\left(r\right)\leqslant f\left(r\right)\leqslant\Phi\left(r\right).
\end{align}
This will be useful going forward, as it will allow us to compare our putative new metric to the fixed function $\Phi$. At an intuitive level, I think of $\psi$ as a filter or attenuating mechanism, which stretches out the metric on the space even further, according to the structure of the functions $\kappa$ and $K$.
Under this ansatz, one can then compute that
\begin{align}
f''\left(r\right) &= \varphi' \left( r \right) \cdot \psi \left( r \right) + \varphi \left( r \right) \cdot \psi' \left( r \right) \\
&\leqslant \varphi\left(r\right)\cdot\left\{ \psi'\left(r\right)+K\left(r\right)\cdot\psi\left(r\right)\right\}
\end{align}
and thus observe that it is sufficient for $\psi$ to satisfy
\begin{align}
\varphi\left(r\right)\cdot\left\{ \psi'\left(r\right)+K\left(r\right)\cdot\psi\left(r\right)\right\} -K\left(r\right)\cdot\varphi\left(r\right)\cdot\psi\left(r\right)+\lambda\cdot f\left(r\right) &\leqslant 0 \\
\leadsto\quad\varphi\left(r\right)\cdot\psi'\left(r\right)+\lambda\cdot f\left(r\right) &\leqslant 0.
\end{align}
By construction, we know that $f\left(r\right)\leqslant\Phi\left(r\right)$, and so it is clear that if $\varphi\left(r\right) \cdot \psi'\left(r\right) + \lambda \cdot \Phi\left(r\right) = 0$, then the desired inequality will hold. As such, define
\begin{align}
\psi_{\lambda}^{'}\left(r\right):=-\lambda\cdot\frac{\Phi\left(r\right)}{\varphi\left(r\right)}\quad\text{for }0\leqslant r\leqslant R_{1}\left(H\right).
\end{align}
Note that it then follows that $\psi\left(R_{1}\right)=1-\lambda\cdot I\left(H\right)>0$, and so $\psi$ is indeed strictly positive.
### Part 2: Large $r$
Note that it is not clear that one can continue to define $\psi$ in this way for $r>R_{1}$, as the value of $\psi$ could eventually become negative, thus compromising the whole endeavour. With this threat looming, for $r>R_{1}$ we will instead make the simple choice of $\psi_{\lambda}^{'} \equiv 0$, which will then surely give rise to an f which is increasing and concave, as desired.
The difference will then be in how we establish that a contractivity estimate still holds in this regime. While one intuits that this should be the 'easy' part of the proof (since we have assumed that we already have good contractivity in the unmodified metric in this regime), the technicalities are different, and so this part of the proof is unfortunately not that much shorter in the end.
Let $r > R_{1}$, and note that on this interval, both $\varphi$ and $\psi_{\lambda}$ are constant, and hence $f_{\lambda}$ is affine. One then computes that
\begin{align}
f''\left(r\right)-K\left(r\right)\cdot f'\left(r\right) &=-K\left(r\right)\cdot\varphi\left(R_{0}\right)\cdot\psi\left(R_{1}\right) \\
&\leqslant-2\cdot H\cdot R_{1}^{-1}\cdot\left(R_{1}-R_{0}\right)^{-1}\cdot r\cdot\varphi\left(R_{0}\right)\cdot\psi\left(R_{1}\right),
\end{align}
applying the definition of $R_{1}$. By concavity of $\Phi$, one can write that
\begin{align}
r\geqslant\Phi\left(r\right)\cdot\frac{R_{1}}{\Phi\left(R_{1}\right)},
\end{align}
and thus deduce that
\begin{align}
f''\left(r\right)-K\left(r\right)\cdot f'\left(r\right)\leqslant-2\cdot H\cdot\left(R_{1}-R_{0}\right)^{-1}\cdot\frac{\varphi\left(R_{0}\right)}{\Phi\left(R_{1}\right)}\cdot\psi\left(R_{1}\right)\cdot\Phi\left(r\right).
\end{align}
Using the fact that $\Phi$ is affine on $r \geqslant R_{0}$, some explicit calculations reveal that
\begin{align}
\frac{1}{R_{1}-R_{0}}\int_{R_{0}}^{R_{1}}\frac{\Phi\left(r\right)}{\varphi\left(r\right)}\,{\rm d}r &\geqslant\frac{1}{2}\cdot\frac{\Phi\left(R_{1}\right)}{\varphi\left(R_{0}\right)}\\
\implies\quad\frac{\varphi\left(R_{0}\right)}{\Phi\left(R_{1}\right)} &\geqslant\frac{1}{2}\cdot\left(\frac{1}{R_{1}-R_{0}}\int_{R_{0}}^{R_{1}}\frac{\Phi\left(r\right)}{\varphi\left(r\right)}\,{\rm d}r\right)^{-1}
\end{align}
and hence that
\begin{align}
f''\left(r\right)-K\left(r\right)\cdot f'\left(r\right) &\leqslant-H\cdot\left(\int_{R_{0}}^{R_{1}}\frac{\Phi\left(r\right)}{\varphi\left(r\right)}\,{\rm d}r\right)^{-1}\cdot\psi\left(R_{1}\right)\cdot\Phi\left(r\right) \\
&\leqslant-H\cdot I\left(H\right)^{-1}\cdot\psi\left(R_{1}\right)\cdot\Phi\left(r\right) \\
&\leqslant-H\cdot I\left(H\right)^{-1}\cdot\psi\left(R_{1}\right)\cdot f\left(r\right) \\
&=-H\cdot I\left(H\right)^{-1}\cdot\left(1-\lambda\cdot I\left(H\right)\right)\cdot f\left(r\right) \\
&=-\mu\left(\lambda,H\right)\cdot f\left(r\right),
\end{align}
with $\mu\left(\lambda,H\right):=H\cdot I\left(H\right)^{-1}\cdot\left(1-\lambda\cdot I\left(H\right)\right)>0$.
## Assembling a Final Bound
Combining the bounds from each of the two intervals, one can then deduce a global contraction rate of at least
\begin{align}
c_{*}\left(H;\lambda\right) &= 2\cdot\min\left\{ \lambda,\mu\left(\lambda,H\right)\right\} \\
&= 2\cdot\min\left\{ \lambda,H\cdot I\left(H\right)^{-1}\cdot\left(1-\lambda\cdot I\left(H\right)\right)\right\}
\end{align}
in a valid metric. Setting $\lambda=\frac{H}{1+H}\cdot I\left(H\right)^{-1}$ then leads to an optimal contraction rate of
\begin{align}
c_{*}\left(H\right)=2\cdot\frac{H}{1+H}\cdot I\left(H\right)^{-1}
\end{align}
in a metric which one might denote as $d_{H,*}$. Note that Eberle's work corresponds to taking $H=1$, leading to the simple estimate of
\begin{align}
c_{*}^{{\rm AE}}=I\left(1\right)^{-1}.
\end{align}
Finally, in principle, one can also optimise over H to obtain the estimate
\begin{align}
c_{*}:=2\cdot\sup_{H>0}\left\{ \frac{H}{1+H}\cdot I\left(H\right)^{-1}\right\} .
\end{align}
As a small check, observe that taking $H\sim0^{+}$ gives $c_{*}\left(H\right)\sim H$, since $\lim_{H\to0^{+}}I\left(H\right)^{-1}\in\left(0,\infty\right)$, whereas $H\to\infty$ gives $c_{*}\left(H\right)\lesssim I\left(H\right)^{-1}\lesssim\lim_{H\to0^{+}}I\left(H\right)^{-1}$. This confirms that one will not obtain arbitrarily good contractivity estimates by tuning $H$, as one might expect, and so Eberle's choice to fix $H=1$ is probably not losing too much of substance, at least without making further assumptions.
If this supremum is attained at a finite, non-zero value of $H$, then it corresponds to the contraction rate in a metric which one might denote as $d_{*}$, in some sense the 'quasi-optimal' metric towards establishing contractivity for this class of processes. Even if the supremum is not attained in this way, for reversible diffusions $c_{*}$ still provides a valid bound on the spectral gap of the process.
## Some Bonus Properties of $f$
Because it's easy enough to check explicitly, I note that by construction, we have that
\begin{align}
\Phi'\left(r\right) =\varphi\left(r\right) &\in\left[\varphi\left(R_{0}\right),1\right] \\
\Phi\left(r\right) &\in\left[\varphi\left(R_{0}\right)\cdot r,r\right],
\end{align}
as well as
\begin{align}
f'\left(r\right) = \varphi\left(r\right) \cdot \psi\left(r\right) &\in \left[ \varphi\left(R_{0}\right)\cdot\psi\left(R_{1}\right),1 \right] \\
f\left(r\right) &\in \left[\varphi\left(R_{0}\right)\cdot\psi\left(R_{1}\right)\cdot r,r\right],
\end{align}
i.e. our transformations of the metric are indeed bi-Lipschitz, with explicit regularity constants. This means that one can easily recover contraction bounds in the original, unmodified ambient metric.
## Some Closing Comments
The main outcome of this result is that even for processes which do not possess uniform contractivity properties in the standard metric of their state space, it is often possible to modify the metric (in a fairly gentle way) to a new metric in which uniform contractivity does hold. This is somehow a strong certificate of the process's ability to equilibrate, which allows us to deduce e.g. a spectral gap for the process, at least in the reversible case.
A natural follow-up question is to ask whether such a certificate always exists, i.e. given a chain which converges to equilibrium at a certain rate, does there exist a metric on the space such that the chain is contractive in this metric, at the same rate? To my knowledge, this is not known, and has some deep connections to the notion of Ricci flow from differential geometry.
Roughly speaking, the Ricci flow involves constructing an evolving family of metrics on the space which are increasingly well-adapted to the dynamics of the process, such that the contractivity properties of the dynamics are approximately constant around the space. This evolution can make the 'most contractive' regions under the original metric look slightly worse, but with the benefit of converting 'anti-contractive' regions into being 'slightly contractive'. Since convergence rates are generally dictated by the worst parts of the space, this tends to lead to improved estimates in practical situations.
## Conclusion
In this note, I have tried to reproduce Eberle's results for the contractivity properties of certain diffusion processes, adding in some extra details to hopefully elucidate the proof structure. As mentioned earlier, the original paper includes some key extensions of these techniques, which are crucial for certain applications. The reader is encouraged to revisit the original paper for a taste of these.
In contrast to the previous post concerning the Weak Harris Theorem, one actually expects that the quantitative estimates obtained from this approach to be quite good in general.
A natural critique is that this proof approach seems perhaps crucially linked to the setting of elliptic diffusions. In fact, this is not the case, and in several works, Eberle and co-authors have exhibited that a similar combination of reflection couplings with modification of the metric can lead to practical bounds for implementable algorithms. Again, the reader is encouraged to seek out these works to experience these results for themselves.
## Acknowledgements
The insight of relating the modification of the base metric to the Ricci flow is not my own; I learned this perspective from Aaron Smith, to whom I am very grateful. The Ricci flow idea is indeed mentioned in the works of Yann Ollivier, though I can not see that this precise connection is made in said works.

Overview: In this note, I log some basic observations about diffusion-based generative models.

8/14/2023Overview: In this note, I describe some aspects of hierarchical structure in MCMC algorithms, and how they can be of theoretical and practical relevance.

8/9/2023Overview: In this note, I discuss a recurrent question which can be used to generate research questions about methods of all sorts. I then discuss a specific instance of how this question has proved fruitful in the theory of optimisation algorithms. Methods and Approximations A nice story is that when Brad Efron derived the bootstrap, it was done in service of the question “What is the jackknife an approximation to?”. I can't help but agree that there's something quite exciting about research questions which have this same character of ''What is (this existing thing) an approximation to?''. One bonus tilt on this which I appreciate is that there can be multiple levels of approximation, and hence many answers to the same question. One well-known example is gradient descent, which can be viewed as an approximation to the proximal point method, which can then itself be viewed as an approximation to a gradient flow. There are probably even more stops along the way here. In this case, there is even the perspective that from the perspective of mathematical theory, there may be at least as much to be gained by stopping off at the proximal point interpretation, as there is from the gradient flow perspective. My experience is that generalist applied mathematicians get to grips with the gradient flow quickly, but optimisation theorists can squeeze more out of the PPM formulation. There is thus some hint that using this 'intermediate' approximation can be particularly insightful in its own right. It would be interesting to collect more examples with this character.

5/22/2023Overview: In this note, I prove Hoeffding's inequality from the perspectives of martingales and convex ordering. The Basic Construction Let $-\infty<a<x<b<\infty$, and define a random variable $M$ with law $M\left(x;a,b\right)$ by \begin{align} M=\begin{cases} a & \text{w.p. }\frac{b-x}{b-a}\ b & \text{w.p. }\frac{x-a}{b-a}. \end{cases}

5/22/2023
Published on ** HackMD**

or

By clicking below, you agree to our terms of service.

Sign in via Facebook
Sign in via Twitter
Sign in via GitHub
Sign in via Dropbox
Sign in with Wallet

Wallet
(
)

Connect another wallet
New to HackMD? Sign up