###### tags: `chasing constants` `mcmc` `monte carlo` `expository` `theory`

# Chasing Constants: Tightening the Weak Harris Theorem

**Overview**: In this note, I will follow Hairer-Mattingly's simple proof of the Weak Harris theorem (which may be found [here](https://www.hairer.org/papers/harris.pdf)), taking a little care to optimise the constants involved. I emphasise that the proof structure is entirely borrowed.

## The Weak Harris Theorem

The Weak Harris Theorem is a result about the convergence to equilibrium of ergodic Markov chains. Roughly speaking, it assumes that there is some set $\mathcal{C}$ in the 'centre' of the state space such that

1. for any two points $x, y$ both in $\mathcal{C}$, the transition kernels $\mathrm{P} \left( x, \cdot \right)$ and $\mathrm{P} \left( y, \cdot \right)$ have substantial overlap, and
2. the Markov chain returns to the set $\mathcal{C}$ often, in a certain sense.

Combining these two ingredients, one can show that given two copies of the Markov chain started at any two initial points, one can couple the chains so that eventually, with high probability, they meet exactly. Roughly speaking, you wait for both chains to enter the set $\mathcal{C}$, then try to make them meet exactly whilst they're there, and you hope (prove) that this doesn't take too long to succeed. This implies various forms of convergence statements for the Markov chain, which can be useful. One appealing aspect of the theorem is that it does not assume a priori that the chain has a known invariant measure.

## The Contractivity Proof

In 2011, Hairer and Mattingly provided a very appealing proof of the Weak Harris Theorem, in many ways simpler than the original proof. The argument is to show that under the assumptions of the theorem, the Markov chain is in fact a contraction in an appropriately-designed transport distance. From my point of view, this is particularly appealing because (as detailed in the previous post), if a reversible Markov chain is a contraction in _any_ transport distance, then it has a spectral gap, which can be estimated via the contractivity. The purpose of this note is then to detail this proof.

## Assumptions

Defining the linear operator $P$ by
\begin{align}
\left( Pf \right)\left( x \right) = \int \mathrm{P} \left( x, \mathrm{d} y \right) f\left( y \right),
\end{align}
the first assumption is that there exist a function $V:\mathcal{X}\to\left[0,\infty\right)$ and constants $\gamma\in\left(0,1\right)$, $K>0$ such that the condition
\begin{align}
PV\left(x\right)\leqslant\left(1-\gamma\right)\cdot V\left(x\right)+K
\end{align}
holds. One typically refers to $V$ as a 'Lyapunov function' and to the above as a 'drift condition'. This condition says that when the value of $V$ is large, then under the dynamics of the Markov chain, the value of $V$ will tend to decrease at an exponential rate, on average, at least until the value of $V$ becomes sufficiently small.

The second assumption is that there exist constants $R > 2 \cdot K \cdot \gamma^{-1}$, $\alpha \in \left( 0, 1 \right)$, and a probability measure $\nu \in {\cal P} \left({\cal X}\right)$ such that for $x \in \mathcal{C} := \left\{ x:V\left(x\right)\leqslant R\right\}$, the condition
\begin{align}
\mathrm{P} \left( x,{\rm d} y \right) \geqslant \alpha \cdot \nu \left({\rm d}y\right)
\end{align}
holds. This condition is typically known as a 'minorisation condition', and formalises the notion that for any two points in $\mathcal{C}$, their transition kernels have substantial overlap.
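Before moving on, here is a small numerical illustration of these two assumptions. The example and the particular constants are my own additions (they do not appear in the original note): the chain is the Gaussian autoregression $x' = a \cdot x + \mathcal{N} \left( 0, \sigma^{2} \right)$ with Lyapunov function $V \left( x \right) = x^{2}$, for which $PV \left( x \right) = a^{2} \cdot x^{2} + \sigma^{2}$, so the drift condition holds with $\gamma = 1 - a^{2}$ and $K = \sigma^{2}$; the minorisation constant on a sublevel set is then estimated by brute force.

```python
import numpy as np
from scipy.stats import norm

# Toy example (my own, not from the note): the Gaussian autoregression
#   x' = a * x + Normal(0, sigma^2),   with Lyapunov function V(x) = x^2.
# Then P V(x) = a^2 * x^2 + sigma^2, so the drift condition holds with
# gamma = 1 - a^2 and K = sigma^2.
a, sigma = 0.9, 1.0
gamma, K = 1.0 - a ** 2, sigma ** 2

# Minorisation on C = {V <= R}: alpha = int_y min_{x in C} p(x, y) dy,
# estimated on a grid; for this chain the exact value is 2 * Phi(-a * sqrt(R) / sigma).
R = 3.0 * K / gamma                                # any choice with R > 2 * K / gamma
xs = np.linspace(-np.sqrt(R), np.sqrt(R), 201)     # representative points of C
ys = np.linspace(-25.0, 25.0, 4001)
dens = norm.pdf(ys[None, :], loc=a * xs[:, None], scale=sigma)
alpha_grid = np.trapz(dens.min(axis=0), ys)
alpha_exact = 2.0 * norm.cdf(-a * np.sqrt(R) / sigma)
print(f"gamma = {gamma:.3f}, K = {K:.1f}, R = {R:.1f}")
print(f"alpha (grid) = {alpha_grid:.2e}, alpha (exact) = {alpha_exact:.2e}")
```

Even in this one-dimensional example, the overlap constant $\alpha$ obtained on a comfortably large sublevel set is already quite small, which foreshadows the quantitative caveats mentioned in the conclusion.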
For later derivations, we will write $R = 2 \cdot K \cdot \gamma^{-1} \cdot \left( 1-\psi \right)^{-1}$ with $\psi \in \left( 0, 1 \right)$. We will also later use the derived quantities $\eta = \frac{\alpha \cdot \gamma}{\left( \frac{\alpha + \gamma}{2} \right)^{2}}$, $\xi = \eta \cdot \psi$, and $\delta = 1 - \sqrt{1 - \xi}$, the first of which lies in $\left( 0, 1 \right]$ and the latter two of which lie in $\left( 0, 1 \right)$.

Note that by a straightforward application of Markov's inequality, one can show that any invariant measure $\mu$ of $\mathrm{P}$ for which $\mu \left( V \right) < \infty$ must place mass $\geqslant \frac{1}{2}\left(1+\psi\right)$ on the set $\mathcal{C}$: indeed, invariance and the drift condition give $\mu \left( V \right) \leqslant K \cdot \gamma^{-1}$, whence $\mu \left( V > R \right) \leqslant \mu \left( V \right) / R \leqslant \frac{1}{2}\left(1-\psi\right)$. So $\mathcal{C}$ is in some sense quite a large set. This can be confusing, as the set which supports a minorisation condition is often referred to as a 'small' or even 'petite' set in other circumstances.

In this note, I will not particularly discuss how one establishes such assumptions, but simply comment that in many cases, there is a good understanding of how to construct sensible Lyapunov functions and identify sets which admit a minorisation condition, at least if one does not care too much about the constants obtained in doing so.

## Constructing the Metric

The trick of the proof is to construct an appropriate metric which is contracted under the action of $P$. The intuition is as follows: if two points in the space both have large values of $V$, then the action of the chain should bring them closer together, since their values of $V$ are both likely to decrease under the action of the Markov chain. If two points do not have large values of $V$, then we are instead relying on the minorisation condition to bring the points close together.

With this in mind, the authors propose a family of metrics indexed by $\beta > 0$, given by
\begin{align}
d_{\beta} \left( x, y \right) = \left\{ 2 + \beta \cdot \left( V \left( x \right) + V \left( y \right) \right) \right\} \cdot{\bf I} \left[ x \neq y \right].
\end{align}
It is worth taking a moment to convince yourself that this is indeed a metric. Returning to our intuition, we see that taking a large $\beta$ involves focusing on the Lyapunov assumption, and will give good contractive estimates in the tails, whereas taking a small $\beta$ corresponds to focusing on the minorisation assumption, which provides more favourable estimates inside $\mathcal{C}$. Eventually, we will have to carefully tune the value of $\beta$ to obtain the best of both worlds.

Since $d_\beta$ is a metric, we are within our rights to define Lipschitz functions with respect to this metric, as well as an associated seminorm
\begin{align}
\left| f \right|_{{\rm Lip},\beta} := \sup \left\{ \frac{| f \left( x \right) - f \left( y \right) |}{d_\beta \left( x, y \right)} : x \neq y \right\}.
\end{align}
To show that the Markov chain is a contraction in the transport metric associated to $d_\beta$, it suffices to demonstrate an estimate of the form
\begin{align}
\left| Pf \left( x \right) - Pf \left( y \right) \right| \leqslant \lambda_{*} \left( \beta \right) \cdot \left| f \right|_{{\rm Lip},\beta} \cdot d_{\beta} \left( x, y \right)
\end{align}
with $\lambda_* \left( \beta \right) < 1$, which holds uniformly over $f$ which are Lipschitz in the $d_\beta$ metric. If this result is not familiar, it is worth taking a moment to consider why it might be the case (hint: consider dual formulations of the transport problem).
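To sketch why such an estimate suffices (this unpacking of the hint is mine, and applies the Kantorovich-Rubinstein duality somewhat informally), write $W_{d_\beta}$ for the transport distance induced by $d_\beta$, and let $\mu_1, \mu_2$ be any two probability measures on $\mathcal{X}$. Duality then gives
\begin{align}
W_{d_{\beta}} \left( \mu_1 P, \mu_2 P \right) = \sup_{\left| f \right|_{{\rm Lip},\beta} \leqslant 1} \left\{ \mu_1 \left( Pf \right) - \mu_2 \left( Pf \right) \right\} \leqslant \lambda_{*} \left( \beta \right) \cdot \sup_{\left| g \right|_{{\rm Lip},\beta} \leqslant 1} \left\{ \mu_1 \left( g \right) - \mu_2 \left( g \right) \right\} = \lambda_{*} \left( \beta \right) \cdot W_{d_{\beta}} \left( \mu_1, \mu_2 \right),
\end{align}
since the displayed estimate says precisely that $Pf / \lambda_{*} \left( \beta \right)$ is again $1$-Lipschitz in $d_\beta$ whenever $f$ is.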
Given such a $\lambda_* \left( \beta \right)$, if the kernel $\mathrm{P}$ is reversible, then one may estimate its spectral gap as $\gamma_{\mathrm{P}} \geqslant 1 - \lambda_* \left( \beta \right)$. As such, we will seek a value of $\beta$ which maximises the resulting estimate.

## Two Basic Lemmas

The following elementary lemmas will be useful when optimising various bounds. I detail them here because I make relatively frequent use of them, and it is helpful to notice when they are being applied.

**Lazy Maximum Lemma:** Let $a, b, c, d > 0$, and suppose that there exist $u, v > 0$ such that $a \leqslant u \cdot c$, $b \leqslant v \cdot d$. It then holds that $a + b \leqslant \max \left\{ u, v \right\} \cdot \left( c + d \right)$.

**Proof:** Observe that $a \leqslant u \cdot c \leqslant \max \left\{ u, v \right\} \cdot c$ and $b \leqslant v \cdot d \leqslant \max \left \{ u, v \right\} \cdot d$ to see that $a + b \leqslant \max \left\{ u, v \right\} \cdot c + \max \left \{ u, v \right\} \cdot d = \max \left\{ u, v \right\} \cdot \left( c + d \right)$, as claimed.

**Two-Lines Lemma:** Let $a,b,c,d>0$. Then
\begin{align}
\min_{\theta\geqslant0}\left\{ \max\left\{ a+b\cdot\theta,c-d\cdot\theta\right\} \right\} =\frac{a\cdot d+b\cdot\max\left\{ a,c\right\} }{b+d}.
\end{align}

**Proof:** Draw the two lines as a function of $\theta \geqslant 0$. If $a \geqslant c$, then the first line always lies above the second line, and so the minimum is attained at $\theta = 0$, with value $a = \frac{a \cdot d + b \cdot a}{b + d}$. If $a < c$, then the lines cross at $\theta = \frac{c - a}{b + d} > 0$, at which point the minimum is attained, with common value $\frac{a \cdot d + b \cdot c}{b + d}$.

## The Contraction Bounds

We bound the difference $\left| Pf \left( x \right) - Pf \left( y \right) \right|$ in three different ways, according to whether the points $x$, $y$ are equal, both in the tails of the distribution, or both in the bulk of the distribution. This essentially corresponds to the sets on which $d_\beta$ is $0$, large, or small and positive, respectively. We crucially use the fact that $f$ is $d_\beta$-Lipschitz in order to introduce the Lyapunov function $V$ into the calculations, so that we can make use of the drift condition. Additionally, by splitting the proof according to the value of $d_\beta$, and hence of $V \left( x \right) + V\left( y \right)$, we will be able to explicitly trade off certain quantities against one another, which will be invaluable for obtaining optimised bounds.

### Trivial Contraction

**Proposition:** For $\beta>0$, $x\in{\mathcal X}$, $y=x$, it holds that
\begin{align}
\left| Pf \left( x \right) - Pf \left( y \right) \right| \leqslant \lambda_{{\rm Trivial},*} \left( \beta \right) \cdot \left| f \right|_{{\rm Lip},\beta} \cdot d_{\beta} \left( x, y \right)
\end{align}
with
\begin{align}
\lambda_{{\rm Trivial},*} \left( \beta \right) = 0.
\end{align}

**Proof:** Under the stated conditions, both sides of the inequality are equal to $0$, from which the claim follows immediately.

### Tail Contraction

**Proposition:** For $\beta>0$, $x, y\in{\mathcal X}$ such that $V\left(x\right)+V\left(y\right)\geqslant R$, it holds for all $\theta \geqslant 0$ that
\begin{align}
\left| Pf \left( x \right) - Pf \left( y \right) \right| \leqslant \lambda_{{\rm Tail}} \left( \beta, \theta \right) \cdot \left| f \right|_{{\rm Lip},\beta} \cdot d_{\beta} \left( x, y \right)
\end{align}
with
\begin{align}
\lambda_{{\rm Tail}} \left( \beta, \theta \right) = \max\left\{ 1+\beta\cdot K-\beta\cdot R\cdot\theta,\left(1-\gamma\right)+2\cdot\theta\right\}.
\end{align}

**Proof:** Suppose that $x \neq y$ and $V \left( x \right) + V \left( y \right) \geqslant R$. Writing $\left( x', y' \right)$ for any coupling of $\mathrm{P} \left( x, \cdot \right)$ and $\mathrm{P} \left( y, \cdot \right)$, and using the Lipschitz assumption on $f$, we may write that
\begin{align}
\left| Pf \left( x \right) - Pf \left( y \right) \right| &\leqslant {\bf E} \left[ \left| f \left( x' \right) - f \left( y' \right) \right| \right] \\
&\leqslant{\bf E} \left[ \left| f \right|_{{\rm Lip},\beta} \cdot d_{\beta} \left( x', y' \right) \right] \\
&\leqslant\left|f\right|_{{\rm Lip},\beta} \cdot {\bf E} \left[ 2 + \beta \cdot \left( V \left( x' \right) + V \left( y' \right) \right) \right] \\
&= \left| f \right|_{{\rm Lip},\beta} \cdot \left\{ 2 + \beta \cdot \left( PV \left( x \right) + PV \left( y \right) \right) \right\} .
\end{align}
Apply the drift condition to write that
\begin{align}
\left| Pf \left( x \right) - Pf \left( y \right) \right| &\leqslant \left| f \right|_{{\rm Lip},\beta} \cdot \left\{ 2 + \beta \cdot \left( 1- \gamma \right) \cdot \left( V \left( x \right) + V \left( y \right) \right) + 2 \cdot \beta \cdot K \right\} \\
&= \left| f \right|_{{\rm Lip},\beta} \cdot \left\{ 2 \cdot \left( 1 + \beta \cdot K \right) + \beta \cdot \left( 1- \gamma \right) \cdot \left( V \left( x \right) + V \left( y \right) \right) \right\}.
\end{align}
For $x$, $y$ as specified, and for any $\theta \geqslant 0$, it holds that
\begin{align}
0 \leqslant 2 \cdot \beta \cdot \theta \cdot \left( V \left( x \right) + V \left( y \right) - R \right).
\end{align}
Adding this to the earlier inequality shows that
\begin{align}
\left| Pf \left( x \right) - Pf \left( y \right) \right| &\leqslant \left| f \right|_{{\rm Lip},\beta} \cdot \left\{ 2 \cdot \left( 1 + \beta \cdot K - \beta \cdot R \cdot \theta \right) + \beta \cdot \left( \left( 1- \gamma \right) + 2 \cdot \theta \right) \cdot \left( V \left( x \right) + V \left( y \right) \right) \right\} \\
&\leqslant \left| f \right|_{{\rm Lip},\beta} \cdot \lambda_{{\rm Tail}} \left( \beta, \theta \right) \cdot d_\beta \left( x, y \right),
\end{align}
where by the Lazy Maximum Lemma we may take
\begin{align}
\lambda_{{\rm Tail}} \left( \beta, \theta \right) = \max\left\{ 1+\beta\cdot K-\beta\cdot R\cdot\theta,\left(1-\gamma\right)+2\cdot\theta\right\},
\end{align}
as claimed.

### Bulk Contraction

**Proposition:** For $\beta>0$, $x, y\in{\mathcal X}$ such that $V\left(x\right)+V\left(y\right)\leqslant R$, it holds for all $\theta \geqslant 0$ that
\begin{align}
\left| Pf \left( x \right) - Pf \left( y \right) \right| \leqslant \lambda_{{\rm Bulk}} \left( \beta, \theta \right) \cdot \left| f \right|_{{\rm Lip},\beta} \cdot d_{\beta} \left( x, y \right)
\end{align}
with
\begin{align}
\lambda_{{\rm Bulk}} \left( \beta, \theta \right) = \max\left\{ \left( 1 - \alpha \right) + \beta \cdot K + \beta \cdot R \cdot \theta, \left( 1 -\gamma \right) - 2 \cdot \theta \right\}.
\end{align}

**Proof:** Suppose that $x \neq y$ and $V \left( x \right) + V \left( y \right) \leqslant R$. It then holds that both $V \left( x \right) \leqslant R$ and $V \left( y \right) \leqslant R$, and hence that the minorisation condition applies. For $z \in {\mathcal C}$, apply the minorisation condition to exhibit the decomposition
\begin{align}
\mathrm{P} \left( z, \cdot \right) = \alpha \cdot \nu \left( \cdot \right) + \left( 1 - \alpha \right) \cdot \tilde{\mathrm{P}} \left( z, \cdot \right),
\end{align}
where $\tilde{\mathrm{P}}$ is again a Markov kernel.
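Spelling out a detail which the original proof leaves implicit (the following is just a rearrangement of the minorisation condition, not an additional assumption): the residual kernel is given explicitly by
\begin{align}
\tilde{\mathrm{P}} \left( z, \mathrm{d} y \right) = \frac{\mathrm{P} \left( z, \mathrm{d} y \right) - \alpha \cdot \nu \left( \mathrm{d} y \right)}{1 - \alpha},
\end{align}
which is a nonnegative measure precisely because of the minorisation condition, and which has total mass $1$, so that it is indeed a Markov kernel.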
Then
\begin{align}
\left| Pf \left( x \right) - Pf \left( y \right) \right| &= \left( 1 - \alpha \right) \cdot \left| \tilde{P}f \left( x \right) - \tilde{P}f \left( y \right) \right| \\
&\leqslant \left( 1 - \alpha \right) \cdot \tilde{\bf{E}} \left[ \left| f \left( \tilde{x}' \right) - f \left( \tilde{y}' \right) \right| \right] \\
&\leqslant \left( 1 - \alpha \right) \cdot \tilde{\bf{E}} \left[ \left| f \right|_{{\rm Lip},\beta} \cdot d_{\beta} \left( \tilde{x}', \tilde{y}' \right) \right] \\
&\leqslant \left( 1 - \alpha \right) \cdot \left| f \right|_{{\rm Lip},\beta} \cdot \tilde{\bf{E}} \left[ 2 + \beta \cdot \left( V \left( \tilde{x}' \right) + V \left( \tilde{y}' \right) \right) \right] \\
&= \left( 1 - \alpha \right) \cdot \left| f \right|_{{\rm Lip},\beta} \cdot\left\{ 2 + \beta \cdot \left( \left( \tilde{P} V \right) \left( x \right) + \left( \tilde{P} V \right) \left( y \right) \right) \right\} \\
&\leqslant \left( 1 - \alpha \right) \cdot \left| f \right|_{{\rm Lip},\beta} \cdot \left\{ 2 + \beta \cdot \left( 1 - \alpha \right)^{-1} \cdot \left( \left( P V \right) \left( x \right) + \left( P V \right) \left( y \right) \right) \right\} \\
&= \left| f \right|_{{\rm Lip},\beta} \cdot \left\{ 2 \cdot \left( 1 - \alpha \right) + \beta \cdot \left( \left( P V \right) \left( x \right) + \left( P V \right) \left( y \right) \right) \right\},
\end{align}
where $\left( \tilde{x}', \tilde{y}' \right)$ is any coupling of $\tilde{\mathrm{P}} \left( x, \cdot \right)$ and $\tilde{\mathrm{P}} \left( y, \cdot \right)$, and the final inequality uses that $V \geqslant 0$, so that $\left( 1 - \alpha \right) \cdot \tilde{P} V \leqslant P V$ pointwise. Applying again the drift condition, we see that
\begin{align}
\left| Pf \left( x \right) - Pf \left( y \right) \right| &\leqslant \left| f \right|_{{\rm Lip},\beta} \cdot \left\{ 2 \cdot \left( 1 - \alpha \right) + \beta \cdot \left( \left( 1 - \gamma \right) \left( V \left( x \right) + V \left( y \right) \right) + 2 \cdot K \right) \right\} \\
&= \left| f \right|_{{\rm Lip},\beta} \cdot \left\{ 2 \cdot \left( \left( 1 - \alpha \right) + \beta \cdot K\right) + \beta \cdot \left( 1 - \gamma \right) \left( V \left( x \right) + V \left( y \right) \right) \right\}.
\end{align}
As in the tail case, note that for $x$, $y$ as specified, and for $\theta \geqslant 0$, it holds that
\begin{align}
0 \leqslant 2 \cdot \beta \cdot \theta \cdot \left( R - \left( V \left( x \right) + V \left( y \right) \right) \right).
\end{align}
Adding this to the earlier inequality shows that
\begin{align}
\left| Pf \left( x \right) - Pf \left( y \right) \right| &\leqslant \left| f \right|_{{\rm Lip},\beta} \cdot \left\{ 2 \cdot \left( \left( 1 - \alpha \right) + \beta \cdot K + \beta \cdot R \cdot \theta \right) + \beta \cdot \left( \left( 1 - \gamma \right) - 2 \cdot \theta \right) \left( V \left( x \right) + V \left( y \right) \right) \right\} \\
&\leqslant \left| f \right|_{{\rm Lip},\beta} \cdot \lambda_{{\rm Bulk}} \left( \beta, \theta \right) \cdot d_\beta \left( x, y \right),
\end{align}
where by the Lazy Maximum Lemma we may take
\begin{align}
\lambda_{{\rm Bulk}} \left( \beta, \theta \right) = \max\left\{ \left( 1 - \alpha \right) + \beta \cdot K + \beta \cdot R \cdot \theta, \left( 1 -\gamma \right) - 2 \cdot \theta \right\}.
\end{align}

## Optimising the Bounds

Given these three bounds (Trivial, Tail, and Bulk), we consider different values of $\beta$, and examine which of the bounds is largest in each regime. One expects that the best overall bound will come when the two nontrivial terms are balanced.

### Small $\beta$

**Proposition:** Suppose that $\alpha > \gamma$, and let $\beta \in \left[ 0, \left( \alpha - \gamma \right) \cdot K^{-1} \right]$.
Then, for all $x, y \in \mathcal{X}$, it holds that
\begin{align}
\left| Pf \left( x \right) - Pf \left( y \right) \right| \leqslant \lambda_{*} \left( \beta \right) \cdot \left| f \right|_{{\rm Lip},\beta} \cdot d_{\beta} \left( x, y \right),
\end{align}
with
\begin{align}
\lambda_{*} \left( \beta \right) \leqslant \frac{2 + \beta \cdot \left( \left( 1 - \gamma \right) \cdot R + 2 \cdot K \right)}{2 + \beta \cdot R}.
\end{align}

**Proof:** For $\beta \in \left[ 0, \left( \alpha - \gamma \right) \cdot K^{-1} \right]$, it holds that $\left( 1 - \alpha \right) + \beta \cdot K \leqslant \left( 1 - \gamma \right) \leqslant 1 + \beta \cdot K$. Applying the Two-Lines Lemma, it thus holds that
\begin{align}
\lambda_{{\rm Tail},*} \left( \beta \right) := \inf_{\theta \geqslant 0} \lambda_{{\rm Tail}} \left( \beta, \theta \right) &= \frac{2 + \beta \cdot \left( \left( 1 - \gamma \right) \cdot R + 2 \cdot K \right)}{2 + \beta \cdot R} \\
\lambda_{{\rm Bulk},*} \left( \beta \right) := \inf_{\theta \geqslant 0} \lambda_{{\rm Bulk}} \left( \beta, \theta \right) &= \frac{2 \cdot \left( 1 - \alpha \right) + \beta \cdot \left( \left( 1 - \gamma \right) \cdot R + 2 \cdot K \right)}{2 + \beta \cdot R},
\end{align}
and of course $\lambda_{{\rm Trivial},*} \left( \beta \right) = 0$. For all $x, y \in{\mathcal X}$, it clearly holds that
\begin{align}
\left| Pf \left( x \right) - Pf \left( y \right) \right| \leqslant \max \left\{ \lambda_{{\rm Trivial},*} \left( \beta \right),\lambda_{{\rm Tail},*} \left( \beta \right), \lambda_{{\rm Bulk},*} \left( \beta \right) \right\} \cdot \left| f \right|_{{\rm Lip},\beta} \cdot d_{\beta} \left( x, y\right),
\end{align}
and by noting that the tail bound is always the largest of the three bounds, the result follows.

### Intermediate $\beta$

**Proposition:** Let $\beta \in \left[ \left( \alpha - \gamma \right)_+ \cdot K^{-1}, \alpha \cdot K^{-1} \right]$. Then, for all $x, y \in \mathcal{X}$, it holds that
\begin{align}
\left| Pf \left( x \right) - Pf \left( y \right) \right| \leqslant \lambda_{*} \left( \beta \right) \cdot \left| f \right|_{{\rm Lip},\beta} \cdot d_{\beta} \left( x, y \right),
\end{align}
with
\begin{align}
\lambda_{*} \left( \beta \right) \leqslant \max \left\{ \frac{2 + \beta \cdot \left( \left( 1 - \gamma \right) \cdot R + 2 \cdot K \right)}{2 + \beta \cdot R}, \left( 1 - \alpha \right) + \beta \cdot K \right\} .
\end{align}

**Proof:** For $\beta \in \left[ \left( \alpha - \gamma \right)_{+} \cdot K^{-1}, \alpha \cdot K^{-1} \right]$, it holds that $\left( 1 - \gamma \right) \leqslant \left( 1 - \alpha \right) + \beta \cdot K \leqslant 1 + \beta \cdot K$. Applying the Two-Lines Lemma, it then holds that
\begin{align}
\lambda_{{\rm Tail},*} \left( \beta \right) := \inf_{\theta \geqslant 0} \lambda_{{\rm Tail}} \left( \beta, \theta \right) &= \frac{2 + \beta \cdot \left( \left( 1 - \gamma \right) \cdot R + 2 \cdot K \right)}{2 + \beta \cdot R} \\
\lambda_{{\rm Bulk},*} \left( \beta \right) := \inf_{\theta \geqslant 0} \lambda_{{\rm Bulk}} \left( \beta, \theta \right) &= \left( 1 - \alpha \right) + \beta \cdot K,
\end{align}
and again $\lambda_{{\rm Trivial},*}\left(\beta\right)=0$.
As before, for all $x,y\in{\cal X}$, it holds that
\begin{align}
\left| Pf \left( x \right) - Pf \left( y \right) \right| \leqslant \max \left\{ \lambda_{{\rm Trivial},*} \left( \beta \right),\lambda_{{\rm Tail},*} \left( \beta \right), \lambda_{{\rm Bulk},*} \left( \beta \right) \right\} \cdot \left| f \right|_{{\rm Lip},\beta} \cdot d_{\beta} \left( x, y\right),
\end{align}
and by dropping the trivial bound, the result follows.

### Large $\beta$

As far as I can tell, using the current techniques with $\beta > \alpha \cdot K^{-1}$ only leads to values of $\lambda_* \left( \beta \right)$ which exceed $1$, and hence are uninteresting from the perspective of convergence to equilibrium.

### Best $\beta$

We now conclude with our optimised estimates.

**Theorem:** There exists $\beta_* > 0$ (which can be made explicit) such that
\begin{align}
1 - \lambda_* \left( \beta_* \right) \geqslant \delta \cdot \frac{\alpha + \gamma}{2}.
\end{align}

**Proof:** For $\beta \in \left[ \left( \alpha - \gamma \right)_{+} \cdot K^{-1}, \alpha \cdot K^{-1} \right]$, it holds by the Proposition on 'Intermediate $\beta$' that $\lambda_{*} \left( \beta \right) \leqslant \max \left\{ \frac{2 + \beta \cdot \left( \left( 1 - \gamma \right) \cdot R + 2 \cdot K \right)}{2 + \beta \cdot R}, \left( 1 - \alpha \right) + \beta \cdot K \right\}$. On this interval, the first term decreases in $\beta$, and the second term increases. We will show that they cross at a particular value of $\beta$ in this interval, thus furnishing us with a bound on the rate of convergence.

Rearranging the equality $\frac{2 + \beta \cdot \left( \left( 1 - \gamma \right) \cdot R + 2 \cdot K \right)}{2 + \beta \cdot R} = \left( 1 - \alpha \right) + \beta \cdot K$, one obtains the quadratic equation
\begin{align}
\left( \beta \cdot K \right)^{2} + \left( \beta \cdot K \right) \cdot \left( \gamma - \alpha \right) - \frac{2 \cdot \alpha \cdot K}{R}=0.
\end{align}
Noting that our optimal bound will satisfy $\lambda_{*} = \left( 1 - \alpha \right) + \beta_{*} \cdot K$, we rewrite this equation in terms of $\lambda := \left( 1 - \alpha \right) + \beta \cdot K$ and rearrange to obtain
\begin{align}
\left( \lambda - \frac{\left( 1 - \alpha \right) + \left( 1 - \gamma \right)}{2} \right)^{2} = \left( \frac{\alpha - \gamma}{2} \right)^{2} + \frac{2 \cdot \alpha \cdot K}{R}.
\end{align}
Straightforward algebraic manipulations establish that $\frac{2 \cdot \alpha \cdot K}{R} = \alpha \cdot \gamma - \xi \cdot \left( \frac{\alpha + \gamma}{2} \right)^{2}$, and hence that
\begin{align}
\left( \lambda - \frac{\left( 1 - \alpha \right) + \left( 1 - \gamma \right)}{2}\right)^{2} &=\left( \frac{\alpha - \gamma}{2}\right)^{2} + \alpha \cdot \gamma - \xi \cdot \left( \frac{\alpha + \gamma}{2}\right)^{2} \\
&=\left( \frac{\alpha + \gamma}{2}\right)^{2} \cdot \left( 1 - \xi \right) \\
&=\left( \frac{\alpha + \gamma}{2}\right)^{2} \cdot \left( 1 - \delta \right)^2,
\end{align}
whose relevant (positive-$\beta$) solution is $\lambda_{*} = 1 - \delta \cdot \frac{\alpha + \gamma}{2}$, as claimed. Finally, one can check on a case-by-case basis that $\beta_{*} \in \left[ \left( \alpha - \gamma \right)_{+} \cdot K^{-1}, \alpha \cdot K^{-1} \right]$, and hence conclude.

**Corollary:** If $\mathrm{P}$ is also reversible, then it admits an $L^2$ spectral gap $\gamma_*$ which satisfies $\gamma_* \geqslant \delta \cdot \frac{\alpha + \gamma}{2}$.

**Remark:** In the (somewhat rare) case in which $\alpha>\gamma$, one can check that using the 'small $\beta$' bound on $\lambda$ does not lead to any improvement, i.e. it suffices to consider the 'intermediate $\beta$' regime.
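As a quick numerical sanity check of the closed-form rate (the snippet and the particular constants below are my own, and are not part of the original argument), one can compare $1 - \delta \cdot \frac{\alpha + \gamma}{2}$ against a brute-force optimisation of $\max \left\{ \lambda_{\rm Tail}, \lambda_{\rm Bulk} \right\}$ over $\beta$ and $\theta$:

```python
import numpy as np

# Example constants (my own choice); note alpha < gamma here, so we are in
# the 'intermediate beta' regime with (alpha - gamma)_+ = 0.
alpha, gamma, K, psi = 0.05, 0.1, 1.0, 0.5
R = 2 * K / (gamma * (1 - psi))

# Closed-form bound: lambda_* = 1 - delta * (alpha + gamma) / 2.
eta = alpha * gamma / ((alpha + gamma) / 2) ** 2
delta = 1 - np.sqrt(1 - eta * psi)
lam_closed = 1 - delta * (alpha + gamma) / 2

# Brute force: for each beta, optimise theta separately inside the tail and
# bulk bounds, take the worse of the two, then optimise over beta.
def lam_tail(beta, theta):
    return np.maximum(1 + beta * K - beta * R * theta, (1 - gamma) + 2 * theta)

def lam_bulk(beta, theta):
    return np.maximum((1 - alpha) + beta * K + beta * R * theta, (1 - gamma) - 2 * theta)

betas = np.linspace(1e-6, alpha / K, 2001)
thetas = np.linspace(0.0, 1.0, 2001)
B, T = np.meshgrid(betas, thetas, indexing="ij")
per_beta = np.maximum(lam_tail(B, T).min(axis=1), lam_bulk(B, T).min(axis=1))
lam_brute = per_beta.min()
print(f"closed form: {lam_closed:.6f}, brute force: {lam_brute:.6f}")
```

For these constants, both computations give a contraction factor of roughly $0.981$, i.e. a spectral-gap bound of roughly $0.019$; the brute-force value differs from the closed form only by the resolution of the grid.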
## Conclusion

In this note, I have tried to reproduce the Hairer-Mattingly proof of the Weak Harris Theorem, moving more slowly to clarify (for myself) some of the intermediate steps. A benefit of this treatment is that I have been able to optimise the resulting estimates on the spectral gap of the chain (in the reversible case). In practice, one does not necessarily expect these bounds to be quantitatively favourable, as the parameter $\alpha$ can often degrade quite badly with the dimension of the problem.

Finally, note that a recent [work](https://arxiv.org/abs/2110.09650) of Cañizo-Mischler uses conceptually related techniques to further elucidate the structure of this proof, providing also a neat extension to the setting of subgeometric convergence of Markov chains.