###### tags: `chasing constants` `mcmc` `monte carlo` `expository` `theory`

# Chasing Constants: Understanding the Nash Inequality

**Overview**: In this note, I will briefly describe the class of functional inequalities known as Nash inequalities, and deduce their implications for convergence to equilibrium.

## Functional Inequalities for Markov Processes

In earlier posts, I have devoted substantial attention to the task of establishing that a given Markov chain admits a spectral gap. Equivalently, I have sought to show that for all suitable $f$, there holds a 'Poincaré' inequality of the form

\begin{align}
\mathcal{E} \left( f \right) \gtrsim \mathrm{Var} \left( f \right),
\end{align}

where the implied constant is uniform in $f$. This is an example of what is known as a _functional inequality_, as it provides an ordering between some functionals which act on a suitable class of functions $f$.

Beyond Poincaré inequalities themselves, there are a variety of subtle refinements, which often involve introducing another term into the inequality, e.g.

\begin{align}
\mathcal{E} \left( f \right) + \Phi \left( f \right) \gtrsim \mathrm{Var} \left( f \right),
\end{align}

for some other functional $\Phi$ 'of quadratic type' (I won't define this precisely, but it will hopefully become clear from context and examples). Roughly speaking, the pattern is that

1. If $\Phi$ dominates $\mathrm{Var}$ as a functional, then this inequality
    - is weaker than the Poincaré inequality,
    - implies slower-than-exponential convergence of the sequence $| P^n f |_2^2$, and
    - provides this guarantee only for a collection of functions which is a strict subset of $L^2 \left( \pi \right)$.
2. If $\Phi$ is dominated by $\mathrm{Var}$ as a functional, then this inequality
    - is stronger than the Poincaré inequality,
    - implies faster-than-exponential convergence of the sequence $| P^n f |_2^2$ (at least for a certain range of $n$), and
    - provides this guarantee for a collection of functions which is a strict superset of $L^2 \left( \pi \right)$.

The first case often arises by taking

\begin{align}
\Phi \left( f \right) &= \mathrm{Osc} \left( f \right)^2 \\
\text{where} \quad \mathrm{Osc} \left( f \right) &:= \sup f - \inf f,
\end{align}

and the second case can arise by taking

\begin{align}
\Phi \left( f \right) &= | f |_1^2.
\end{align}

These correspond to the so-called 'weak' and 'super-' Poincaré inequalities, respectively. This post will essentially focus on a specific instance of a super-Poincaré inequality, known for historical reasons as a Nash inequality.

## Nash Inequalities

As presented in [this paper](http://www.numdam.org/article/JEDP_2010____A2_0.pdf), a Markov process satisfies a Nash inequality with 'dimension' $d$ if its Dirichlet form $\mathcal{E}$ satisfies

\begin{align}
\left|f\right|_{2}^{2+d} & \leqslant \left|f\right|_{1}^{2} \cdot \left\{ C_{1} \cdot \left|f\right|_{2}^{2} + C_{2} \cdot \mathcal{E} \left(f\right) \right\}^{d/2}
\end{align}

for some constants $C_1$, $C_2$, and for all suitable $f$. In the same paper, the authors show that this inequality implies the decay estimate

\begin{align}
| P_t f |_2 \leqslant \left( \max \left\{ 2\cdot C_1, \frac{2 \cdot d \cdot C_2}{t} \right\} \right)^{d/4} \cdot| f |_1.
\end{align}

The positive aspect of this estimate is that the right-hand side depends on $f$ only through its 1-norm, which may be finite even when its 2-norm is infinite. As such, it shows that the semigroup has a genuine regularising effect: it maps $L^1$ into $L^2$. However, the behaviour of the estimate as a function of $t$ is less impressive, especially when we are used to exponential decay in $t$. It decays only like $t^{-d/4}$, and stops improving altogether once $t \geqslant d \cdot C_2 / C_1$. With this in mind, one realises that the result is particularly interesting for $f \in L^1 \setminus L^2$, and for small values of $t$.

Anyway, upon reading this result for the first time, I was a bit confused as to where the exponential decay had gone.
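
As a quick numerical illustration of the shape of the quoted estimate, the sketch below evaluates it over a range of times and compares it with an exact computation. The comparison process is a hypothetical two-state chain of my own choosing, not one taken from the paper. It jumps at rate `lam` in each direction and has uniform $\pi$. For this chain, $|f|_2^2 \leqslant 2 \cdot |f|_1^2$ for every $f$, so the Nash inequality holds with $d = 2$, $C_1 = 2$ and any $C_2 > 0$. The function names `paper_bound` and `two_state_ratio` are illustrative only.

```python
import math


def paper_bound(t: float, d: float, C1: float, C2: float) -> float:
    """Bound on |P_t f|_2 / |f|_1 as quoted from the paper:
    max(2*C1, 2*d*C2/t) ** (d/4), for t > 0."""
    if t <= 0:
        raise ValueError("t must be positive")
    return max(2.0 * C1, 2.0 * d * C2 / t) ** (d / 4.0)


def two_state_ratio(t: float, lam: float, f=(1.0, 0.0)) -> float:
    """Exact |P_t f|_2 / |f|_1 for a hypothetical two-state chain that jumps
    at rate lam in each direction; norms are taken w.r.t. uniform pi."""
    mean = 0.5 * (f[0] + f[1])          # pi(f), the equilibrium value
    decay = math.exp(-2.0 * lam * t)    # the generator's nonzero eigenvalue is -2*lam
    ptf = [mean + decay * (x - mean) for x in f]
    l2 = math.sqrt(0.5 * (ptf[0] ** 2 + ptf[1] ** 2))
    l1 = 0.5 * (abs(f[0]) + abs(f[1]))  # equals |P_t f|_1 when f is nonnegative
    return l2 / l1


# Nash holds for this chain with d = 2, C1 = 2 and any C2 > 0.
d, C1, C2, lam = 2.0, 2.0, 1.0, 1.0
for t in [0.01, 0.1, 1.0, 10.0, 100.0]:
    print(f"t = {t:6.2f}   quoted bound = {paper_bound(t, d, C1, C2):7.4f}"
          f"   exact ratio = {two_state_ratio(t, lam):.4f}")
```

For $f = (1, 0)$ the exact ratio equals $\sqrt{1 + e^{-4 \lambda t}}$. This tends to $1$ rather than to $0$, while the quoted bound flattens out at $(2 \cdot C_1)^{d/4} = 2$ for $t \geqslant d \cdot C_2 / C_1 = 1$. So, in this example, neither quantity decays exponentially to zero.
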

In this note, I will show that the Nash inequality actually implies a sharper decay estimate (which is less analytically convenient). From it one may deduce both this small-$t$ result and an improved large-$t$ result, and it is in the latter that the exponential behaviour reappears.

## Manipulating the Nash inequality

Going forward, we assume $f$ to be nonnegative; the general case can be handled with standard techniques (I may try to provide details on this later). Let us first rewrite the Nash inequality by dividing both sides by $\left|f\right|_{1}^{2+d}$ and raising both sides to the power $2/d$:

\begin{align}
\left|f\right|_{2}^{2+d} & \leqslant \left|f\right|_{1}^{2} \cdot \left\{ C_{1} \cdot \left|f\right|_{2}^{2} + C_{2} \cdot \mathcal{E} \left(f\right) \right\}^{d/2} \\
\implies \quad \left( \frac{\left|f\right|_{2}^{2}}{\left|f\right|_{1}^{2}}\right)^{1+2/d} &\leqslant C_{1} \cdot \frac{\left|f\right|_{2}^{2}}{\left|f\right|_{1}^{2}} + C_{2} \cdot \frac{\mathcal{E}\left(f\right)}{\left|f\right|_{1}^{2}}.
\end{align}

Introducing now the quantity

\begin{align}
H\left(t\right) &:= \frac{\left|P_{t}f\right|_{2}^{2}}{\left|P_{t}f\right|_{1}^{2}} \\
&=\frac{\left|P_{t}f\right|_{2}^{2}}{\left|f\right|_{1}^{2}},
\end{align}

where the second equality holds because $P_t$ preserves nonnegativity and $\pi$ is invariant, so that $\left|P_{t}f\right|_{1} = \pi \left( P_{t} f \right) = \pi \left( f \right) = \left|f\right|_{1}$. Using $\frac{\mathrm{d}}{\mathrm{d}t} \left|P_{t}f\right|_{2}^{2} = -2 \cdot \mathcal{E} \left( P_{t} f \right)$, one can differentiate to see that

\begin{align}
-\dot{H} \left( t \right) &= 2\cdot\frac{\mathcal{E}\left(P_{t}f\right)}{\left|P_{t}f\right|_{1}^{2}}.
\end{align}

Applying our reformulated Nash inequality to $P_t f$, it then holds that

\begin{align}
-\dot{H} \left( t \right) &\geqslant \frac{2}{C_{2}} \cdot \left\{ \left(\frac{\left|P_{t}f\right|_{2}^{2}}{\left|P_{t}f\right|_{1}^{2}}\right)^{1+2/d} -C_{1} \cdot \frac{\left|P_{t}f\right|_{2}^{2}}{\left|P_{t}f\right|_{1}^{2}}\right\} \\
&= \frac{2}{C_{2}} \cdot \left\{ H\left(t\right)^{1+2/d} - C_{1}\cdot H\left(t\right)\right\}.
\end{align}

To integrate this differential inequality without having to track the sign of $C_{1} \cdot H - H^{1+2/d}$, it is convenient to substitute $Y \left( t \right) := H \left( t \right)^{-2/d}$. Since $\dot{Y} = -\frac{2}{d} \cdot H^{-1-2/d} \cdot \dot{H}$, multiplying through by the positive quantity $\frac{2}{d} \cdot H^{-1-2/d}$ gives the linear differential inequality

\begin{align}
\dot{Y}\left(t\right) &\geqslant \frac{4}{d \cdot C_{2}} \cdot \left\{ 1 - C_{1} \cdot Y\left(t\right) \right\} \\
\leadsto \quad \frac{\mathrm{d}}{\mathrm{d}t} \left\{ \exp\left(\frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t \right) \cdot Y\left(t\right) \right\} &\geqslant \frac{4}{d \cdot C_{2}} \cdot \exp\left(\frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t \right).
\end{align}

Integrating this from $0$ to $t$ yields that

\begin{align}
Y\left(t\right) &\geqslant \exp\left(-\frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t \right) \cdot Y\left(0\right) + C_{1}^{-1} \cdot \left\{ 1 - \exp\left(-\frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t \right) \right\}.
\end{align}

Recalling that $H = Y^{-d/2}$, this can be arranged to

\begin{align}
\frac{\left|P_{t}f\right|_{2}^{2}}{\left|f\right|_{1}^{2}} & \leqslant \frac{H\left(0\right)}{\left(\exp\left(-\frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t \right) + \left(1 - \exp\left(-\frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t \right)\right) \cdot C_{1}^{-1} \cdot H\left(0\right)^{2/d}\right)^{d/2}}.
\end{align}

Beyond the Nash inequality itself, our developments so far have been essentially lossless, and the resulting bound is informative for both small and large $t$. We now make some observations which lose some information, but produce bounds which are more convenient in practice.

### Small $t$

Make the basic observation that $\exp \left(-\frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t \right) \geqslant 0$ to see that

\begin{align}
\frac{\left|P_{t}f\right|_{2}^{2}}{\left|f\right|_{1}^{2}} &\leqslant \frac{H\left(0\right)}{\left(\left(1 - \exp\left(-\frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t\right)\right) \cdot C_{1}^{-1} \cdot H \left(0\right)^{2/d} \right)^{d/2}} \\
&= \left(\frac{C_{1}}{1 - \exp\left(-\frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t\right)}\right)^{d/2},
\end{align}

in which $H \left( 0 \right)$, and hence $\left|f\right|_{2}$, has dropped out entirely. Recall now that $1 - \exp \left( -x \right) \geqslant \frac{x}{1+x}$ for $x \geqslant 0$ (equivalently, $\exp \left( x \right) \geqslant 1 + x$) to further simplify this to

\begin{align}
\frac{\left|P_{t}f\right|_{2}^{2}}{\left|f\right|_{1}^{2}} &\leqslant \left(C_{1} \cdot \frac{1 + \frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t}{\frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t}\right)^{d/2} \\
&= \left(C_{1} + \frac{d \cdot C_{2}}{4 \cdot t}\right)^{d/2} \\
&\leqslant \left(\max\left\{ 2 \cdot C_{1}, \frac{d \cdot C_{2}}{2 \cdot t}\right\}\right)^{d/2}.
\end{align}

This has the form of the inequality which the authors claimed, and in fact implies it, since $\frac{d \cdot C_{2}}{2 \cdot t} \leqslant \frac{2 \cdot d \cdot C_{2}}{t}$.

### Large $t$

Here, we instead note that the denominator of the refined bound is a convex combination of $1$ and $C_{1}^{-1} \cdot H \left(0\right)^{2/d}$. The weight on the latter, $1 - \exp\left(-\frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t\right)$, tends to $1$ exponentially fast, and this is where the exponential behaviour resides. Concretely, the first display of the small-$t$ computation (which holds for all $t > 0$) gives

\begin{align}
\frac{\left|P_{t}f\right|_{2}^{2}}{\left|f\right|_{1}^{2}} &\leqslant C_{1}^{d/2} \cdot \left(1 - \exp\left(-\frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t\right)\right)^{-d/2} \\
&\leqslant C_{1}^{d/2} \cdot \exp\left(d \cdot \exp\left(-\frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t\right)\right) \quad \text{for } t \geqslant \frac{d \cdot C_{2} \cdot \log 2}{4 \cdot C_{1}},
\end{align}

where the second step uses $-\log \left( 1 - u \right) \leqslant 2 \cdot u$ for $u \in \left[ 0, 1/2 \right]$. Thus the bound settles down to $C_{1}^{d/2}$ at the exponential rate $\frac{4 \cdot C_{1}}{d \cdot C_{2}}$. In particular, it is at most $\left( 2 \cdot C_{1} \right)^{d/2}$ for all $t \geqslant \frac{d \cdot C_{2} \cdot \log 2}{4 \cdot C_{1}}$, and improves on the large-$t$ value of the quoted bound thereafter.

One should not expect anything better. Since $\pi$ is a probability measure, the Cauchy-Schwarz inequality gives $\left|P_{t}f\right|_{2} \geqslant \left|P_{t}f\right|_{1} = \left|f\right|_{1}$, so the ratio on the left-hand side is always at least $1$. This is consistent with the limit $C_{1}^{d/2}$, since testing the Nash inequality on constant functions shows that $C_{1} \geqslant 1$. Genuine exponential decay can only be expected for the centred quantity $\left|P_{t}f - \pi \left( f \right)\right|_{2}$, which is the domain of the Poincaré inequality. The Nash inequality does not provide one on its own: for the trivial process on a two-point space with uniform $\pi$ which never moves, one has $\mathcal{E} \equiv 0$, and yet the Nash inequality holds with $C_{1}^{d/2} = 2$ and any $C_{2} > 0$.

## Remarks on Operator Norm Bounds

Note that both of these simplified bounds can be interpreted as statements about the operator norm of $P_t$ as a mapping from $L^1$ to $L^2$. For signed $f$, apply them to $f_+$ and $f_-$ separately and use the triangle inequality together with $\left|f\right|_{1} = \left|f_{+}\right|_{1} + \left|f_{-}\right|_{1}$. In the setting where $P_t$ is symmetric, i.e. the Markov process is invariant under time reversal, one can then use a duality argument to bound the operator norm of $P_t$ as a mapping from $L^2$ to $L^\infty$, with the same constant. That this operator is even bounded is quite a strong result about the regularising properties of the semigroup, known as 'ultraboundedness' (or sometimes 'ultracontractivity', though note that when this term is used, one does not always have a strict contraction). Since the large-$t$ bound is also of $L^1 \to L^2$ type, duality transfers it too: for symmetric $P_t$, the $L^2 \to L^\infty$ operator norm remains bounded by roughly $C_{1}^{d/4}$ as $t \to \infty$, rather than decaying.

## Conclusion

In this note, I have examined the implications of Nash inequalities for proving decay bounds for Markov semigroups.
I have exhibited that the original inequality can be used to prove a decay bound which is 'uniformly of the correct order' (or similar), but not convenient to use. I have then demonstrated that applying simple comparisons allows for this decay bound to be translated into two separate bounds which are more convenient and interpretable, though each is only informative over a restricted range of $t$.

Following the same reference [as before](http://www.numdam.org/article/JEDP_2010____A2_0.pdf), one can play a similar game with so-called 'weighted' Nash inequalities. As far as I can tell, for these inequalities, obtaining favourable large-$t$ behaviour is not really an option, and so obtaining a refined two-phase 'decay' bound might be of only limited interest.