###### tags: `chasing constants` `mcmc` `monte carlo` `expository` `theory`
# Chasing Constants: Understanding the Nash Inequality
**Overview**: In this note, I will briefly describe the class of functional inequalities known as Nash inequalities, and deduce their implications for convergence to equilibrium.
## Functional Inequalities for Markov Processes
In earlier posts, I have devoted substantial attention to the task of establishing that a given Markov chain admits a spectral gap. Equivalently, I have sought to show that for all suitable $f$, there holds a 'Poincaré' inequality of the form
\begin{align}
\mathcal{E} \left( f \right) \gtrsim \mathrm{Var} \left( f \right),
\end{align}
where the implied constant is uniform in $f$.
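As a concrete illustration, here is a minimal sketch which checks such an inequality numerically. It assumes a toy example chosen for convenience, namely the lazy random walk on an $n$-cycle (hold with probability $1/2$, step to either neighbour with probability $1/4$), for which the spectral gap is known in closed form. The function names and the choice of $n$ are illustrative only.

```python
import math
import random


def dirichlet_form(f):
    """E(f) = (1/2) * sum_{i,j} pi_i P_ij (f_i - f_j)^2 for the lazy cycle walk.

    pi is uniform (pi_i = 1/n) and P_ij = 1/4 for each of the two neighbours,
    which simplifies to (1 / (4n)) * sum_i (f_i - f_{i+1})^2.
    """
    n = len(f)
    return sum((f[i] - f[(i + 1) % n]) ** 2 for i in range(n)) / (4 * n)


def variance(f):
    """Variance of f under the uniform distribution."""
    n = len(f)
    mean = sum(f) / n
    return sum((x - mean) ** 2 for x in f) / n


def spectral_gap(n):
    """Eigenvalues of the lazy cycle walk are 1/2 + cos(2*pi*k/n)/2, k = 0..n-1."""
    return 0.5 * (1.0 - math.cos(2.0 * math.pi / n))


n = 12
gap = spectral_gap(n)
rng = random.Random(0)

# Smallest observed ratio E(f) / Var(f) over random test functions.
ratios = []
for _ in range(1000):
    f = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ratios.append(dirichlet_form(f) / variance(f))
worst_ratio = min(ratios)

# The first nontrivial eigenfunction attains equality.
extremal = [math.cos(2.0 * math.pi * i / n) for i in range(n)]
extremal_ratio = dirichlet_form(extremal) / variance(extremal)

print(gap, worst_ratio, extremal_ratio)
```

Here the implied constant in the Poincaré inequality is exactly the spectral gap, and no random test function should fall below it.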
This is an example of what is known as a _functional inequality_, as it provides an ordering between some functionals which act on a suitable class of functions $f$. Beyond Poincaré inequalities themselves, there are a variety of subtle refinements, which often involve introducing another term into the inequality, e.g.
\begin{align}
\mathcal{E} \left( f \right) + \Phi \left( f \right) \gtrsim \mathrm{Var} \left( f \right),
\end{align}
for some other functional $\Phi$ 'of quadratic type' (I won't define this precisely, but it will hopefully become clear from context and examples).
Roughly speaking, the pattern is that
1. If $\Phi$ dominates $\mathrm{Var}$ as a functional, then this inequality
a. is weaker than the Poincaré inequality,
b. implies slower-than-exponential convergence of the sequence $| P^n f |_2^2$, and
c. provides this guarantee only for a collection of functions which is a strict subset of $L^2 \left( \pi \right)$.
2. If $\Phi$ is dominated by $\mathrm{Var}$ as a functional, then this inequality
a. is stronger than the Poincaré inequality,
b. implies faster-than-exponential convergence of the sequence $| P^n f |_2^2$ (at least for a certain range of $n$), and
c. provides this guarantee for a collection of functions which is a strict superset of $L^2 \left( \pi \right)$.
The first case often arises by taking
\begin{align}
\Phi \left( f \right) &= \mathrm{Osc} \left( f \right)^2 \\
\text{where} \quad \mathrm{Osc} \left( f \right) &:= \sup f - \inf f,
\end{align}
and the second case can arise by taking
\begin{align}
\Phi \left( f \right) &= | f |_1^2.
\end{align}
These correspond to the so-called 'weak' and 'super-' Poincaré inequalities.
This post will essentially focus on a specific family of super-Poincaré inequalities, known for historical reasons as Nash inequalities.
## Nash Inequalities
As presented in [this paper](http://www.numdam.org/article/JEDP_2010____A2_0.pdf), a Markov process satisfies a Nash inequality with 'dimension' $d$ if its Dirichlet form $\mathcal{E}$ satisfies
\begin{align}
\left|f\right|_{2}^{2+d} & \leqslant \left|f\right|_{1}^{2} \cdot \left\{ C_{1} \cdot \left|f\right|_{2}^{2} + C_{2} \cdot \mathcal{E} \left(f\right) \right\}^{d/2}
\end{align}
for some constants $C_1$, $C_2$, and for all suitable $f$.
In the same paper, the authors show that this inequality implies the decay estimate
\begin{align}
| P_t f |_2 \leqslant \left( \max \left\{ 2\cdot C_1, \frac{2 \cdot d \cdot C_2}{t} \right\} \right)^{d/4} \cdot| f |_1.
\end{align}
The positive aspect of this inequality is that the right-hand side only depends on $f$ through its 1-norm, which may be finite even when its 2-norm is infinite. As such, it implies that the semigroup has a decent regularising effect. However, the behaviour of the estimate as a function of $t$ seems not to be so great, especially when we are used to exponential decay with $t$. With this in mind, one realises that the result is then particularly interesting for $f \in L^1 \setminus L^2$, and for small values of $t$.
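To see the behaviour in $t$ concretely, here is a minimal sketch which evaluates the factor on the right-hand side of the estimate above. The constants are arbitrary illustrative values and `nash_decay_bound` is a hypothetical helper name. The two arguments of the maximum coincide at $t = d \cdot C_2 / C_1$, after which the bound stops improving.

```python
def nash_decay_bound(t, d, c1, c2):
    """Factor max{2*C1, 2*d*C2/t}^(d/4) multiplying |f|_1 in the quoted estimate."""
    if t <= 0:
        raise ValueError("t must be positive")
    return max(2.0 * c1, 2.0 * d * c2 / t) ** (d / 4.0)


d, c1, c2 = 3.0, 1.0, 1.0      # illustrative values only
crossover = d * c2 / c1        # time at which 2*C1 == 2*d*C2/t

# Before the crossover, the bound decays polynomially in t.
small = [nash_decay_bound(t, d, c1, c2) for t in (0.01, 0.1, 1.0)]

# From the crossover onwards, the bound is constant in t.
large = [
    nash_decay_bound(t, d, c1, c2)
    for t in (crossover, 10 * crossover, 100 * crossover)
]

print(small)
print(large)
```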
Anyways, upon reading this result for the first time, I was a bit confused as to where the exponential decay had gone. In this note, I will show that the Nash inequality actually implies a stronger decay estimate (which is less analytically convenient), from which one may deduce both this small-$t$ result and a favourable large-$t$ result.
## Manipulating the Nash inequality
Going forward, we assume $f$ to be nonnegative; the general case can be handled with standard techniques (I may try to provide details on this later).
Let us first rewrite the Nash inequality as
\begin{align}
\left|f\right|_{2}^{2+d} & \leqslant \left|f\right|_{1}^{2} \cdot \left\{ C_{1} \cdot \left|f\right|_{2}^{2} + C_{2} \cdot \mathcal{E} \left(f\right) \right\}^{d/2} \\
\implies \quad \left( \frac{\left|f\right|_{2}^{2}}{\left|f\right|_{1}^{2}}\right)^{1+2/d} &\leqslant C_{1} \cdot \frac{\left|f\right|_{2}^{2}}{\left|f\right|_{1}^{2}} + C_{2} \cdot \frac{\mathcal{E}\left(f\right)}{\left|f\right|_{1}^{2}}.
\end{align}
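For completeness, the intermediate step (assuming $\left|f\right|_{1} > 0$) consists of raising both sides to the power $2/d$ and then dividing through by $\left|f\right|_{1}^{2+4/d}$:
\begin{align}
\left|f\right|_{2}^{2+4/d} &\leqslant \left|f\right|_{1}^{4/d} \cdot \left\{ C_{1} \cdot \left|f\right|_{2}^{2} + C_{2} \cdot \mathcal{E} \left(f\right) \right\} \\
\implies \quad \frac{\left|f\right|_{2}^{2+4/d}}{\left|f\right|_{1}^{2+4/d}} &\leqslant C_{1} \cdot \frac{\left|f\right|_{2}^{2}}{\left|f\right|_{1}^{2}} + C_{2} \cdot \frac{\mathcal{E}\left(f\right)}{\left|f\right|_{1}^{2}},
\end{align}
where the left-hand side of the last line is exactly $\left( \left|f\right|_{2}^{2} / \left|f\right|_{1}^{2} \right)^{1+2/d}$.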
Introducing now the quantity
\begin{align}
H\left(t\right) &:= \frac{\left|P_{t}f\right|_{2}^{2}}{\left|P_{t}f\right|_{1}^{2}} \\
&=\frac{\left|P_{t}f\right|_{2}^{2}}{\left|f\right|_{1}^{2}},
\end{align}
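where the second equality holds because $P_t$ preserves both nonnegativity and integrals against $\pi$, so that $\left|P_{t}f\right|_{1} = \left|f\right|_{1}$ for nonnegative $f$. Using the identity $\frac{\mathrm{d}}{\mathrm{d}t} \left|P_{t}f\right|_{2}^{2} = -2 \cdot \mathcal{E}\left(P_{t}f\right)$, one can differentiate to see that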
\begin{align}
-\dot{H} \left( t \right) &= 2\cdot\frac{\mathcal{E}\left(P_{t}f\right)}{\left|P_{t}f\right|_{1}^{2}}.
\end{align}
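As a quick sanity check of this identity, here is a minimal sketch on a toy example which is not taken from the paper: the 'independent resampling' semigroup $P_t f = \pi \left( f \right) + e^{-t} \cdot \left( f - \pi \left( f \right) \right)$ on a three-point space, whose Dirichlet form is $\mathcal{E} \left( f \right) = \mathrm{Var} \left( f \right)$. The distribution `pi` and the test function `f` are arbitrary choices.

```python
import math

pi = [0.5, 0.3, 0.2]   # hypothetical invariant distribution
f = [4.0, 0.5, 0.0]    # arbitrary nonnegative test function


def mean(g):
    return sum(p * x for p, x in zip(pi, g))


def norm1(g):
    return sum(p * abs(x) for p, x in zip(pi, g))


def norm2_sq(g):
    return sum(p * x * x for p, x in zip(pi, g))


def semigroup(t, g):
    """P_t g = pi(g) + exp(-t) * (g - pi(g))."""
    m = mean(g)
    return [m + math.exp(-t) * (x - m) for x in g]


def dirichlet(g):
    """For the generator L g = pi(g) - g, the Dirichlet form equals Var(g)."""
    return norm2_sq(g) - mean(g) ** 2


def H(t):
    g = semigroup(t, f)
    return norm2_sq(g) / norm1(g) ** 2


t, eps = 0.7, 1e-5
lhs = -(H(t + eps) - H(t - eps)) / (2 * eps)   # central difference for -dH/dt
g_t = semigroup(t, f)
rhs = 2 * dirichlet(g_t) / norm1(g_t) ** 2     # 2 * E(P_t f) / |P_t f|_1^2
mass_drift = abs(norm1(g_t) - norm1(f))        # |P_t f|_1 should equal |f|_1

print(lhs, rhs, mass_drift)
```

In this example, $H \left( t \right) = 1 + e^{-2 t} \cdot \mathrm{Var} \left( f \right) / \pi \left( f \right)^2$, so both sides can also be computed by hand.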
Applying our reformulated Nash inequality, it then holds that
\begin{align}
-\dot{H} \left( t \right) &\geqslant \frac{2}{C_{2}} \cdot \left\{ \left(\frac{\left|P_{t}f\right|_{2}^{2}}{\left|P_{t}f\right|_{1}^{2}}\right)^{1+2/d} -C_{1} \cdot \frac{\left|P_{t}f\right|_{2}^{2}}{\left|P_{t}f\right|_{1}^{2}}\right\} \\
&= \frac{2}{C_{2}} \cdot \left\{ H\left(t\right)^{1+2/d} - C_{1}\cdot H\left(t\right)\right\} \\
\leadsto \quad \frac{\mathrm{d}H}{C_{1}\cdot H-H^{1+2/d}}&\geqslant\frac{2}{C_{2}}\cdot\mathrm{d}t.
\end{align}
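This differential inequality can also be explored numerically. The sketch below integrates the extremal case, i.e. the ODE $\dot{h} = -\frac{2}{C_2} \cdot \left\{ h^{1+2/d} - C_1 \cdot h \right\}$, with a classical Runge-Kutta scheme, and compares the trajectory with the square of the estimate quoted from the paper. By the usual comparison principle, $H \left( t \right) \leqslant h \left( t \right)$ whenever $H \left( 0 \right) = h \left( 0 \right)$. The constants, initial value and step count are arbitrary illustrative choices, and the function names are hypothetical. Note that $h = C_1^{d/2}$ is a stationary point of this ODE, so a trajectory started above that level remains above it.

```python
def comparison_rhs(h, d, c1, c2):
    """Right-hand side of dh/dt = -(2/C2) * (h^(1+2/d) - C1*h)."""
    return -(2.0 / c2) * (h ** (1.0 + 2.0 / d) - c1 * h)


def integrate_rk4(h0, d, c1, c2, t_end, n_steps):
    """Classical fourth-order Runge-Kutta; returns lists of times and values."""
    dt = t_end / n_steps
    ts, hs = [0.0], [h0]
    h = h0
    for k in range(n_steps):
        k1 = comparison_rhs(h, d, c1, c2)
        k2 = comparison_rhs(h + 0.5 * dt * k1, d, c1, c2)
        k3 = comparison_rhs(h + 0.5 * dt * k2, d, c1, c2)
        k4 = comparison_rhs(h + dt * k3, d, c1, c2)
        h += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        ts.append((k + 1) * dt)
        hs.append(h)
    return ts, hs


def quoted_bound_squared(t, d, c1, c2):
    """Square of the quoted factor: max{2*C1, 2*d*C2/t}^(d/2)."""
    return max(2.0 * c1, 2.0 * d * c2 / t) ** (d / 2.0)


d, c1, c2 = 3.0, 1.0, 1.0   # illustrative values only
h0 = 50.0                   # initial value of |f|_2^2 / |f|_1^2
ts, hs = integrate_rk4(h0, d, c1, c2, t_end=5.0, n_steps=50_000)

# Print a few points of the trajectory next to the quoted bound.
for idx in (100, 1_000, 10_000, 50_000):
    print(ts[idx], hs[idx], quoted_bound_squared(ts[idx], d, c1, c2))
```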
Integrating this differential inequality from $0$ to $t$ (on the region $H^{2/d} > C_{1}$, where the denominator on the left-hand side is negative) yields that
\begin{align}
\frac{d}{2 \cdot C_{1}} \cdot \left\{ \log\left(1 - \frac{C_{1}}{H\left(0\right)^{2/d}}\right) - \log\left(1 - \frac{C_{1}}{H\left(t\right)^{2/d}}\right)\right\} &\geqslant\frac{2}{C_{2}} \cdot t,
\end{align}
which can be arranged to
\begin{align}
H\left(t\right) & \leqslant \frac{H\left(0\right)}{\left(C_{1}^{-1} \cdot H \left(0\right)^{2/d} + \exp\left(-\frac{4\cdot C_{1}}{d\cdot C_{2}} \cdot t \right) \cdot \left(1 - C_{1}^{-1}\cdot H \left(0\right)^{2/d}\right)\right)^{d/2}},
\end{align}
i.e.
\begin{align}
\frac{\left|P_{t}f\right|_{2}^{2}}{\left|f\right|_{1}^{2}} &\leqslant \frac{H\left(0\right)}{\left(C_{1}^{-1}\cdot H\left(0\right)^{2/d}+\exp\left(-\frac{4\cdot C_{1}}{d\cdot C_{2}}\cdot t\right)\cdot\left(1-C_{1}^{-1}\cdot H\left(0\right)^{2/d}\right)\right)^{d/2}} \\
&= \frac{H\left(0\right)}{\left(\left(1 - \exp\left(-\frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t\right)\right) \cdot C_{1}^{-1} \cdot H \left(0\right)^{2/d} + \exp \left(-\frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t \right)\right)^{d/2}}.
\end{align}
If $H^{2/d}$ ever drops to $C_{1}$ or below, then the same bound holds trivially, since $H$ is nonincreasing and the right-hand side is never smaller than $\min \left\{ H \left( 0 \right), C_{1}^{d/2} \right\}$.
So far, our developments have been essentially lossless, i.e. we have accurate decay bounds for both small and large $t$. We now make some observations which seem to lose information, but will be more useful in practice, due to their convenience.
### Small $t$
Make the basic observation that $\exp \left(-\frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t \right) \geqslant 0$ to see that
\begin{align}
\frac{\left|P_{t}f\right|_{2}^{2}}{\left|f\right|_{1}^{2}} &\leqslant \frac{H\left(0\right)}{\left(\left(1 - \exp\left(-\frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t\right)\right) \cdot C_{1}^{-1} \cdot H \left(0\right)^{2/d} + \exp \left(-\frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t \right)\right)^{d/2}} \\
&\leqslant\frac{H\left(0\right)}{\left(\left(1 - \exp\left(-\frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t\right)\right) \cdot C_{1}^{-1} \cdot H \left(0\right)^{2/d} \right)^{d/2}} \\
&= \frac{1}{\left(\left(1 - \exp\left(-\frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t\right)\right) \cdot C_{1}^{-1} \right)^{d/2}}.
\end{align}
Recall now that $1 - \exp\left(-x\right) \geqslant \frac{x}{1 + x}$ for $x \geqslant 0$ to further simplify this to
\begin{align}
\frac{\left|P_{t}f\right|_{2}^{2}}{\left|f\right|_{1}^{2}} &\leqslant \left( C_{1} \cdot \left( 1 + \frac{d \cdot C_{2}}{4 \cdot C_{1} \cdot t} \right) \right)^{d/2} \\
&= \left(C_{1} + \frac{d\cdot C_{2}}{4\cdot t}\right)^{d/2},
\end{align}
which behaves like $\left(\frac{d\cdot C_{2}}{4\cdot t}\right)^{d/2}$ for small $t$, and implies the inequality which the authors claimed, since $C_{1} + \frac{d \cdot C_{2}}{4 \cdot t} \leqslant \max \left\{ 2 \cdot C_{1}, \frac{2 \cdot d \cdot C_{2}}{t} \right\}$.
### Large $t$
Here, we instead keep the exponential in the intermediate bound above, which gives
\begin{align}
\frac{\left|P_{t}f\right|_{2}^{2}}{\left|f\right|_{1}^{2}} &\leqslant \left( \frac{C_{1}}{1 - \exp\left(-\frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t\right)} \right)^{d/2} \\
\implies \quad \left|P_{t}f\right|_{2}^{2} &\leqslant C_{1}^{d/2} \cdot \left( 1 - \exp\left(-\frac{4 \cdot C_{1}}{d \cdot C_{2}} \cdot t\right) \right)^{-d/2} \cdot \left|f\right|_{1}^{2}.
\end{align}
As $t \to \infty$, the right-hand side decreases to $C_{1}^{d/2} \cdot \left|f\right|_{1}^{2}$, and the gap to this limit closes exponentially fast, at rate $\frac{4 \cdot C_{1}}{d \cdot C_{2}}$. So the exponential is still present, but it governs how quickly the bound settles at its plateau, rather than decay to zero. The plateau itself cannot be removed: when $\pi$ is a probability measure, constant functions are fixed by $P_t$, and testing the Nash inequality on $f \equiv 1$ shows that $C_{1} \geqslant 1$. Note also that discarding the other term in the denominator would only give $H \left( t \right) \leqslant H \left( 0 \right) \cdot \exp \left( 2 \cdot \frac{C_1}{C_2} \cdot t \right)$, which is weaker than the trivial bound $H \left( t \right) \leqslant H \left( 0 \right)$.
## Remarks on Operator Norm Bounds
Note that both of these decay bounds can be interpreted as statements about the operator norm of $P_t$ as a mapping from $L^1$ to $L^2$. In the setting where $P_t$ is symmetric, i.e. the Markov process is invariant under time reversal, one can use a duality argument to also bound the operator norm of $P_t$ as a mapping from $L^2$ to $L^\infty$.
That this operator even be bounded is quite a strong result about the regularising properties of the semigroup, known as 'ultraboundedness' (or sometimes 'ultracontractivity', though note that when this term is used, one does not always have a strict contraction).
Of course, the trivial contraction bound for $P_t$ from $L^2$ to itself is self-dual, and so duality doesn't provide a new result for free in the same way.
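The closed-form bound can also be checked numerically. Here is a minimal sketch, assuming that the comparison ODE $\dot{h} = \frac{2}{C_2} \cdot \left\{ C_1 \cdot h - h^{1+2/d} \right\}$ is solved via the substitution $u = h^{-2/d}$ (which makes it linear), giving $h \left( t \right) = h \left( 0 \right) \cdot \left( \left( 1 - e^{-\lambda t} \right) \cdot C_1^{-1} \cdot h \left( 0 \right)^{2/d} + e^{-\lambda t} \right)^{-d/2}$ with $\lambda = \frac{4 \cdot C_1}{d \cdot C_2}$. The code verifies this formula against the ODE by finite differences, and compares it with the simplified bound $\left( C_1 + \frac{d \cdot C_2}{4 \cdot t} \right)^{d/2}$ and with the square of the estimate quoted from the paper. All constants and function names are illustrative.

```python
import math


def closed_form(t, h0, d, c1, c2):
    """Solution of dh/dt = (2/C2) * (C1*h - h^(1+2/d)) started at h0."""
    lam = 4.0 * c1 / (d * c2)
    decay = math.exp(-lam * t)
    denom = (1.0 - decay) * h0 ** (2.0 / d) / c1 + decay
    return h0 / denom ** (d / 2.0)


def ode_rhs(h, d, c1, c2):
    return (2.0 / c2) * (c1 * h - h ** (1.0 + 2.0 / d))


def simplified_bound(t, d, c1, c2):
    """Uses 1 - exp(-x) >= x / (1 + x) to remove the exponential."""
    return (c1 + d * c2 / (4.0 * t)) ** (d / 2.0)


def quoted_bound_squared(t, d, c1, c2):
    return max(2.0 * c1, 2.0 * d * c2 / t) ** (d / 2.0)


d, c1, c2, h0 = 3.0, 1.0, 1.0, 50.0   # illustrative values only
eps = 1e-6
rows = []
for t in (0.01, 0.1, 1.0, 10.0):
    h = closed_form(t, h0, d, c1, c2)
    # Central finite difference approximation of dh/dt at time t.
    slope = (closed_form(t + eps, h0, d, c1, c2)
             - closed_form(t - eps, h0, d, c1, c2)) / (2.0 * eps)
    residual = abs(slope - ode_rhs(h, d, c1, c2))
    rows.append((t, h, residual,
                 simplified_bound(t, d, c1, c2),
                 quoted_bound_squared(t, d, c1, c2)))
    print(rows[-1])
```

Each printed row should show a small residual and the ordering closed form, then simplified bound, then quoted bound.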
## Conclusion
In this note, I have examined the implications of Nash inequalities for proving decay bounds for Markov semigroups. I have exhibited that the original inequality can be used to prove a decay bound which is 'uniformly of the correct order' (or similar), but not convenient to use. I have then demonstrated that applying simple comparisons allows for this decay bound to be translated into two separate bounds which are more convenient and interpretable, though each only interesting for a restricted class of arguments.
Following the same reference [as before](http://www.numdam.org/article/JEDP_2010____A2_0.pdf), one can play a similar game with so-called 'weighted' Nash inequalities. As far as I can tell, for these inequalities, obtaining favourable large-$t$ behaviour is not really an option, and so obtaining a refined two-phase 'decay' bound might be of only limited interest.