Hoeffding's Inequality by Convex Ordering

###### tags: `one-offs` `concentration` `martingales` `convexity` # Hoeffding's Inequality by Convex Ordering **Overview**: In this note, I prove Hoeffding's inequality from the perspectives of martingales and convex ordering. ## The Basic Construction Let $-\infty<a<x<b<\infty$, and define a random variable $M$ with law $M\left(x;a,b\right)$ by \begin{align} M=\begin{cases} a & \text{w.p. }\frac{b-x}{b-a}\\ b & \text{w.p. }\frac{x-a}{b-a}. \end{cases} \end{align} One can then check that M has expectation $x$. ## The Hoeffding Martingale Now, consider a random variable $X$ which takes values in a subset of $\left[ a, b\right]$, and conditional on $X$, sample $Y \sim M \left( X; a, b \right)$. Following our previous observation, we see that $\mathbf{E} \left[Y\mid X\right] = X$, i.e. the ordered pair $\left( X, Y\right)$ is a martingale. Note also that marginally, $Y \overset{d}{=} M \left( \mathbf{E} \left[ X \right]; a, b\right)$. Because $\left(X, Y\right)$ form a martingale, by Jensen's inequality, it follows that they are also convex-ordered, i.e. for any convex $F$, it holds that $\mathbf{E} \left[ F\left(X\right) \right] \leqslant \mathbf{E} \left[ F\left(Y\right) \right]$. (Actually, a result of Strassen establishes that this characterisation is tight, i.e. being convex-ordered also implies the martingale property.) In particular, letting $F \left( x \right) = \exp \left( t \cdot x \right)$ for $t \in \mathbf{R}$, it holds that the moment-generating function of $X$ is dominated by the moment-generating function of $Y = M \left( \mathbf{E} \left[ X \right]; a, b \right)$. Because $Y$ is a two-point distribution, its MGF is thus relatively easy to control. Indeed, the MGF of $Y$ is essentially that of a standard biased coin flip (up to shifting and scaling). Using basic arguments from the theory of exponential families, one sees that \begin{align} \left(\log M_{Y}\right)^{''}\left(t\right)\leqslant\frac{\left(b-a\right)^{2}}{4} \end{align} uniformly in $t$, from which one can bound \begin{align} M_{Y}\left(t\right) \leqslant \exp \left(t \cdot \mathbf{E} \left[X\right] + \frac{t^{2}}{8} \cdot \left(b-a\right)^{2} \right). \end{align} One can then see that \begin{align} M_{X} \left(t\right) \leqslant M_{Y} \left(t\right) \leqslant \exp \left(t \cdot \mathbf{E} \left[X\right] + \frac{t^{2}}{8} \cdot \left(b-a\right)^{2}\right), \end{align} and this is essentially the content of Hoeffding's lemma, from which one often deduces Hoeffding's (concentration) inequality. For me, this is nice, because it highlights that the inequality is essentially a comparison which says that "bounded RVs fluctuate less than a (n appropriate) 2-point random RV". The specifics of the MGF and the tail inequality are the practically relevant consequences of this. A potentially interesting variant of this idea would be to consider a construction where $Y$ takes values on e.g. the set $\left[ a, s \right] \cup \left[ t, b \right]$ for some $a<s<t<b$. A similar ordering structure ought to be present, and perhaps there is some nice interpolation result along these lines. For what it's worth, there is also a construction in [this paper](https://arxiv.org/abs/1909.12067) which is not dissimilar to this idea (but is certainly more ambitious in its scope than Hoeffding's lemma!).