###### tags: `one-offs` `concentration` `martingales` `convexity`
# Hoeffding's Inequality by Convex Ordering
**Overview**: In this note, I prove Hoeffding's inequality from the perspectives of martingales and convex ordering.
## The Basic Construction
Let $-\infty<a<x<b<\infty$, and define a random variable $M$ with law $M\left(x;a,b\right)$ by
\begin{align}
M=\begin{cases}
a & \text{w.p. }\frac{b-x}{b-a}\\
b & \text{w.p. }\frac{x-a}{b-a}.
\end{cases}
\end{align}
One can then check that M has expectation $x$.
## The Hoeffding Martingale
Now, consider a random variable $X$ which takes values in a subset of $\left[ a, b\right]$, and conditional on $X$, sample $Y \sim M \left( X; a, b \right)$. Following our previous observation, we see that $\mathbf{E} \left[Y\mid X\right] = X$, i.e. the ordered pair $\left( X, Y\right)$ is a martingale. Note also that marginally, $Y \overset{d}{=} M \left( \mathbf{E} \left[ X \right]; a, b\right)$.
Because $\left(X, Y\right)$ form a martingale, by Jensen's inequality, it follows that they are also convex-ordered, i.e. for any convex $F$, it holds that $\mathbf{E} \left[ F\left(X\right) \right] \leqslant \mathbf{E} \left[ F\left(Y\right) \right]$. (Actually, a result of Strassen establishes that this characterisation is tight, i.e. being convex-ordered also implies the martingale property.)
In particular, letting $F \left( x \right) = \exp \left( t \cdot x \right)$ for $t \in \mathbf{R}$, it holds that the moment-generating function of $X$ is dominated by the moment-generating function of $Y = M \left( \mathbf{E} \left[ X \right]; a, b \right)$. Because $Y$ is a two-point distribution, its MGF is thus relatively easy to control. Indeed, the MGF of $Y$ is essentially that of a standard biased coin flip (up to shifting and scaling). Using basic arguments from the theory of exponential families, one sees that
\begin{align}
\left(\log M_{Y}\right)^{''}\left(t\right)\leqslant\frac{\left(b-a\right)^{2}}{4}
\end{align}
uniformly in $t$, from which one can bound
\begin{align}
M_{Y}\left(t\right) \leqslant \exp \left(t \cdot \mathbf{E} \left[X\right] + \frac{t^{2}}{8} \cdot \left(b-a\right)^{2} \right).
\end{align}
One can then see that
\begin{align}
M_{X} \left(t\right) \leqslant M_{Y} \left(t\right) \leqslant \exp \left(t \cdot \mathbf{E} \left[X\right] + \frac{t^{2}}{8} \cdot \left(b-a\right)^{2}\right),
\end{align}
and this is essentially the content of Hoeffding's lemma, from which one often deduces Hoeffding's (concentration) inequality.
For me, this is nice, because it highlights that the inequality is essentially a comparison which says that "bounded RVs fluctuate less than a (n appropriate) 2-point random RV". The specifics of the MGF and the tail inequality are the practically relevant consequences of this.
A potentially interesting variant of this idea would be to consider a construction where $Y$ takes values on e.g. the set $\left[ a, s \right] \cup \left[ t, b \right]$ for some $a<s<t<b$. A similar ordering structure ought to be present, and perhaps there is some nice interpolation result along these lines.
For what it's worth, there is also a construction in [this paper](https://arxiv.org/abs/1909.12067) which is not dissimilar to this idea (but is certainly more ambitious in its scope than Hoeffding's lemma!).