###### tags: `one-offs` `inequalities`
# Why not Orlicz Norms?
**Overview**: The following note gives a brief overview of Orlicz norms, why they are sometimes useful to work with, and why they might deserve 'a seat at the table' with $L^p$ norms.
## Introduction
I recently came across a question on MathOverflow in which the asker was curious about the significance of $L^p$ spaces outside of 'the usual suspects' of $p \in \{ 1, 2, \infty \}$. This is a question I always like asking, because people tend to come up with cool examples in which other values of $p$ are relevant and even insightful.
Anyways, in the comments section, a commenter made a point to the effect of 'it is not just these other values of $p$ which we neglect, but other norms and Banach spaces at large'. They specifically mentioned Orlicz spaces as an example of spaces which are in a similar vein to $L^p$ spaces, but maybe not as popular to work with.
From my point of view, the reason for this neglect is relatively transparent: theorists like to work with objects which have convenient properties, and for which it is straightforward to formulate positive assumptions. Still, I find these other spaces a bit fun to play with, and for many results of interest, the generalisation from $L^p$ to Orlicz spaces is not much extra work. With this in mind, I wanted to write a bit about Orlicz spaces.
## Definitions
Let $\Psi$ denote the space of non-negative, increasing, convex functions $\psi: [0, \infty) \to [0, \infty)$ which satisfy $\psi(0)=0$. The simplest examples to keep in mind are $\psi(x) = |x|^p$ for $p \geqslant 1$. For a bit more spice, take $\psi_a(x) = \exp(x^a) - 1$ for $a \geqslant 1$. There are many other possibilities as well, though I have not seen them so much in practice.
Fixing a probability measure $\mu$ and a function $\psi \in \Psi$, the $\psi$-Orlicz space over $\mu$ can be viewed as the linear span of functions $f$ such that $\int \mu(\mathrm{d} x) \cdot \psi( | f(x) | ) < \infty$. There are a few ways to define a norm on this space; the one which I typically find easiest to work with is
\begin{align}
\| f \|_{L^\psi ( \mu)} := \inf \left\{ t > 0 : \int \mu(\mathrm{d} x) \cdot \psi \left( \frac{| f(x) |}{t} \right) \leqslant 1 \right\}.
\end{align}
It is a nice exercise to convince oneself that this is indeed a norm, as this allows one to see where some of the assumptions on $\psi$ come from. Note that for $\psi(x) = |x|^p$, this recovers the usual definition of $L^p ( \mu)$, so we seem to have meaningfully generalised the notion.
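To make this concrete, here is a minimal numerical sketch (in Python; the helper name `orlicz_norm` and the bisection ranges are my own choices, not anything standard) which approximates the Orlicz norm under an empirical measure, and checks that $\psi(x) = |x|^p$ recovers the usual $L^p$ norm:
```python
import numpy as np

def orlicz_norm(sample, psi, lo=1e-6, hi=1e6, iters=80):
    """Approximate inf{t > 0 : mean(psi(|f| / t)) <= 1} under the
    empirical measure of `sample`. Since t -> mean(psi(|f| / t)) is
    non-increasing in t, geometric bisection on t does the job."""
    a = np.abs(np.asarray(sample, dtype=float))
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if np.mean(psi(a / mid)) > 1.0:
            lo = mid  # constraint violated: the norm exceeds mid
        else:
            hi = mid  # mid is feasible: the norm is at most mid
    return hi

rng = np.random.default_rng(0)
f = rng.normal(size=100_000)

# sanity check: psi(x) = |x|^p should recover the empirical L^p norm
p = 4.0
print(orlicz_norm(f, lambda u: u**p), np.mean(np.abs(f)**p)**(1 / p))

# a genuinely non-polynomial example: psi_1(x) = exp(x) - 1
print(orlicz_norm(f, np.expm1))
```
For $\psi(x) = |x|^p$, the two printed values should agree to several digits.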
## Applications
A basic application of Orlicz norms is to estimate tail probabilities of random variables. Roughly speaking, if $f$ has a finite $L^\psi$ norm, then $\psi \circ |f|$ is integrable, and so the tails of $f$ should decay like $1/\psi$. Indeed, write
\begin{align}
\mathbf{P} \left( |f(X)| \geqslant t \right) &= \int \mu(\mathrm{d} x) \cdot \mathbf{I} \left[ \frac{| f(x) |}{\| f \|_\psi} \geqslant \frac{t}{\| f \|_\psi} \right] \\
&\leqslant \int \mu(\mathrm{d} x) \cdot \frac{\psi \left( \frac{| f(x) |}{\| f \|_\psi} \right)}{\psi \left( \frac{t}{\| f \|_\psi} \right)} \\
&\leqslant \psi \left( \frac{t}{\| f \|_\psi} \right)^{-1}.
\end{align}
One can basically recognise this as Markov's inequality.
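As a quick sanity check (building on the `orlicz_norm` sketch above; the $\mathrm{Exp}(1)$ test case is an arbitrary choice of mine), one can compare empirical tail probabilities against this bound:
```python
# builds on the previous sketch: reuses np, rng, and orlicz_norm
f = rng.exponential(size=1_000_000)   # Exp(1), so exponential tails
K = orlicz_norm(f, np.expm1)          # psi_1-Orlicz norm; roughly 2 here

for t in [2.0, 4.0, 6.0, 8.0]:
    print(t, np.mean(f >= t), 1.0 / np.expm1(t / K))
    # empirical tail vs. the bound 1 / psi(t / ||f||_psi): the bound
    # holds by construction, but is visibly loose for large t
```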
There are also particularly clean ways in which to bound the Orlicz norm of a maximum of independent and identically-distributed random variables, where having a finer understanding of the tail properties can pay off nicely.
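For instance, here is a simpler cousin of such bounds, controlling the expectation of the maximum (sketched from memory, so worth double-checking): if $f_1, \ldots, f_N$ satisfy $\| f_i \|_\psi \leqslant K$ for each $i$, then since $\psi$ is convex and increasing,
\begin{align}
\psi \left( \frac{\mathbf{E} \left[ \max_i | f_i | \right]}{K} \right) \leqslant \mathbf{E} \left[ \psi \left( \frac{\max_i | f_i |}{K} \right) \right] = \mathbf{E} \left[ \max_i \psi \left( \frac{| f_i |}{K} \right) \right] \leqslant \sum_{i = 1}^N \mathbf{E} \left[ \psi \left( \frac{| f_i |}{K} \right) \right] \leqslant N,
\end{align}
i.e. $\mathbf{E} \left[ \max_i | f_i | \right] \leqslant K \cdot \psi^{-1} ( N )$; note that independence is not even needed for this step. For $\psi_1$ this grows like $K \cdot \log (1 + N)$, whereas the corresponding $L^p$ bound only gives $K \cdot N^{1/p}$.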
## Truncation Arguments
(This example is more or less developed in [this paper](https://perso.math.univ-toulouse.fr/cattiaux/files/2013/11/alea.pdf), which is **not** mine.)
I recently had cause to use Orlicz norms in the following context: I had access to a functional inequality which only held for bounded functions, but wanted to derive consequences for unbounded functions.
Roughly speaking, I was working with a linear operator $P$ on a Hilbert space of functions $L^2 ( \mu )$, which satisfies $\| P h \|_2 \leqslant \| h \|_2$ for all $h \in L^2 ( \mu )$, and $\| P h - \mu(h) \|_2 \leqslant c \cdot \| h \|_{\mathrm{osc}}$ for all bounded $h$ and some small constant $c$, where $\| h \|_{\mathrm{osc}} := \sup h - \inf h$ denotes the oscillation of $h$.
The hope was then that for $h \in L^2 ( \mu )$ which are unbounded but have thin tails (in an appropriate sense), a qualitatively similar bound ought to hold.
For now, let us work with positive $h$, and attempt a truncation argument, i.e. fix some $H > 0$, and write
\begin{align}
h &= h \cdot \mathbf{I} \left[ h \leqslant H \right] + h \cdot \mathbf{I} \left[ h > H \right] \\
&=: h_{\leqslant H} + h_{>H}.
\end{align}
Now, decompose
\begin{align}
\| P h - \mu(h) \|_2 &\leqslant \| Ph - Ph_{\leqslant H} \|_2 \\
&+ \| Ph_{\leqslant H} - \mu(h_{\leqslant H}) \|_2 \\
&+ \| \mu(h_{\leqslant H}) - \mu(h) \|_2.
\end{align}
We can always bound the second term as
\begin{align}
\| Ph_{\leqslant H} - \mu(h_{\leqslant H}) \|_2 \leqslant c \cdot \| h_{\leqslant H} \|_{\mathrm{osc}} \leqslant c \cdot H.
\end{align}
Rewrite the first term as $\| Ph - Ph_{\leqslant H} \|_2 = \| P h_{> H} \|_2 \leqslant \| h_{> H} \|_2$, and the third term as $\| \mu(h_{\leqslant H}) - \mu(h) \|_2 = | \mu ( h_{>H} ) | \leqslant \| h_{> H} \|_2$, the latter by Cauchy–Schwarz. To control these terms, we need to understand the tails of $h$.
To this end, assume that $h$ has a finite $\psi$-Orlicz norm for some $\psi \in \Psi$ which grows at least quadratically, i.e. such that $x \mapsto x^{-2} \psi (x)$ is non-decreasing. On the event $\{ h > H \}$, this monotonicity gives $h^2 \leqslant H^2 \cdot \psi \left( \frac{| h |}{\| h \|_\psi} \right) \cdot \psi \left( \frac{H}{\| h \|_\psi} \right)^{-1}$, and so it holds that
\begin{align}
\| h_{> H} \|_2^2 &= \int \mu ( \mathrm{d} x) \cdot h(x)^2 \cdot \mathbf{I} \left[ h > H \right] \\
&\leqslant H^2 \cdot \psi \left( \frac{H}{\| h \|_\psi} \right)^{-1} \cdot \int \mu ( \mathrm{d} x) \cdot \psi \left( \frac{ |h(x)| }{\| h \|_\psi} \right) \\
&\leqslant H^2 \cdot \psi \left( \frac{H}{\| h \|_\psi} \right)^{-1}.
\end{align}
Writing $H = \| h \|_\psi \cdot t$ and combining the three bounds above (the first and third terms each contribute at most $\| h \|_\psi \cdot t \cdot \psi(t)^{-1/2}$), it thus holds that
\begin{align}
\| P h - \mu(h) \|_2 \leqslant \inf_{t > 0} \left\{ c \cdot t + 2 \cdot t \cdot \psi(t)^{-1/2}\right\} \cdot \| h \|_\psi,
\end{align}
and so by setting the cutoff $H$ (equivalently, $t$) appropriately, one can estimate the operator norm of $P - \mu$ from $L^\psi$ to $L^2$.
For a generic estimate, take $t = \psi^{-1} \left( ( \delta \cdot c )^{-2} \right)$ to upper-bound the infimum by $(1 + 2 \cdot \delta) \cdot c \cdot \psi^{-1} \left( ( \delta \cdot c )^{-2} \right)$. In the $L^p$ case, taking $\delta = 1$ thus yields an estimate of $3 \cdot c^{1-2/p}$, which is not too bad. With $\psi (x) = \exp (x) - 1$, this improves to $3 \cdot c \cdot \log \left(1 + \frac{1}{c^2} \right)$.
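As a hypothetical numerical check of this step (the constants, the grid, and the inclusion of $\psi(t) = t^2$ are my own choices, the last to make contact with the next paragraph):
```python
# compare the true infimum of t -> c*t + 2*t*psi(t)**(-1/2) against
# the delta = 1 closed-form upper bound 3*c*psi_inv(1/c**2)
import numpy as np

c = 1e-3
ts = np.geomspace(1e-3, 1e2, 100_001)

for psi, psi_inv, label in [
    (lambda t: t**4, lambda y: y**0.25, "psi(t) = t^4"),
    (np.expm1,       np.log1p,          "psi(t) = exp(t) - 1"),
    (lambda t: t**2, lambda y: y**0.5,  "psi(t) = t^2"),
]:
    infimum = np.min(c * ts + 2 * ts * psi(ts) ** -0.5)
    closed = 3 * c * psi_inv(1 / c**2)
    print(label, infimum, closed)   # infimum <= closed form in each row
```
The last row makes the floor at $2$ visible: for $\psi(t) = t^2$, the objective is $c \cdot t + 2$, which cannot dip below $2$ however $t$ is chosen.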
Actually, the most annoying part for me is that when $\psi(x) = x^2$, I pick up this weird factor of $3$ out the front, when I know that I should be able to put a $1$ there. Even with smarter optimisation over $t$, I can only get it down to a $2$, so the issue comes earlier; I think that perhaps this loss is inevitable if one uses the triangle inequality right at the beginning.
## Some Other Stuff
There are a number of nice interpolation inequalities involving $L^p$ norms: e.g. since for $1 \leqslant p < q < r$ it holds on $\mathbf{R}$ that $|x|^q \lesssim |x|^p + |x|^r$, it is not so difficult to find bounds of the form $\| x \|_q \leqslant F_{q \mid p, r} \left( \| x \|_p, \| x \|_r \right)$. I expect that a similar story can be told for Orlicz norms, but I'm still finding my feet a bit.
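For reference, the model $L^p$ statement for a function $f$ goes through Hölder's inequality: defining $\theta \in (0, 1)$ by $\frac{1}{q} = \frac{\theta}{p} + \frac{1 - \theta}{r}$ and applying Hölder with the conjugate exponents $\frac{p}{\theta q}$ and $\frac{r}{(1 - \theta) q}$,
\begin{align}
\| f \|_q^q = \int \mu(\mathrm{d} x) \cdot | f(x) |^{\theta q} \cdot | f(x) |^{(1 - \theta) q} \leqslant \| f \|_p^{\theta q} \cdot \| f \|_r^{(1 - \theta) q},
\end{align}
i.e. $\| f \|_q \leqslant \| f \|_p^\theta \cdot \| f \|_r^{1 - \theta}$, so one admissible choice is $F_{q \mid p, r} (u, v) = u^\theta \cdot v^{1 - \theta}$.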
There are also a bunch of nice functional inequalities (e.g. generalised Sobolev inequalities) which can be Orlicz-ified; these can be used to gain a finer understanding of how different Markov processes converge to equilibrium. For readers who are familiar with the notion of hypercontractivity, there is also an Orlicz version of that.
## Conclusion
For me, I am still not often interested in particular non-standard $L^p$ norms, or in other more exotic Orlicz norms. However, when developing mathematical techniques or inequalities which work in *one* of these settings, I do feel that it is a useful exercise to see whether they might actually work in *all* of these spaces. I get the sense that it provides a fuller picture of what is going on in a problem, and of how finely the conclusions depend on e.g. tail behaviours. I'm going to try to retain this as a habit, at least for a bit.