###### tags: `one-offs` `convexity`
# Strong Convexity and Square Roots
**Overview**: In this note, I describe a small problem concerning the convexity of square roots of strongly-convex functions, a negative resolution to the main problem, and a partial positive resolution to a related problem
## The Problem
Some months ago, I was curious about the following question: suppose that $F$ is a positive, strongly-convex function. Does it necessarily follow that $F^{1/2}$ is a convex function? My sense was that the answer was probably no, but that some modification of the claim could be true. There is a converse of sorts which intuitively true: let $f\geqslant0$ be convex and coercive; then $F=f^{2}$ is convex, and has at least quadratic growth in the tails (which is not the same as strong convexity, but quite close in the convex setting).
As it turns out, the answer is indeed no; an explicit counterexample was presented to me in this [post](https://twitter.com/zschutzy/status/1572954660696371203). Here, I give details for a slightly more general counterexample, which clarifies that convexity of $F^{1/2}$ can fail to hold on a very large set.
## The Counter-Example
Consider $F_{\theta}\left(x\right)=\frac{1}{2}\cdot x^{2}+\theta\cdot\cos x$. For $\theta\in\left(0,1\right)$, it holds that $F_{\theta}\left(x\right)\geqslant F_{\theta}\left(0\right)=\theta>0$, and $D^{2}F_{\theta}\geqslant1-\theta>0$, so $F_{\theta}$ is strongly convex and strictly positive. We will show that the second derivative of $F_{\theta}^{1/2}$ can be strictly negative, and hence that $F_{\theta}^{1/2}$ cannot be everywhere convex.
Indeed, compute that
\begin{align}
\mathrm{D}\left(F_{\theta}^{1/2}\right) &=\frac{1}{2}\cdot F_{\theta}^{-1/2}\cdot\mathrm{D}F_{\theta} \\
&=\frac{1}{2}\cdot F_{\theta}^{-1/2}\cdot\left(x-\theta\cdot\sin x\right),
\end{align}
and
\begin{align}
\mathrm{D}^{2}\left(F_{\theta}^{1/2}\right) &=\mathrm{D}\left\{ \frac{1}{2}\cdot F_{\theta}^{-1/2}\cdot\left(x-\theta\cdot\sin x\right)\right\} \\
&=-\frac{1}{4}\cdot F_{\theta}^{-3/2}\cdot\left(x-\theta\cdot\sin x\right)^{2}+\frac{1}{2}\cdot F_{\theta}^{-1/2}\cdot\left(1-\theta\cdot\cos x\right) \\
&=-\frac{1}{4}\cdot F_{\theta}^{-3/2}\cdot\left\{ \left(x-\theta\cdot\sin x\right)^{2}-2\cdot F_{\theta}\cdot\left(1-\theta\cdot\cos x\right)\right\} .
\end{align}
This will be negative precisely when
\begin{align}
\left(x-\theta\cdot\sin x\right)^{2} &>2\cdot F_{\theta}\cdot\left(1-\theta\cdot\cos x\right) \\
&=\left(x^{2}+2\cdot\theta\cdot\cos x\right)\cdot\left(1-\theta\cdot\cos x\right) \\
\iff\quad x^{2}-2\cdot\theta\cdot\sin x+\theta^{2}\cdot\sin^{2}x &>x^{2}+\left(2-x^{2}\right)\cdot\theta\cdot\cos x-2\cdot\theta^{2}\cdot\cos^{2}x \\
\iff\quad\theta\cdot\left(\sin^{2}x+2\cdot\cos^{2}x\right) &>2\cdot\sin x+\left(2-x^{2}\right)\cdot\cos x.
\end{align}
Consider now values of $x$ such that $x^{2}>2$ and $\sin x<0<\cos x$, which occurs 'about a quarter of the time' for large $x$. For such points, it holds that
\begin{align}
\theta\cdot\left(\sin^{2}x+2\cdot\cos^{2}x\right)=\theta\cdot\left(1+\cos^{2}x\right)\geqslant\theta>0
\end{align}
and that
\begin{align}
2\cdot\sin x+\left(2-x^{2}\right)\cdot\cos x <2\cdot0+0\cdot0=0.
\end{align}
One thus deduces that $D^{2}F_{\theta}^{1/2}<0$ infinite often, and that $F_{\theta}$ is even locally strongly concave on some sub-interval of $2\pi\cdot\left[n-\frac{1}{4},n\right]$ for every sufficiently large integer $n$.
## A Partial Positive Result
One partial result in the positive direction is as follows: first, for a positive, strictly convex $F$, compute explicitly that
\begin{align}
\frac{\mathrm{D}F^{1/2}}{F^{1/2}} &=\frac{1}{2}\cdot\frac{\mathrm{D}F}{F} \\
\frac{\mathrm{D}^{2}F^{1/2}}{F^{1/2}} &=\frac{1}{4}\cdot\left\{ 2\cdot\frac{\mathrm{D}^{2}F}{F}-\left(\frac{\mathrm{D}F}{F}\right)^{2}\right\} ,
\end{align}
Consider now a fixed value $x$, and replacing $F$ by $F_{c}:=F+c$ for a large positive constant $c$. It then holds that
\begin{align}
\frac{\left(\mathrm{D}^{2}F_{c}^{1/2}\right)\left(x\right)}{F_{c}^{1/2}\left(x\right)} &=\frac{1}{4}\cdot\left\{ 2\cdot\frac{\left(\mathrm{D}^{2}F\right)\left(x\right)}{F\left(x\right)+c}-\left(\frac{\mathrm{D}F\left(x\right)}{F\left(x\right)+c}\right)^{2}\right\} \\
&=2\cdot c^{-1}\cdot\left(\mathrm{D}^{2}F\right)\left(x\right)+\mathcal{O}\left(c^{-2}\right)
\end{align}
as $c$ grows. This can be made positive by taking $c$ to be sufficiently large. As such, for any strongly convex $F$, and every $\hat{x}\in\mathrm{dom}F$, there exists $c=c\left(\hat{x}\right)>0$ such that $x\mapsto\left(F\left(x\right)+c\right)^{1/2}$ is convex in a neighbourhood of $\hat{x}$. It is not obvious that one can pick a $c$ which works simultaneously for all $\hat{x}$, and indeed, it seems somewhat likely that counterexamples could exist in general. This construction does not particularly appear to benefit from strong convexity over e.g. strict convexity.
## Conclusion
I am still not fully convinced that a better positive result might not be available. For strongly convex $F$, should there exist $c > 0$ such that $x\mapsto\left(F\left(x\right)+c\right)^{1/2}$ is globally convex? What about if $F$ also satisfies an appropriate smoothness or growth condition? What can be said about the convexity of $\alpha \circ F$ for other concave, increasing functions $F$? At the core: what are the obstructions to convexity under composition with 'benign' concave functions?
The question itself is not so important, but its resolution could nevertheless allow for some conceptually neat simplifications elsewhere.

Overview: In this note, I log some basic observations about diffusion-based generative models.

8/14/2023Overview: In this note, I describe some aspects of hierarchical structure in MCMC algorithms, and how they can be of theoretical and practical relevance.

8/9/2023Overview: In this note, I discuss a recurrent question which can be used to generate research questions about methods of all sorts. I then discuss a specific instance of how this question has proved fruitful in the theory of optimisation algorithms. Methods and Approximations A nice story is that when Brad Efron derived the bootstrap, it was done in service of the question “What is the jackknife an approximation to?”. I can't help but agree that there's something quite exciting about research questions which have this same character of ''What is (this existing thing) an approximation to?''. One bonus tilt on this which I appreciate is that there can be multiple levels of approximation, and hence many answers to the same question. One well-known example is gradient descent, which can be viewed as an approximation to the proximal point method, which can then itself be viewed as an approximation to a gradient flow. There are probably even more stops along the way here. In this case, there is even the perspective that from the perspective of mathematical theory, there may be at least as much to be gained by stopping off at the proximal point interpretation, as there is from the gradient flow perspective. My experience is that generalist applied mathematicians get to grips with the gradient flow quickly, but optimisation theorists can squeeze more out of the PPM formulation. There is thus some hint that using this 'intermediate' approximation can be particularly insightful in its own right. It would be interesting to collect more examples with this character.

5/22/2023Overview: In this note, I prove Hoeffding's inequality from the perspectives of martingales and convex ordering. The Basic Construction Let $-\infty<a<x<b<\infty$, and define a random variable $M$ with law $M\left(x;a,b\right)$ by \begin{align} M=\begin{cases} a & \text{w.p. }\frac{b-x}{b-a}\ b & \text{w.p. }\frac{x-a}{b-a}. \end{cases}

5/22/2023
Published on ** HackMD**