one-offs
optimisation
smoothness
Overview: In this note, I discuss a nice result about Taylor expansions of convex functions, and about when those expansions are themselves convex.
There's a cool result from a couple of years ago which I think is really neat, and whose proof is simple enough to be worth presenting widely. The topic is approximating convex functions with (regularised) Taylor polynomials, and the (near-)preservation of convexity under taking Taylor expansions.
To begin with: it is not so hard to see that Taylor approximations of convex functions can easily fail to be convex: consider any convex function with non-vanishing third derivative, and take its third-order Taylor polynomial, which is a cubic with nonzero leading coefficient, and hence nonconvex on all of $\mathbf{R}$. As such, if we are interested in preserving convexity, it is natural to consider constraints and regularisation.
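For a concrete instance, here is a small sketch in sympy (the choice of $f = \exp$ and the expansion point $0$ are mine, purely for illustration):

```python
import sympy as sp

x = sp.symbols("x")
f = sp.exp(x)  # convex, with everywhere-positive third derivative

# Third-order Taylor polynomial of exp around 0
taylor3 = sum(f.diff(x, k).subs(x, 0) / sp.factorial(k) * x**k for k in range(4))
print(sp.expand(taylor3))   # x**3/6 + x**2/2 + x + 1

# Its second derivative is x + 1, which is negative for x < -1,
# so the Taylor polynomial is not convex on all of R.
print(taylor3.diff(x, 2))   # x + 1
```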
Focus now on convex functions $f$ whose derivatives of order $p$ are Lipschitz continuous, i.e.

$$\| \nabla^p f(x) - \nabla^p f(y) \|_{\mathrm{op}} \leq L_p \| x - y \| \qquad \text{for all } x, y.$$
This is a sensible constraint to impose on $f$, in the sense that such functions will deviate from local approximations by controllable amounts.
Fortunately, this constraint also implies a number of additional bounds on the growth of $f$, by integration. For example, writing $\Phi_{x, p}$ for the $p$th-order Taylor approximation of $f$ around $x$, the difference between $\nabla^k f(y)$ and the corresponding derivative of the Taylor approximation from $x$ at $y$ can be bounded in operator norm as

$$\| \nabla^k f(y) - \nabla^k \Phi_{x, p}(y) \|_{\mathrm{op}} \leq \frac{L_p}{(p + 1 - k)!} \| y - x \|^{p + 1 - k}, \qquad 0 \leq k \leq p.$$

In particular, taking $k = 2$, we see that the Hessian of $f$ at $y$ can be bounded from above (in semidefinite order) by the Hessian of the $p$th-order Taylor approximation at $y$, plus $\frac{L_p}{(p - 1)!} \| y - x \|^{p - 1} \cdot I$.
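As a quick numerical sanity check of the $k = 2$ bound, here is a sketch in the 1-d setting (my illustrative choices: $f(x) = \log(1 + e^x)$, $p = 3$, and $L_3 = 1/8$, which is valid since $|f''''| \leq 1/8$ everywhere):

```python
import numpy as np
import sympy as sp

x, y = sp.symbols("x y")
f = sp.log(1 + sp.exp(x))  # convex; f''' is (1/8)-Lipschitz since |f''''| <= 1/8
p, L, x0 = 3, sp.Rational(1, 8), 0

# Third-order Taylor polynomial of f around x0, as a function of y
Phi = sum(f.diff(x, k).subs(x, x0) / sp.factorial(k) * (y - x0)**k
          for k in range(p + 1))

# Check |f''(y) - Phi''(y)| <= L / (p-1)! * |y - x0|^(p-1) on a grid
hess_gap = sp.lambdify(y, sp.Abs(f.diff(x, 2).subs(x, y) - Phi.diff(y, 2)))
bound = sp.lambdify(y, L / sp.factorial(p - 1) * sp.Abs(y - x0)**(p - 1))

grid = np.linspace(-4.0, 4.0, 201)
assert all(hess_gap(t) <= bound(t) + 1e-12 for t in grid)
```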
Separately, one can prove that the Hessian of the mapping $y \mapsto \frac{p L_p}{(p + 1)!} \| y - x \|^{p + 1}$ is bounded from below (again in semidefinite order) by this same matrix. As such, one can deduce that the Hessian of the regularised Taylor polynomial

$$\Omega_{x, p}(y) := \Phi_{x, p}(y) + \frac{p L_p}{(p + 1)!} \| y - x \|^{p + 1}$$
dominates the Hessian of $f$, where $\Phi_{x, p}$ is the Taylor polynomial from above. Recalling now that $f$ is convex, so that $\nabla^2 f \succeq 0$, it thus follows that this approximation is also convex! Moreover, it is also an upper bound to $f$. This upper-bound property actually holds already for the smaller regularisation constant $L_p$ in place of $p L_p$.
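Continuing the same 1-d example, the following sketch builds the regularised model and checks both properties on a grid (again, the expansion point $x = 1$ and all constants are my illustrative choices):

```python
import numpy as np
import sympy as sp

x, y = sp.symbols("x y")
f = sp.log(1 + sp.exp(x))          # convex, with L_3 = 1/8 as before
p, L, x0 = 3, sp.Rational(1, 8), 1

# Taylor polynomial Phi_{x0,p} and regularised model Omega_{x0,p}
Phi = sum(f.diff(x, k).subs(x, x0) / sp.factorial(k) * (y - x0)**k
          for k in range(p + 1))
Omega = Phi + p * L / sp.factorial(p + 1) * (y - x0)**(p + 1)  # p+1 even: no |.| needed in 1-d

Phi2 = sp.lambdify(y, Phi.diff(y, 2))
Omega2 = sp.lambdify(y, Omega.diff(y, 2))
fval = sp.lambdify(y, f.subs(x, y))
Oval = sp.lambdify(y, Omega)

grid = np.linspace(-6.0, 10.0, 401)
print(min(Phi2(t) for t in grid))           # negative: the bare cubic is nonconvex
assert all(Omega2(t) >= 0 for t in grid)    # the regularised model is convex
assert all(Oval(t) >= fval(t) - 1e-12 for t in grid)  # ...and majorises f
```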
As a consequence, we see that given a smooth, convex function $f$, it is feasible to form an approximation to it at a point $x$, with high-order accuracy in the neighbourhood of $x$, and global convexity. The practical implication is that one can construct improved optimisation methods based on these high-order majorants, with realistic hopes of actually minimising them at each step, by virtue of their convexity.
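As a rough sketch of what such a method might look like, the following loop repeatedly minimises the convex majorant from the current iterate (the objective, the starting point, and the use of scipy.optimize.minimize_scalar for the inner problem are all my own illustrative choices, not the algorithm from the paper):

```python
import sympy as sp
from scipy.optimize import minimize_scalar

x, y = sp.symbols("x y")
f = sp.log(1 + sp.exp(x)) + x**2 / 10  # convex; f''' is still (1/8)-Lipschitz
p, L = 3, sp.Rational(1, 8)

def model(x0):
    """Regularised Taylor model Omega_{x0,p} of f, as a callable in y."""
    Phi = sum(f.diff(x, k).subs(x, x0) / sp.factorial(k) * (y - x0)**k
              for k in range(p + 1))
    Omega = Phi + p * L / sp.factorial(p + 1) * (y - x0)**(p + 1)
    return sp.lambdify(y, Omega)

xk = 3.0  # arbitrary starting point
for _ in range(10):
    # Minimise the convex majorant; its minimiser decreases f, since
    # f(x_{k+1}) <= Omega(x_{k+1}) <= Omega(x_k) = f(x_k).
    xk = minimize_scalar(model(xk)).x
print(xk, float(f.subs(x, xk)))
```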
The main result here is given in Nesterov's paper on implementable tensor methods in unconstrained convex optimisation, and the algorithmic implications are explored in a number of related works by the same author and coauthors.