###### tags: `one-offs` `optimisation` `smoothness`

# Convex Taylor Polynomials

Overview: In this note, I discuss a nice result about Taylor expansions of convex functions, and how a suitable regularisation makes them convex as well.

## From Smoothness to Majorants

There's a cool result from a couple of years ago which I think is really neat, and whose proof is simple enough that it's worth presenting widely. The topic is approximating convex functions by (regularised) Taylor polynomials, and the (near-)preservation of convexity under taking Taylor expansions.

To begin with: it is not hard to see that Taylor approximations of convex functions can easily fail to be convex. Consider any convex function with non-vanishing third derivative, and take its third-order Taylor polynomial: this is a cubic polynomial with a nontrivial leading term, and hence never convex. As such, if we are interested in preserving convexity, it is natural to consider constraints and regularisation.
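For a concrete instance: the third-order Taylor polynomial of $\exp$ at $0$ is

$$ T_3(y) = 1 + y + \frac{y^2}{2} + \frac{y^3}{6}, \qquad T_3''(y) = 1 + y, $$

and $T_3''(y) < 0$ for $y < -1$, so $T_3$ is not convex, despite $\exp$ being so.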

Focus now on convex functions whose derivatives of order $p$ are Lipschitz continuous, i.e.

$$ \| (D^p f)(x) - (D^p f)(y) \|_{\mathrm{op}} \leq L \, \| x - y \|_2. $$

This is a sensible constraint to impose on $f$ (for $p = 1$, it recovers the familiar class of $L$-smooth functions), in the sense that such functions will deviate from their local approximations by controllable amounts.

Fortunately, this constraint also implies a number of additional bounds on the growth of $f$, obtained by integration. For example, the difference between $(D^{p-k} f)(y)$ and the corresponding derivative of its Taylor approximation from $x$, evaluated at $y$, can be bounded in operator norm by $\frac{L \, \| x - y \|_2^{k+1}}{(k+1)!}$. In particular, taking $k = p - 2$, we see that the Hessian of $f$ at $y$ can be bounded from above (in semidefinite order) by the Hessian of the $p$-th order Taylor approximation at $x$, plus $\frac{L \, \| x - y \|_2^{p-1}}{(p-1)!} \cdot \mathrm{Id}$.
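To sketch where these bounds come from (the induction is standard; the details here are my own filling-in): the $p$-th derivative of the Taylor polynomial $T = T(\cdot \, ; x, p, f)$ is constant, equal to $(D^p f)(x)$, so the case $k = 0$ is exactly the Lipschitz condition. Each subsequent bound follows by integrating the previous one along the segment from $x$ to $y$, using that all derivatives of $f - T$ up to order $p$ vanish at $x$:

$$ \left\| \left( D^{p-k-1}(f - T) \right)(y) \right\|_{\mathrm{op}} \leq \int_0^1 \frac{L \, t^{k+1} \, \| y - x \|_2^{k+1}}{(k+1)!} \, \| y - x \|_2 \, \mathrm{d}t = \frac{L \, \| x - y \|_2^{k+2}}{(k+2)!}. $$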

Separately, one can prove that the Hessian of the mapping

$$ y \mapsto \frac{pL \, \| x - y \|_2^{p+1}}{(p+1)!} $$

is bounded from below (again in semidefinite order) by this same matrix $\frac{L \, \| x - y \|_2^{p-1}}{(p-1)!} \cdot \mathrm{Id}$ (a short computation, spelled out below). As such, one can deduce that the Hessian of the regularised Taylor polynomial

$$ y \mapsto T(y; x, p, f) + \frac{pL \, \| x - y \|_2^{p+1}}{(p+1)!} $$

dominates the Hessian of $f$, where $T$ is the relevant Taylor polynomial. Recalling now that $f$ is convex, it thus follows that this approximation is also convex! Moreover, it is also an upper bound to $f$; this upper-bound property actually holds already for

$$ y \mapsto T(y; x, p, f) + \frac{L \, \| x - y \|_2^{p+1}}{(p+1)!}, $$

i.e. with regularisation weight $L$ in place of $pL$.
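The promised computation for the lower bound on the Hessian of the regulariser (my own filling-in, matching the constants above): for $r(y) = \| y - x \|_2^{p+1}$, one has

$$ (D^2 r)(y) = (p+1) \, \| y - x \|_2^{p-1} \, \mathrm{Id} + (p+1)(p-1) \, \| y - x \|_2^{p-3} \, (y - x)(y - x)^\top \succeq (p+1) \, \| y - x \|_2^{p-1} \, \mathrm{Id}, $$

since both terms are positive semidefinite. Scaling by $\frac{pL}{(p+1)!}$ gives the Hessian lower bound $\frac{pL \, \| y - x \|_2^{p-1}}{p!} \, \mathrm{Id} = \frac{L \, \| y - x \|_2^{p-1}}{(p-1)!} \, \mathrm{Id}$, which is exactly the matrix appearing in the upper bound on the Hessian of $f$.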

As a consequence, we see that given a smooth, convex function $f$, it is feasible to form an approximation to it at a point $x$ which is accurate to high order in a neighbourhood of $x$, and which is globally convex. The practical implication is that one can construct improved optimisation methods based on these high-order majorants, with realistic hopes of minimising the majorant at each step, by virtue of its convexity.
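To make this concrete, here is a minimal numerical sketch (my own illustration, not code from the paper) of the resulting scheme for $p = 2$: at each iterate $x_k$, build the convex model $T(y; x_k, 2, f) + \frac{2L \, \| y - x_k \|_2^3}{3!}$ and move to its minimiser. The test function, the value of $L$, and the use of a generic off-the-shelf solver for the inner minimisation are all assumptions made for illustration; serious implementations solve the model subproblem far more efficiently.

```python
# Sketch of majorant minimisation for p = 2 (cubic-regularised Newton-type steps).
# Assumptions: f below is an illustrative convex test function (pseudo-Huber),
# L = 1 is a valid Lipschitz constant for its Hessian, and the inner problem is
# handed to a generic solver (reasonable here, since the model is convex).
import numpy as np
from scipy.optimize import minimize

def f(x):
    return np.sum(np.sqrt(1.0 + x**2))  # smooth, convex, minimised at x = 0

def grad(x):
    return x / np.sqrt(1.0 + x**2)

def hess(x):
    return np.diag((1.0 + x**2) ** -1.5)

def majorant_step(x, L):
    """Minimise the convex model T(y; x, 2, f) + 2L ||y - x||^3 / 3! over y."""
    g, H = grad(x), hess(x)

    def model(y):
        d = y - x
        return f(x) + g @ d + 0.5 * d @ H @ d + (2.0 * L / 6.0) * np.linalg.norm(d) ** 3

    # Convexity of the model means a generic local solver finds its global minimum.
    return minimize(model, x).x

x = np.array([3.0, -4.0, 2.5])
for _ in range(20):
    x = majorant_step(x, L=1.0)
print(x, f(x))  # x approaches 0, f(x) approaches 3
```

For $p = 2$ this is essentially a cubic-regularised Newton-type method; higher $p$ yields the higher-order tensor methods explored in the works mentioned below.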

The main result here is given in Nesterov's paper *Implementable Tensor Methods in Unconstrained Convex Optimization*, and the algorithmic implications are explored in a number of related works by the same author and coauthors.