The Advent of Optimisation in Statistics

###### tags: `one-offs` `history`

**Overview**: I asked the following historical question online: “It is fairly well-documented that the turning point for the Bayesian statistics community towards widespread use of MCMC was in the early 1990s, in light of papers on Gibbs sampling, e.g. Gelfand and Smith, Smith and Roberts, and others. Is there a comparable landmark paper (or otherwise) which is historically associated with the advent of { nonlinear, convex, etc. } optimisation in modern statistics?” I received some interesting responses, which I document (with some editing) in this note.

## Responses

- “Surely it would be something from the advent of Support Vector Machines. Ideas from $\ell^{1}$ regularisation and similar were of course huge, but tools from convex optimisation (e.g. duality and related notions) were certainly understood and used before the sparsity era.”
- “Perhaps it occurred somewhere in the transition from linear models to generalised linear models. The initial focus on linear models arguably allowed much of the field to avoid more general optimisation problems, but with the increasing popularity of non-linear link functions, avoiding them would have become more difficult. Still, GLMs were (and perhaps still are) nearly synonymous with Iteratively-Reweighted Least Squares, and perhaps this weakens the connection to more general convex optimisation.”
- “The 1977 paper of Dempster, Laird and Rubin on the EM Algorithm might have been one of the earlier works to focus on maximum likelihood, rather than least squares, as an optimisation / estimation principle. However, the EM algorithm is not strictly in the spirit of other procedures from within the optimisation world (at least without a bit of massaging).”
- “Conjecturally, Huber's 1964 paper on 'Robust Estimation of a Location Parameter', which introduced the notion of M-estimation as a unified way of understanding a large class of statistical estimators, may have played a role in bringing more general optimisation formulations of estimators into the community's consciousness.”
- “Tibshirani's Lasso paper of 1996 jumps to mind, and Breiman's 1995 Garrote paper is an even earlier work in the same vein. In both papers, the authors discuss solving linearly constrained quadratic programs with iterative algorithms. But perhaps these are not the first such works?”
- “One could also add to these the series of papers by Chen & Donoho on basis pursuit (between ~1994 and 2001, say).”

It would still be interesting for me to hear more on this topic; I feel that the above gives many useful pointers, but certainly not an unambiguous historical answer.
