Overview: In this note, I log some basic observations about diffusion-based generative models.
8/14/2023

Overview: In this note, I discuss a recurring question which can be used to generate research questions about methods of all sorts. I then discuss a specific instance in which this question has proved fruitful in the theory of optimisation algorithms.

Methods and Approximations

A nice story is that when Brad Efron derived the bootstrap, it was done in service of the question “What is the jackknife an approximation to?”. I can't help but agree that there's something quite exciting about research questions which have this same character of “What is (this existing thing) an approximation to?”.

One bonus twist on this which I appreciate is that there can be multiple levels of approximation, and hence many answers to the same question. One well-known example is gradient descent, which can be viewed as an approximation to the proximal point method (PPM), which can then itself be viewed as an approximation to a gradient flow. There are probably even more stops along the way here.

In this case, one can even argue that, from the perspective of mathematical theory, there is at least as much to be gained by stopping off at the proximal point interpretation as there is from the gradient flow perspective. My experience is that generalist applied mathematicians get to grips with the gradient flow quickly, but optimisation theorists can squeeze more out of the PPM formulation. There is thus some hint that this 'intermediate' approximation can be particularly insightful in its own right. It would be interesting to collect more examples with this character.
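To make the chain of approximations concrete, here is a minimal sketch on a one-dimensional quadratic. The objective $f(x)=\tfrac{1}{2}ax^{2}$, the step size `h`, and all function names are assumptions chosen for illustration, not taken from the note. For this objective, the proximal point step has a closed form and the gradient flow is $x(t)=x_{0}e^{-at}$, so the three objects can be compared directly.

```python
import math


def gradient_descent_step(x, h, a):
    # Explicit step: x - h * f'(x), with f(x) = 0.5 * a * x**2.
    return x - h * a * x


def proximal_point_step(x, h, a):
    # Implicit step: argmin_y { 0.5 * a * y**2 + (y - x)**2 / (2 * h) },
    # which for this quadratic is solved in closed form.
    return x / (1.0 + h * a)


def gradient_flow(x0, t, a):
    # Exact solution of x'(t) = -a * x(t) started at x0.
    return x0 * math.exp(-a * t)


def iterate(step, x0, h, a, n):
    # Apply the same one-step map n times.
    x = x0
    for _ in range(n):
        x = step(x, h, a)
    return x


if __name__ == "__main__":
    a, x0, T = 2.0, 1.0, 1.0
    for n in (10, 100, 1000):
        h = T / n
        gd = iterate(gradient_descent_step, x0, h, a, n)
        ppm = iterate(proximal_point_step, x0, h, a, n)
        print(n, gd, ppm, gradient_flow(x0, T, a))
```

As `h` shrinks, both discrete schemes approach the flow, from opposite sides in this example. With a large step (say `h * a = 4`), the explicit iteration diverges while the proximal iteration still contracts, which is one simple sense in which the intermediate PPM picture carries information that the flow alone does not.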
5/22/2023

Overview: In this note, I prove Hoeffding's inequality from the perspectives of martingales and convex ordering.

The Basic Construction

Let $-\infty<a<x<b<\infty$, and define a random variable $M$ with law $M\left(x;a,b\right)$ by

\begin{align} M=\begin{cases} a & \text{w.p. }\frac{b-x}{b-a}\\ b & \text{w.p. }\frac{x-a}{b-a}. \end{cases} \end{align}
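As a quick sanity check on this construction (a worked computation added here for illustration, not part of the original preview), the two weights sum to one and the resulting two-point variable has mean $x$:

```latex
\begin{align}
\frac{b-x}{b-a}+\frac{x-a}{b-a} &= \frac{b-a}{b-a} = 1, \\
\mathbf{E}\left[M\right] &= a\cdot\frac{b-x}{b-a}+b\cdot\frac{x-a}{b-a}
 = \frac{ab-ax+bx-ab}{b-a}
 = \frac{x\left(b-a\right)}{b-a}
 = x.
\end{align}
```

So $M\left(x;a,b\right)$ spreads the point mass at $x$ out to the endpoints $a$ and $b$ without moving the mean, which is presumably the property that the martingale and convex-ordering arguments rely on.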
5/22/2023

Overview: I asked the following historical question online:

“It is fairly well-documented that the turning point for the Bayesian statistics community towards widespread use of MCMC was in the early 1990s, in light of papers on Gibbs sampling e.g. Gelfand and Smith, Smith and Roberts, and others. Is there a comparable landmark paper (or otherwise) which is historically associated with the advent of { nonlinear, convex, etc. } optimisation in modern statistics?”

I received some interesting responses, which I document (with some editing) in this note.

Responses

“Surely it would be something from the advent of Support Vector Machines. Ideas from $\ell^{1}$ regularisation and similar were of course huge, but tools from convex optimisation (e.g. duality and related notions) were certainly understood and used before the sparsity era.”

“Perhaps it occurred somewhere in the transition from linear models to generalised linear models. The initial focus on linear models arguably allowed much of the field to avoid more general optimisation problems, but with the increasing popularity of non-linear link functions, this would have had to become more difficult. Still, GLMs were (and perhaps still are) nearly synonymous with Iteratively-Reweighted Least Squares, and perhaps this weakens the connection to more general convex optimisation.”

“The 1977 paper of Dempster, Laird and Rubin on the EM Algorithm might have been one of the earlier works to focus on maximum likelihood, rather than minimum least squares, as an optimisation / estimation principle. However, the EM algorithm is not strictly in the spirit of other procedures from within the optimisation world (at least without a bit of massaging).”
5/22/2023