###### tags: `one-offs` `heavy tails`
# Computation of Heavy-Tailed Measures
**Overview**: In this note, I describe some of the difficulties involved in computation (sampling, approximate inference) in the face of heavy tails. I do not suggest solutions per se, but try to highlight that certain formulations of a given heavy-tailed problem are more informative than others, and may make the problem easier to solve.
## Premise
I am increasingly developing the understanding that { approximate inference, sampling, etc. } of heavy-tailed measures is a very tricky area! in some senses, it's quite a bit more complicated than e.g. multimodality (though not necessarily harder to do well, per se). Somehow, it seems more subtle to specify the manner in which a problem is heavy tailed, whereas there are few common distinct ways in which to be multimodal. As such, developing { theory / analysis / etc. } for algorithms which tackle these problems seems appears to require being a bit more precise about the nature of the problem.
## Examples
To comment a bit more closely on the possibilities, heavy-tailed measures can have tails which are: (writing $\gamma$ for their density with respect to Lebesgue measure)
* of polynomial decay (e.g. $\gamma\left(x\right)\in\Theta\left(\left|x\right|^{-\left(d+\alpha\right)}\right)$ for some $\alpha>0$),
* of quasi-exponential decay (e.g. $\gamma\left(x\right)\in\exp\left(-\Theta\left(\left|x\right|^{\eta}\right)\right)$ for some $\eta\in\left(0,1\right)$),
* any of the above, but with additional logarithmic factors in various places,
* rotationally symmetric, or approximately so,
* product form, or approximately so,
* decaying at different rates, in different directions,
* and more.
By contrast, in the multimodal setting, the standing assumptions seem to usually be that (roughly speaking)
1. The tails are not an issue, i.e. the target is well-confined,
2. There are a 'relatively finite' number of modes (otherwise the problem may just be impossible),
3. Locally, each mode is not too hard to sample from (or approximate, etc.), and
4. Some sort of 'temperature adjustment' strategy should allow for information to be transferred between modes.
There are some subtleties related to the differences between 'energetic' and 'entropic' barriers in state spaces, but since multimodality is typically understood to relate to energetic barriers, this is a slightly different problem - more about correctly diagnosing what the issue is, rather than knowing how to solve it.
## What could help?
For approximate inference in heavy-tailed models, my sense is that in order to succeed, one needs to have a pretty good idea of how heavy the tails are a priori, i.e. polynomial tails demand different strategies than quasi-exponential tails. One can see this by imagining strategies which perform a nonlinear change of coordinates to 'compress' the target; if you mis-estimate the thickness of the tails, you could end up either (for example) i) not making a dent on their thickness, ii) over-correcting, and creating dangerously light tails, iii) introducing heterogeneity of tail behaviour in different directions, and so on. Similar comments apply to other natural strategies for navigating heavy tails.
Another comment is that there is just generally not as much work on this topic at present. In cases where the heaviness of the tails is reasonably well-understood, it can be straightforward to reparametrise things favourably (this is well-known in the case of e.g. scale mixtures of Gaussians). The more black-box setting presently appears to be far more open. Given the richness of strategies for handling multimodality (e.g. in the MCMC world, there might be up to ten different ways of using 'tempering'), it would be nice to see a similar line of work approach for heavy-tailedness.

Overview: In this note, I log some basic observations about diffusion-based generative models.

8/14/2023Overview: In this note, I describe some aspects of hierarchical structure in MCMC algorithms, and how they can be of theoretical and practical relevance.

8/9/2023Overview: In this note, I discuss a recurrent question which can be used to generate research questions about methods of all sorts. I then discuss a specific instance of how this question has proved fruitful in the theory of optimisation algorithms. Methods and Approximations A nice story is that when Brad Efron derived the bootstrap, it was done in service of the question “What is the jackknife an approximation to?”. I can't help but agree that there's something quite exciting about research questions which have this same character of ''What is (this existing thing) an approximation to?''. One bonus tilt on this which I appreciate is that there can be multiple levels of approximation, and hence many answers to the same question. One well-known example is gradient descent, which can be viewed as an approximation to the proximal point method, which can then itself be viewed as an approximation to a gradient flow. There are probably even more stops along the way here. In this case, there is even the perspective that from the perspective of mathematical theory, there may be at least as much to be gained by stopping off at the proximal point interpretation, as there is from the gradient flow perspective. My experience is that generalist applied mathematicians get to grips with the gradient flow quickly, but optimisation theorists can squeeze more out of the PPM formulation. There is thus some hint that using this 'intermediate' approximation can be particularly insightful in its own right. It would be interesting to collect more examples with this character.

5/22/2023Overview: In this note, I prove Hoeffding's inequality from the perspectives of martingales and convex ordering. The Basic Construction Let $-\infty<a<x<b<\infty$, and define a random variable $M$ with law $M\left(x;a,b\right)$ by \begin{align} M=\begin{cases} a & \text{w.p. }\frac{b-x}{b-a}\ b & \text{w.p. }\frac{x-a}{b-a}. \end{cases}

5/22/2023
Published on ** HackMD**