Computation of Heavy-Tailed Measures

###### tags: `one-offs` `heavy tails` # Computation of Heavy-Tailed Measures **Overview**: In this note, I describe some of the difficulties involved in computation (sampling, approximate inference) in the face of heavy tails. I do not suggest solutions per se, but try to highlight that certain formulations of a given heavy-tailed problem are more informative than others, and may make the problem easier to solve. ## Premise I am increasingly developing the understanding that { approximate inference, sampling, etc. } of heavy-tailed measures is a very tricky area! in some senses, it's quite a bit more complicated than e.g. multimodality (though not necessarily harder to do well, per se). Somehow, it seems more subtle to specify the manner in which a problem is heavy tailed, whereas there are few common distinct ways in which to be multimodal. As such, developing { theory / analysis / etc. } for algorithms which tackle these problems seems appears to require being a bit more precise about the nature of the problem. ## Examples To comment a bit more closely on the possibilities, heavy-tailed measures can have tails which are: (writing $\gamma$ for their density with respect to Lebesgue measure) * of polynomial decay (e.g. $\gamma\left(x\right)\in\Theta\left(\left|x\right|^{-\left(d+\alpha\right)}\right)$ for some $\alpha>0$), * of quasi-exponential decay (e.g. $\gamma\left(x\right)\in\exp\left(-\Theta\left(\left|x\right|^{\eta}\right)\right)$ for some $\eta\in\left(0,1\right)$), * any of the above, but with additional logarithmic factors in various places, * rotationally symmetric, or approximately so, * product form, or approximately so, * decaying at different rates, in different directions, * and more. By contrast, in the multimodal setting, the standing assumptions seem to usually be that (roughly speaking) 1. The tails are not an issue, i.e. the target is well-confined, 2. There are a 'relatively finite' number of modes (otherwise the problem may just be impossible), 3. Locally, each mode is not too hard to sample from (or approximate, etc.), and 4. Some sort of 'temperature adjustment' strategy should allow for information to be transferred between modes. There are some subtleties related to the differences between 'energetic' and 'entropic' barriers in state spaces, but since multimodality is typically understood to relate to energetic barriers, this is a slightly different problem - more about correctly diagnosing what the issue is, rather than knowing how to solve it. ## What could help? For approximate inference in heavy-tailed models, my sense is that in order to succeed, one needs to have a pretty good idea of how heavy the tails are a priori, i.e. polynomial tails demand different strategies than quasi-exponential tails. One can see this by imagining strategies which perform a nonlinear change of coordinates to 'compress' the target; if you mis-estimate the thickness of the tails, you could end up either (for example) i) not making a dent on their thickness, ii) over-correcting, and creating dangerously light tails, iii) introducing heterogeneity of tail behaviour in different directions, and so on. Similar comments apply to other natural strategies for navigating heavy tails. Another comment is that there is just generally not as much work on this topic at present. In cases where the heaviness of the tails is reasonably well-understood, it can be straightforward to reparametrise things favourably (this is well-known in the case of e.g. scale mixtures of Gaussians). The more black-box setting presently appears to be far more open. Given the richness of strategies for handling multimodality (e.g. in the MCMC world, there might be up to ten different ways of using 'tempering'), it would be nice to see a similar line of work approach for heavy-tailedness.