# BayesComp 2023 Recap
## General Trends
First, an overview.
1. There was plenty of MCMC, as was to be expected.
2. There was more work on SMC and SSMs than I expected.
3. There was less Variational Inference than I expected.
4. ABC seems not to be on the way out, and people are still trying new things.
5. There seemed to be very healthy levels of Model Misspecification-awareness.
6. Despite recent hype, there was not too much about Diffusion Models (though perhaps next time this will change).
7. The poster session was excellent, though there was too much going on for me to reasonably discuss it here.
Any mismatch between my expectations and the reality may be attributed to this being my first BayesComp (perhaps I had an inaccurate perception of the tastes within the community), and is not too serious.
The remainder of the recap will be quite informal, and not very structured (aside from being more-or-less chronological). I will limit myself to commenting on talks which I attended, which means I will largely gloss over the social aspects of the conference.
(I will mention briefly that it was terrific in this regard, and I was very lucky to meet with many fellow attendees with whom I had previously only been able to interact online; this was very nourishing at a human level.)
I did not attend every talk which would have been interesting to me (at a conference which is this well-suited to my own interests, it would have been impossible without some degree of time-travel; this is without even mentioning the fatigue of attending so many talks in one day), so the omission of talks from this list is hardly a strike against them. The comments will not really read as an advertisement for the paper; they are really just a précis of what I have managed to remember about the talk, and what the talk brought to mind for me. Many of them will correspond to papers about which I have already tweeted, etc. anyways.
If you are curious about any of these works, or want to get my impression of them a bit more, do let me know; I found them interesting enough to write about here, so you can trust that I would be interested enough to carry on a conversation about them.
## Tuesday (Satellite Sessions, Day 2/2)
### Bayesian Inference of Epidemics
– Fast inference and model selection on epidemiological models using model-based proposals. The speaker presented some methods for inference in epidemiological models, based around the use of carefully-designed MCMC proposal moves which perturb both parameter values and trajectories. This allows for improved exploration of the state space. Of course, the viability of such a strategy requires some tractability in model structure, but there are signs that the authors have made substantial efforts to make the methodology about as general as possible. One expects that it could be useful at quite some scale.
– Data Augmentation for MCMC in the Stochastic SIR Model. I had seen some version of this talk before; again it is some model-specific approach towards defining scalable MCMC proposals in compartmental models. I can't reproduce too many of the details here, but the constructions are very nice. An interesting point, which somebody from outside the area (like myself) might under-appreciate, is that the observation models in epidemiology make a huge difference to which sorts of approximations are useful.
– Approximate Inference in Stochastic Epidemic Models via Systematic Parametric Approximations (Poisson and Multinomial). Like many static deterministic approximation methods, it can be difficult to see precisely how to make the approximation 'more exact', or to fully examine the conditions under which the approximation error ought to be negligible (though there is decent practical experience and intuition in many cases). One can make reasonable inferences about whether each approximation is likely to err on the side of over- or under-estimating posterior uncertainty, which tends to be a good exercise as a reader anyways. It's an exciting challenge to integrate these new approximations with existing approaches, compare them to old approaches, and so on. I am a generalist at heart when it comes to algorithms, and so when you see something which seems new, it's very exciting to try to pick apart which new principles are at play (or which classic principles have been recycled).
### Bayesian Computing without Exact Likelihoods
– Some Generalised Bayes approaches to clustering, some ABC-type approaches to clustering with intractable mixture components, and a Generalised Bayesian treatment of fitting quantum circuits to data. It is always refreshing to see the difficulty of problems which people encounter in different applications, and the approximations which they have to devise in trying to get something off the ground. Also some nice work on the a priori calibration of approximate inference methods; it would be interesting to see how they could be adapted to the a posteriori (i.e. data-adaptive) setting.
– Also some fascinating work on Bayesian Semiparametrics, wherein one might be estimating a certain infinite-dimensional parameter (e.g. a density) well, but estimating functionals of that parameter (e.g. integrals against that density) poorly. A scary story for the Bayesian, I think - it is tempting to believe that 'if the posterior is working, then it is working', and this type of example dispels that myth to some extent. The talk also presented a 'targeted' fix, involving influence function-based two-step estimators. This can definitely be viewed as a success story from the perspective of mathematical statistics, if perhaps not from the perspective of 'full-stack Bayesian inference'. I suppose it's up to you whether this is satisfying.
### Missed Talks:
'Component-wise iterative ensemble Kalman inversion for static Bayesian models with unknown measurement error covariance', 'Sparse Hamiltonian Flows (Bayesian Coresets Without all the Fuss)', 'Bayesian Predictive Inference', 'Likelihood-Free Inference with Generative Neural Networks via Scoring Rule Minimization', 'Efficient Bayesian inference when likelihood computation includes numerical algorithms with adjustable error tolerances'.
## Wednesday (Main Conference, Day 1/3)
### Keynote
> 'An Automatic Finite-Sample Robustness Check for Bayes and Beyond: Can Dropping a Little Data Change Conclusions?'
The main focus was examining the sensitivity of statistical decision-making to data subtraction. The proposed method is practical, and the story of the whole thing is very compelling, but at some level it is also tricky to parse; the exact takeaways are a bit challenging to make sense of (e.g. for well-specified models and regular data, decisions can still be (correctly) diagnosed as unstable - so what is 'wrong' with the statistical analysis?). Still, the message is quite affecting; it is hard to imagine seeing the results and not caring at all.
### Invited Sessions
• Inference in State-Space Models by synthetic Gaussian pinning. Given your approximate smoothing distribution, simulate synthetic data so that conditionally on that data, the SSM is much closer to linear and Gaussian. This means that we take the problem of inference in SSMs, and reduce it to the problem of approximate inference in a sequence of approximately linear-Gaussian SSMs, which opens many interesting doors. There are connections to earlier constructions which apply to generic MCMC problems (e.g. a favourite work of mine by Titsias-Papaspiliopoulos), as well as to modern tools for inference in SSMs which enable parallel-in-time computational strategies (e.g. many recent works of Särkkä and his group).
• Efficient Backward Sampling strategies for Particle Filters / Sequential Monte Carlo. It is well-known that the 'exact' approach to this problem has cost which scales like $N^{2}$, and a popular rejection sampling variant has cost scaling more like $N\cdot\tau$, where $\tau$ is some characteristic 'waiting time' associated to the rejection sampling process. There are some genuine caveats associated with the rejection sampling approach, which make some past analyses seem not-entirely-honest. This talk advocated for an MCMC approach to backward sampling, wherein you trace the genealogy of the particle to initialise, and then do a few steps of Independent Metropolis-Hastings (IMH) on the ancestor indices to 're-wire' the particle system. The algorithm is pretty transparent and practical, and it seems like a good default choice. There are interesting theoretical questions remaining about what can be said about this MCMC approach at different levels of generality, some of which are connected to obtaining a more refined understanding of the IMH in the case of unbounded weights. One can also ask whether, for structured models, one can do better than pure IMH for the rewiring step; my guess is that without further structure, it is probably optimal. For low-dimensional models with local dynamics, it seems likely that if one is prepared to pay some extra log factors, one could construct some relevant data structures which would allow for improving upon the IMH.
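The contrast described above can be made concrete with a small sketch. This is my own toy implementation (synthetic particles, weights, and genealogy; a Gaussian random-walk transition), not the speaker's code: the first function is the classic $O(N^2)$ exact backward sampler, and the second is the genealogy-initialised IMH 're-wiring' alternative, where the target over the index $j$ at time $t$ is proportional to $w_t^j f(x_{t+1}^* \mid x_t^j)$ and the proposal is proportional to $w_t^j$, so the acceptance ratio reduces to a transition-density ratio.

```python
import numpy as np

def softmax(logw):
    w = np.exp(logw - logw.max())
    return w / w.sum()

def exact_backward_sample(particles, logw, trans_logpdf, rng):
    """O(N^2)-per-trajectory exact backward sampling.

    particles[t]: (N,) array of particle positions at time t
    logw[t]:      (N,) array of log-weights at time t
    trans_logpdf(x_next, x_prev): log f(x_next | x_prev), vectorised in x_prev
    """
    T = len(particles)
    j = rng.choice(len(logw[-1]), p=softmax(logw[-1]))
    traj = [particles[-1][j]]
    for t in range(T - 2, -1, -1):
        # Reweighting *all* N candidates at every step is what costs O(N^2).
        lw = logw[t] + trans_logpdf(traj[-1], particles[t])
        j = rng.choice(len(lw), p=softmax(lw))
        traj.append(particles[t][j])
    return np.array(traj[::-1])

def imh_backward_sample(particles, logw, ancestors, trans_logpdf, rng, n_mh=3):
    """MCMC 're-wiring' alternative: initialise each index from the stored
    genealogy, then run a few Independent MH steps on the ancestor index."""
    T = len(particles)
    idx = np.empty(T, dtype=int)
    idx[-1] = rng.choice(len(logw[-1]), p=softmax(logw[-1]))
    for t in range(T - 2, -1, -1):
        j = ancestors[t + 1][idx[t + 1]]          # genealogy initialisation
        x_next = particles[t + 1][idx[t + 1]]
        p = softmax(logw[t])
        for _ in range(n_mh):
            k = rng.choice(len(p), p=p)           # propose from the weights
            log_a = (trans_logpdf(x_next, particles[t][k])
                     - trans_logpdf(x_next, particles[t][j]))
            if np.log(rng.uniform()) < log_a:
                j = k
        idx[t] = j
    return np.array([particles[t][idx[t]] for t in range(T)])
```

Per trajectory, the IMH variant only touches `n_mh` candidates at each time step (beyond the categorical draw), which is where the cost saving comes from.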
• Conditional SMC in High Dimension. Usual particle filter approaches degenerate with $D$, so some level of spatial localisation strategy is normally needed; c.f. the use of local proposals in high-dimensional MCMC, rather than importance sampling, rejection sampling, etc. By employing a suitable perturbation-selection strategy (which is not quite SMC), the authors present a conditional SMC-inspired algorithm which has stable behaviour as both $\left( D, T \right) \to \infty$. Of course, the locality of the perturbation means that one pays a certain price with respect to mixing time, but it is only (gently) polynomial in the dimension, rather than exponential; you replace exponentially-large weights and big moves by stable weights and polynomially-small moves. For the problem of high-dimensional filtering (for me, one of the archetypal 'hard' problems in computational inference), this seems to be an appropriate price to pay.
• Stratification Methods for Integration of Highly-Smooth Functions. Connections to “approximate-then-integrate strategies”; a result of Bakhvalov suggests that such strategies are rate-optimal. This work presents practically simpler methods (in which no explicit approximation needs to take place) and achieves similar rate-optimality results. There was some interesting discussion on exactly how much it would help to have access to a higher-order oracle with information about the integrand (e.g. gradient information), and how this might impact rates. One can actually already ask the same question at the approximation-theoretic level.
• Monte Carlo Twisting of Particle Filters. Usual twisting methods typically rely on parametric conjugacy, which is inherently quite limiting. Working with non-conjugate twisting functions tends to necessitate e.g. rejection sampling for implementation; on its face this is wasteful, but from the perspective of memory consumption, it can be useful. Actually, there is an interesting discussion to be had more generally about in which practical settings rejection sampling should actually be preferred to importance sampling, since rejection sampling usually throws away some information. In any case, it seems to be quite open how to twist models well when conjugacy assumptions are not available; interested to see more on this topic.
• Fixed-Lag Online Particle Smoothing. The talk convinced me that this is an interesting problem. As with many state-space model questions, the problem itself is quite elementary, the motivation is transparent enough, but still you can easily slip into talking about very difficult problems. The question of selecting the lag parameter is always a bit interesting to me; in some ways, it is easy to come up with a sensible ansatz (e.g. take enough particles to get a stable particle filter, increase the lag until you can't afford the computations any more), but it seems hard to make the right choice in a way that is fully satisfying theoretically as well. The presentation did a good job in emphasising that the key ingredients in computation for state-space models are the judicious use of conditional independence structures and Bayes' rule, which was very much the case for the presented algorithm.
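To fix ideas, here is a minimal sketch of the basic fixed-lag mechanism (my own toy bootstrap filter on a Gaussian random-walk model, not the presented algorithm): keep a rolling window of particle histories of length equal to the lag, re-wire the window at each resampling step, and freeze (report and never revisit) states once they fall out of the window.

```python
import numpy as np
from collections import deque

def fixed_lag_smoother(ys, lag, N, rng, sigma_x=1.0, sigma_y=0.5):
    """Bootstrap particle filter with fixed-lag smoothing for the toy model
        x_t = x_{t-1} + N(0, sigma_x^2),   y_t = x_t + N(0, sigma_y^2).
    States older than `lag` steps are frozen, which bounds both memory use
    and path degeneracy; returns the smoothed posterior mean at each time."""
    x = np.zeros(N)                 # current particles
    window = deque()                # rolling histories, length <= lag
    frozen = []                     # finalised smoothed means
    for y in ys:
        x = x + rng.normal(0.0, sigma_x, N)           # propagate
        logw = -0.5 * ((y - x) / sigma_y) ** 2        # weight
        w = np.exp(logw - logw.max()); w /= w.sum()
        idx = rng.choice(N, size=N, p=w)              # resample
        x = x[idx]
        window.append(x.copy())
        # Resampling also re-wires the stored histories (ancestor tracing).
        for t in range(len(window) - 1):
            window[t] = window[t][idx]
        if len(window) > lag:
            frozen.append(window.popleft().mean())    # freeze oldest state
    frozen.extend(h.mean() for h in window)           # flush the remainder
    return np.array(frozen)
```

The lag trade-off in the text is visible here: a larger `lag` gives estimates closer to full smoothing, but each resampling step touches the whole window, and longer windows degenerate more.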
• Stein Transport for Controlled Sampling; arguably quite a lot like 'Simulated Annealing meets Stein Variational Gradient Descent', with some kernel ridge regression sprinkled on top. There are clear links to Transport Maps, Normalising Flows, Annealed Flow Transport, etc. There were informative hints to a familiar story here: for highly-multimodal problems (or other instances in which neighbouring measures are highly singular with respect to one another), transport maps need to have a large Lipschitz constant somewhere in the space; this tends to map onto computational instabilities. One wonders whether additional stochasticity might grease the wheels here (c.f. diffusion models vs. normalising flows). I am always a big fan of SMC samplers, anyways, so naturally I was thinking about integrating the methodology in this way.
### Missed Sessions:
– Stein Discrepancies, Measuring Quality of MCMC Samples, What lies beneath - Some recent advances in Bayesian nonparametrics
– Robust innovations in gradient-based MCMC, Robustness to model misspecification
## Thursday (Main Conference, Day 2/3)
### Keynote
> 'How many steps are needed for random walk Metropolis? Explicit convergence bounds for Metropolis Markov chains'.
I worked on this project; it suffices to say that I am very proud of the work, and was thrilled to see it presented to such a large (and apparently receptive) audience.
### Invited Sessions
• Analysis of Gibbs Samplers for Hierarchical Models in the Data-Rich Limit. Reduces the analysis of a high-dimensional Gibbs sampler to the analysis of a Markov chain of fixed dimension, by an intuitive sufficiency argument; shows that the large-N limiting algorithm is informative about the finite-N sampler. I am keenly interested to see the details of this argument when the paper is released, and to get a better handle on the exact implications in the finite-N setting.
• Divide-and-Conquer Sequential Monte Carlo for spatiotemporal filtering and smoothing. Again, for me, this is one of the eternal grand challenges for computational inference. The approach is natural in many regards (c.f. domain decomposition strategies in the numerical solution of partial differential equations), though still does not completely solve the problem, even relative to some earlier proposed algorithms. Perhaps I would benefit from going back to the drawing board and seeing what lessons have been learned so far with this problem. Still, there are many appealing features of the Divide-and-Conquer approach. Actually, even applying this methodology to static problems (e.g. inference in Markov Random Fields with no temporal aspect) deserves further exploration; it has certainly been done a little bit, but there are dimensionality-related questions which I don't fully understand just yet.
### Panels
General / Grand Challenges for Bayesian Computation, also Probabilistic Programming. Neither was too spicy in terms of content, but both were still quite interesting. It is good to be reminded that there is no dearth of difficult statistical problems which need good practical solutions. I am also periodically reminded that for many applied science problems, the scientists are perpetually so far away from using the biggest model which they can think of; there is always more structure to impart, etc. It is quite inspirational, really.
### Invited Sessions
• Some varied works on ABC with alternative loss functions (MMD, Energy Distance, Scoring Rules, etc.). More on model misspecification, robustness, and other second-order statistical concerns; I find it is very healthy to see this becoming routine within the community. There is a mix of computational motivation and 'purely statistical' motivation, which is again healthy from my point of view. The solutions which I saw involve both imputation-optimisation strategies (following work of e.g. Chris Holmes) but also MCMC-type strategies (perhaps with a fresh stochastic gradient flavour, again an area which is quite fun and not totally resolved theoretically). Still, I noticed a general focus on differentiable forward models (i.e. models which are friendly to the reparametrisation trick) which sadly omits many ABC-type problems of practical relevance, but the broader methodology remains interesting independently of these choices.
### Missed Sessions:
– Trust and adding new algorithms to probabilistic programming frameworks
– Piecewise deterministic Monte Carlo: Recent Advances, MCMC for Multi-Modal Distributions
## Friday (Main Conference, Day 3/3)
### Keynote
> 'Adversarial Bayesian Simulation'
Generative approaches to posterior inference in simulator-specified models. This means that e.g. the posterior is specified through a data-conditional generative model (trained adversarially) or through a variational inference strategy. There were some interesting theoretical contributions here for some methodologies which have existed (in some form, at least) for some time (e.g. Chaudhuri-Nott-Pham's ABC by Classification, Tran-Ranganath-Blei's Implicit Variational Models). For me, it would be interesting to see how adversely these approaches are impacted by concerns of model misspecification.
### Invited Sessions
• Solidarity of Gibbs Samplers. Gibbs samplers can be implemented in many forms; can the implementation affect the ergodicity properties dramatically? This talk presented a number of positive results, saying that if certain scans give rise to geometrically-ergodic chains, then all valid scans do, which is nice. Some open problems remain; I am curious about whether the gaps are surmountable.
• Iterated Sampling-Importance Resampling for Bias Reduction in Self-Normalised Importance Sampling. Nice connections to Conditional SMC, Particle Gibbs, etc., which are to be fleshed out in an upcoming paper. The analysis is simple and clean; I am keen to understand it so that it is clearer to me what the best finite-sample strategy ought to be. Because SNIS is such a 'static' algorithm in its character, I wonder whether a simpler, classical analysis might be better-suited to bias reduction, e.g. the work of Firth, Kosmidis and others in bias reduction of statistical estimators. Still, perhaps the non-asymptotic framing plays a bigger role, or the extension to dependent SMC-type estimators is sufficiently challenging. Separately, this work invigorates me to think more deeply about these elementary, particle-based MCMC algorithms.
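The bias that such corrections target is easy to exhibit. The following is my own toy illustration of the $O(1/N)$ systematic error of SNIS (the Gaussian target/proposal pair and all constants are mine, not the talk's): with target $\pi = N(1,1)$, proposal $q = N(0,1)$, and $f(x) = x$, the log-weight is exactly $x - 1/2$ and the truth is $\mathbb{E}_\pi[f] = 1$.

```python
import numpy as np

def snis(f_vals, logw):
    """Self-normalised importance sampling: sum_i w_i f_i / sum_i w_i."""
    w = np.exp(logw - logw.max())
    return np.sum(w * f_vals) / np.sum(w)

def mean_snis_error(N, reps=3000, seed=0):
    """Average error of the SNIS estimate of E_pi[x] over many replications,
    for target pi = N(1, 1), proposal q = N(0, 1), so log(pi/q)(x) = x - 1/2.
    Averaging over replications isolates the systematic (bias) component."""
    rng = np.random.default_rng(seed)
    errs = np.empty(reps)
    for r in range(reps):
        xs = rng.normal(0.0, 1.0, N)
        errs[r] = snis(xs, xs - 0.5) - 1.0
    return errs.mean()
```

Running `mean_snis_error` at increasing `N` shows the systematic error shrinking at roughly the $O(1/N)$ rate; it is exactly this deterministic, estimable component that bias-reduction schemes (whether iterated-resampling or classical Firth/Kosmidis-style corrections) try to remove.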
• Constrained HMC for Sampling from the Posterior of SDE Models. For me, this is another culmination of Matt Graham's excellent earlier work with Amos Storkey, which applied to ABC-type simulator models (in other works, it was also ably applied to simulation from posteriors arising in Bayesian inverse problems). Here, the Markov structure of the model is used crucially to control the computational cost of the algorithm. There are various interesting extensions to consider, e.g. i) how to stabilise the performance of the algorithm under mesh-refinement (following e.g. pCN, pCN-Langevin, $\infty$-MALA, $\infty$-HMC), ii) combining this approach with Multilevel Monte Carlo to remove the inferential bias of discretisation, iii) accommodation of non-Brownian noise (either coloured Gaussian noise, or entirely non-Gaussian noise), iv) how to optimally modify the algorithm to handle nearly-linear transitions and observations gracefully. A very nice story all around, really expertly done.
• Preconditioning in MCMC. Complexity analysis in recent years has highlighted the key role which the condition number $\kappa$ plays in dictating the convergence behaviour of various practical MCMC algorithms. Preconditioning is widely applied, and clearly helps in practice ... but can its benefits be proven? In contrast to e.g. numerical analysis of PDE, it is rare for computational statisticians to prove that for a given problem class, preconditioning can be implemented and provably improve the complexity of the problem. Very nice to see this question being taken seriously by our community and eager to see more work done on it.
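For concreteness, here is what preconditioning means at the algorithmic level, in my own minimal MALA sketch (not from the talk; all names and constants are mine). The proposal is $x' \sim N\big(x + \tfrac{h}{2} M \nabla \log \pi(x),\, h M\big)$; taking $M$ close to the target covariance brings the effective condition number near 1, so a single $O(1)$ step size works in every direction.

```python
import numpy as np

def precond_mala(logpi, grad_logpi, x0, M, step, n_iter, rng):
    """Preconditioned MALA with mass matrix M = L L^T.
    With M = I this is plain MALA and the step size must shrink with the
    condition number; with M ~ target covariance, one O(1) step suffices."""
    L = np.linalg.cholesky(M)
    Minv = np.linalg.inv(M)

    def logq(y, x):
        # Proposal log-density (up to a constant) centred at the drifted point.
        d = y - (x + 0.5 * step * M @ grad_logpi(x))
        return -0.5 / step * d @ Minv @ d

    x = np.asarray(x0, dtype=float)
    samples, n_acc = np.empty((n_iter, x.size)), 0
    for i in range(n_iter):
        prop = (x + 0.5 * step * M @ grad_logpi(x)
                + np.sqrt(step) * L @ rng.normal(size=x.size))
        # Metropolis-Hastings correction for the Langevin proposal.
        log_a = logpi(prop) - logpi(x) + logq(x, prop) - logq(prop, x)
        if np.log(rng.uniform()) < log_a:
            x, n_acc = prop, n_acc + 1
        samples[i] = x
    return samples, n_acc / n_iter
```

On a badly-conditioned Gaussian target (say, covariance diag(1, 100)), running this with `M` equal to the target covariance and `step = 1.0` mixes in both directions at once; the open question flagged in the talk is when such an `M` can be constructed, and its benefit proven, rather than assumed known.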
• Large Deviations of Metropolis-Hastings Markov Chains. Largely a new area for me; very interesting in view of the possibility of comparing the efficiency of different algorithms (and perhaps those which are not as easy to compare in the Dirichlet form framework, e.g. non-reversible chains), which I care about in some depth. Looking forward to the preprint.
### Missed Sessions:
– Piecewise deterministic Monte Carlo, Approximate Bayesian Computation, Normalising Flows to Enhance Bayesian Sampling
– Machine Learning meets Adaptive MCMC, J-ISBA advances in scalable Bayesian methods, Recent Advances in Variational Inference
– (most!) Contributed Sessions
## Epilogue
Some other discussions which arose informally during the week, and which gave rise to some varied responses:
• What are the right reasons to pursue a Bayesian analysis of a statistical problem?
• To what extent should a Bayesian statistician 'report to' mathematical statisticians? Would you use an inconsistent estimator? Or perhaps just an estimator which is not minimax-optimal?
• Which is simpler out of SMC and MCMC? If you know only one of the two, should you make an effort to learn the other as well?
• How should we, as a community, allocate effort between domain-specific computational methods and general-purpose computation methods?
• Should BayesComp have a lower-case B (i.e. shift focus more towards algorithms), or a lower-case C (i.e. shift focus towards specific Bayesian problems), or continue as-is?
• Why can't people just specify their model correctly?
After this conference, I am thinking carefully about how my impressions should guide my future research. I would not say that trends have shifted dramatically in terms of what is being researched, but there are probably some subtle trends in what people are caring about as an audience member; what they would like to see tested, probed; what examples are interesting, etc. I predict that the impacts will be subtle but real; hard to say exactly how right now.