# Topics to discuss (11/May/2020)

Please do a fresh `git clone` of my LFI repo: https://github.com/plcrodrigues/LFI-Explorations

## Likelihood-free Inference

**Experiments.** I did several experiments with SNPE methods on the Papamakarios 2016 example and also on the Lotka-Volterra dynamics. All implementations were done with the DELFI package, but I feel quite limited by its neural-network part. I also worked a lot on the SBI code, especially to fix some issues related to GPU usage, but the SNPE-A method is not implemented there yet.

**Impressions.** In the end, I have the impression that the SNPE-B method does not work that well (the one with the importance-sampling strategy for the correction). SNPE-A does have some cases where the procedure fails because of problems in the Bayesian covariance update. Can we do something with this?

Let's formalize the idea. First, remember the basic structure of the SNPE algorithm (extracted from page 69 of [this reference](https://arxiv.org/abs/1910.13233)):

![](https://i.imgur.com/f3DvCCB.png)

The posterior $p(\boldsymbol{\theta}|\boldsymbol{x} = \boldsymbol{x}_0) = \sum_k \alpha^{\prime}_k \mathcal{N}(\boldsymbol{\theta}|\boldsymbol{m}^{\prime}_k, \boldsymbol{S}^{\prime}_k)$ is obtained by correcting the MDN output $q_{\boldsymbol{\phi}}(\boldsymbol{\theta}|\boldsymbol{x}) = \sum_k \alpha_k \mathcal{N}(\boldsymbol{\theta}|\boldsymbol{m}_k, \boldsymbol{S}_k)$ in terms of the proposal prior $\tilde{p}(\boldsymbol{\theta}) = \mathcal{N}(\boldsymbol{\theta}|\boldsymbol{m}_0, \boldsymbol{S}_0)$ as in

![](https://i.imgur.com/cPX9iIB.png)

It is precisely at this update step that the SNPE-A algorithm sometimes fails. This usually happens because the precision of the MDN output is smaller than the precision of the proposal prior, which means that the MDN didn't really improve things (and the corrected covariance $\boldsymbol{S}^{\prime}_k$ is no longer positive definite). The algorithm could either have an if/else test to avoid breaking, or the update could be done a bit differently.

> At round $r$ we have the proposal prior $\tilde{p}^{(r)}(\boldsymbol{\theta})$ and want to deform it with the help of the MDN output $q^{(r)}_{\boldsymbol{\phi}}(\boldsymbol{\theta}|\boldsymbol{x})$ to obtain the posterior ${p}^{(r)}(\boldsymbol{\theta}|\boldsymbol{x} = \boldsymbol{x}_0)$. This posterior will later serve as the new proposal prior for round $r+1$. We could interpret these transformations as a path in the statistical manifold, where the Bayesian updates described above tell us how to move. Since everything is Gaussian, these paths should respect the intrinsic geometry of the SPD manifold. More generally, in all SNPE methods the posterior is updated every round, and this update could perhaps be regularized/improved.
>> Re-write the update at each round as the steps of an optimization procedure.

**Automatic differentiation.** Can we do something with autodiff? For instance, take the derivatives of the black-box simulator and use them to provide more information to the LFI methods? They don't seem to use this information right now.

## Scientific AI

**Neural ODE.** David Duvenaud's work on *"Neural ODE"* (2018) [[arxiv]](https://arxiv.org/abs/1806.07366). It is basically an extension of the ResNet framework to the case where the layers become continuous. Most examples that I've seen so far are proofs of concept aimed at classification problems. The paper *"Latent ODEs for irregularly-sampled time series"* [[arxiv]](https://arxiv.org/abs/1907.03907) from the same group looks promising and relevant for us.
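To fix ideas, here is a minimal sketch of this continuous-layer view, written with their `torchdiffeq` package (more on it just below). The network size, batch shape and integration times are arbitrary choices of mine, not anything taken from the paper:

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # pip install torchdiffeq

class ODEFunc(nn.Module):
    """Small MLP playing the role of the continuous 'layer' f(h, t)."""

    def __init__(self, dim=2, hidden=50):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, dim))

    def forward(self, t, h):
        # dh/dt = f(h, t): the continuous analogue of a ResNet block h <- h + f(h)
        return self.net(h)

func = ODEFunc()
h0 = torch.randn(16, 2)           # batch of initial states h(t0)
t = torch.linspace(0., 1., 10)    # times at which we want the state
h_traj = odeint(func, h0, t, method="dopri5")   # gradients flow through the solver
print(h_traj.shape)               # torch.Size([10, 16, 2])
```

The call to `odeint` replaces the stack of residual blocks: instead of repeating `h = h + f(h)` a fixed number of times, we integrate `dh/dt = f(h, t)` between two time points with an adaptive solver.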
They have the Python package `torchdiffeq` for showcasing their work; it is rather well written but not really documented (no tutorials, etc.).

**Extensions**. The *"Augmented Neural ODE"* (2019) [[arxiv]](https://arxiv.org/abs/1904.01681) from people at Oxford is a nice improvement over the basic idea. The `torchdyn` package looks very promising, but I have the impression that the party has been going on for much longer in the Julia community (more code, more tutorials, more contributions).

**Julia**. I've read/watched a lot of material from Chris Rackauckas, a guy from the Julia community who is very much involved with these ideas of Scientific AI. Some interesting references:

- Rackauckas et al. *"Universal Differential Equations for Scientific Machine Learning"* (2020) [[arxiv]](https://arxiv.org/abs/2001.04385)
    - Extends what is done with Neural ODEs to problems more related to what we want to do
- Two courses on the subject at MIT
    - https://github.com/mitmath/18337
    - https://github.com/mitmath/18S096SciML

## Deep Learning with EEG

- Schirrmeister et al. *"Deep learning with convolutional neural networks for brain mapping and decoding of movement-related information from the human EEG"* (2018) [[arxiv]](https://arxiv.org/pdf/1703.05051.pdf)
- Chambon et al. *"A deep learning architecture for temporal sleep stage classification using multivariate and multimodal time series"* (2017) [[arxiv]](https://arxiv.org/abs/1707.03321)
- Banville et al. *"Self-Supervised Representation Learning from Electroencephalography Signals"* (2019) [[arxiv]](https://arxiv.org/pdf/1911.05419.pdf)

**End-to-end.** How exactly does this end-to-end learning with EEG signals work? How did they implement it in the Banville paper? The feature extractor comes from Chambon's work, but what is it trained on? It seems that they do things in two steps, but I did not quite get it. They write:

> "Once h is trained, we project the labeled samples into the networks' respective feature space and then train multinomial linear logistic regression models on each set of features to predict sleep stages."

Does this mean that you have one feature extractor followed by another one? And what about RNN architectures? In the paper that presents SNPE-C, *"Automatic Posterior Transformation for LFI"* (2019) [[arxiv]](https://arxiv.org/abs/1905.07488), the authors propose an RNN-APT architecture for learning the summary statistics of the time series: they add an initial layer with 100 GRU units in front of an MDN with a single Gaussian component (a rough sketch of this kind of architecture is at the end of these notes).

**Readings**. I should probably take a closer look at Hyvarinen's work *"Nonlinear ICA of temporally dependent stationary sources"* (2017) [[jmlr]](http://proceedings.mlr.press/v54/hyvarinen17a.html), but I haven't had the time (i.e. I didn't put it on my reading list yet).
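Regarding the RNN-APT paragraph above, here is a rough sketch of what such an embedding-plus-MDN architecture could look like in PyTorch. This is my own guess, not the authors' code: I assume a single-layer GRU whose last hidden state feeds a one-component Gaussian head with diagonal covariance, and all the dimensions are placeholders.

```python
import math

import torch
import torch.nn as nn

class GRUSummaryMDN(nn.Module):
    """GRU embedding of a raw time series feeding a single-component Gaussian MDN
    over the simulator parameters (all sizes are placeholder guesses)."""

    def __init__(self, n_channels=1, n_params=4, gru_units=100):
        super().__init__()
        self.gru = nn.GRU(n_channels, gru_units, batch_first=True)
        self.mean = nn.Linear(gru_units, n_params)
        # log-variances of a diagonal Gaussian: simplest one-component MDN head
        self.logvar = nn.Linear(gru_units, n_params)

    def forward(self, x):
        # x: (batch, time, channels) -- raw series, no hand-crafted summary statistics
        _, h_last = self.gru(x)          # h_last: (1, batch, gru_units)
        h = h_last.squeeze(0)
        return self.mean(h), self.logvar(h)

    def log_prob(self, theta, x):
        # log N(theta | mean(x), diag(exp(logvar(x))))
        mean, logvar = self(x)
        return -0.5 * (logvar + (theta - mean) ** 2 / logvar.exp()
                       + math.log(2 * math.pi)).sum(-1)

net = GRUSummaryMDN()
x = torch.randn(8, 200, 1)               # 8 simulated series of length 200
theta = torch.randn(8, 4)                # corresponding simulator parameters
loss = -net.log_prob(theta, x).mean()    # negative log-likelihood to minimize
```

In an SNPE-style loop this negative log-likelihood would be minimized on pairs $(\boldsymbol{\theta}, \boldsymbol{x})$ sampled from the proposal prior and the simulator.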