# SBI benchmarks with BayesFlow?
## Preliminaries
To describe my problem: we have accelerator beamline simulations (2M samples, with more incoming; the simulation is comparatively cheap) which describe how the electron/photon beam behaves once the positions/parameters of the devices installed along the beamline are fixed. These parameters are our theta: 12 float32 values per sample. The simulation provides the beamline profile just before the sample that is to be irradiated; this is our x (200 float32 values per sample). The idea of my project is to invert the forward simulation that maps theta to x, i.e. to infer theta from x. In practice, tuning this beamline is super cumbersome, so my academic partners are hoping to use conditional INNs to help with the tuning: given an experimental x_o obtained from the beamline, which theta corresponds to it?
### Terms used
- `x` ... conditioning input, in our case a beamline profile (how good is the beam, i.e. where is it?)
  - one sample of x contains 200 float32 values in my data (min-max normalized to [0,1])
- `theta` ... the variables to predict during inference (inverse mode of the cINN), in our case the parameters of the simulation
  - one sample of theta contains 12 float32 values in my data (min-max normalized to [0,1])
- `z` ... the latent / base distribution ([4] calls this `u`), in our case a standard normal N(0,1)
  - one sample of z contains 12 float32 values
```
#forward (during training)
x
|
v
theta==(*)==> z
```
```
#backward (during inference), i.e. estimating theta
z ... randomly sampled from the base distribution N(0,1)
x
|
v
theta <==*== z
```
### Present standing
Currently, I have a BayesFlow [3] implementation in PyTorch that learns the posterior from the simulation described above. The problem is that, to my present knowledge, this posterior can only be sampled from! And to my knowledge there is no guarantee that NFs produce unimodal, symmetric posteriors - not even at the level of samples. So a simple torch.mean over samples won't cut it to estimate the most probable posterior value for a fixed x_o.
I brought this up with Michael and he kindly pointed me to mixture density networks as used in SBI to conduct this posterior density estimation [1]. I have started to look at them and still need to understand their inputs/outputs, inner workings, and the conditions under which they are used.
To cut a long story short: I am still learning. But what appears reasonable to me is to see if I can come up with a posterior_nn wrapper around my trained BayesFlow model and compare it to the other flow-based models sbi has to offer (made, maf and nsf) - a sketch of the comparison I have in mind follows below. Maybe even using sbibm for a downstream comparison?
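For concreteness, a minimal sketch of that comparison, assuming sbi's `posterior_nn`/`SNPE` interface (the toy `theta`/`x` tensors below stand in for our real simulation output):

```python
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform, posterior_nn

# Toy stand-ins for the simulated (theta, x) pairs from the beamline code.
theta = torch.rand(2048, 12)
x = torch.rand(2048, 200)

# Prior over theta; a unit box matches the min-max normalization to [0,1].
prior = BoxUniform(low=torch.zeros(12), high=torch.ones(12))

# Train each of sbi's built-in flow estimators on the same pairs.
posteriors = {}
for model in ("made", "maf", "nsf"):
    inference = SNPE(prior=prior, density_estimator=posterior_nn(model=model))
    estimator = inference.append_simulations(theta, x).train()
    posteriors[model] = inference.build_posterior(estimator)
```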
[1] https://arxiv.org/abs/1711.01861 (suggested by Michael)
[2] https://arxiv.org/abs/1905.07488 (which I dug up and found informative)
[3] Radev et al., "BayesFlow", https://arxiv.org/abs/2003.06281
[4] Papamakarios et al., "Normalizing Flows for Probabilistic Modeling and Inference", https://arxiv.org/abs/1912.02762
[5] https://github.com/bayesiains/nflows
## Q: how to construct `log_prob` for my estimator?
- I understand that `sbi` uses `nflows` to train its density estimators (maf, nsf, etc.)
- I am not sure I grasp how nflows implements the `log_prob` function for every normalizing flow
- I take from [4] and [3] that `log_prob` results from one forward pass of the flow: the base distribution's log-density evaluated at `z` and summed over its dimensions, plus the log absolute determinant of the transform's Jacobian
- how do I construct these probabilities from the computed `z`? (a minimal sketch follows this list) https://github.com/bayesiains/nflows/blob/75048ff2ebd6b7ccad2fb8380630da08aa6ab86b/nflows/flows/base.py#L37-L41
- where can I see how nflows does that with `maf`, `nsf` and friends?
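My current understanding, as a hedged sketch rather than nflows' actual code: `log_prob` is the change-of-variables formula log p(theta|x) = log p_z(z) + log|det dz/dtheta|, shown below for a single affine transform whose parameters `mu`/`log_sigma` stand in for the outputs of a conditioner network on x.

```python
import torch
from torch.distributions import Normal

# Change-of-variables rule behind a flow's log_prob, for a single
# x-conditioned affine transform z = (theta - mu(x)) * exp(-log_sigma(x)).
def log_prob_affine(theta, mu, log_sigma):
    z = (theta - mu) * torch.exp(-log_sigma)       # forward pass theta -> z
    log_det = -log_sigma.sum(dim=-1)               # log |det dz/dtheta|
    base = Normal(0.0, 1.0)                        # the N(0,1) base distribution
    return base.log_prob(z).sum(dim=-1) + log_det  # sum over the 12 theta dims

theta = torch.rand(8, 12)
mu, log_sigma = torch.zeros(12), torch.zeros(12)   # identity transform for the demo
print(log_prob_affine(theta, mu, log_sigma))       # == N(0,1) log-density of theta
```

As far as I can tell, maf and nsf stack many such transforms (masked affine, rational-quadratic splines) and accumulate the log_det terms; nflows/transforms/autoregressive.py seems to be the place to look for the per-flow implementations.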
## Q: what is the conceptual spirit behind APT / SNPE-C ?
- I understand that `sbi` first trains a density estimator given the data
- during training, how are `theta` and `x` consumed by `sbi`?
- in other words: how is the conditioning of the flow performed? (see the sketch after this list) https://github.com/bayesiains/nflows/blob/75048ff2ebd6b7ccad2fb8380630da08aa6ab86b/nflows/transforms/made.py#L276-L277
- after training the normalizing flow or MDN, the posterior is estimated iteratively over multiple rounds
- what is the main motivation for doing that? To prevent posterior estimates that fall outside the prior (leakage)?
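Regarding the conditioning question above, here is what I pieced together as a sketch (assuming nflows' public classes; in the linked `made.py` lines the context appears to be embedded and added to the masked hidden activations, which is what makes the predicted transform parameters depend on x):

```python
import torch
from nflows.flows.base import Flow
from nflows.distributions.normal import StandardNormal
from nflows.transforms.base import CompositeTransform
from nflows.transforms.autoregressive import MaskedAffineAutoregressiveTransform

# A minimal conditional MAF: x enters every transform as `context`, so the
# shift/scale predicted for theta depend on the beamline profile.
transform = CompositeTransform([
    MaskedAffineAutoregressiveTransform(
        features=12, hidden_features=64, context_features=200)
    for _ in range(5)
])
flow = Flow(transform, StandardNormal(shape=[12]))

theta = torch.rand(128, 12)  # toy batch standing in for simulator output
x = torch.rand(128, 200)
# Training consumes the (theta, x) pairs via maximum likelihood:
loss = -flow.log_prob(inputs=theta, context=x).mean()
loss.backward()
```

I believe sbi additionally interleaves permutations between the transforms and optionally embeds x with a separate network, but the conditioning mechanism should be the same.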
## Q: how to do MAP estimation?
- from the PR, I understand that the trained network and a converged posterior are sampled
- the obtained samples are fed into the `log_prob` of the estimator
- gradient ascent is used to find the most likely estimate (a sketch follows this list)
- how to obtain uncertainties/covariances around this most likely theta?
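My reading of that procedure as a sketch, assuming a trained sbi `posterior` whose `log_prob` is differentiable in theta, and an observation `x_o`; the Laplace step at the end is one standard answer to the covariance question, not something I found in the PR:

```python
import torch

# MAP search by gradient ascent on the learned posterior log-density.
samples = posterior.sample((1000,), x=x_o)               # candidate initializations
theta_map = samples[posterior.log_prob(samples, x=x_o).argmax()]
theta_map = theta_map.clone().requires_grad_(True)

opt = torch.optim.Adam([theta_map], lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = -posterior.log_prob(theta_map.unsqueeze(0), x=x_o).squeeze()
    loss.backward()
    opt.step()

# Laplace approximation: local covariance = inverse Hessian of the negative
# log-posterior at the MAP (only a local answer to the uncertainty question).
neg_log_post = lambda t: -posterior.log_prob(t.unsqueeze(0), x=x_o).squeeze()
hess = torch.autograd.functional.hessian(neg_log_post, theta_map.detach())
cov = torch.linalg.inv(hess)                             # 12 x 12
```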
# Diagnostics
When you do single-round amortized inference using BayesFlow or NPE, NLE, NRE, you can easily do simulation-based calibration: repeat the inference many (e.g., 1000) times with different x_os simulated from the prior (you only have to train once, then plug in many different x_os and get the posteriors). Aggregated over all 1000 posteriors, the resulting samples have to be distributed according to the prior (!). More details in https://arxiv.org/abs/1804.06788.
This gives a necessary condition for accurate inference: if this check fails, your inference is not valid. If it passes, that's good, but you should also make sure that your posterior is different from the prior.
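A sketch of that loop using the rank statistics from the paper, assuming a `prior`, a `simulator`, and an amortized `posterior` as described above:

```python
import torch

# Simulation-based calibration via rank statistics (arXiv:1804.06788):
# if inference is calibrated, the rank of each prior draw among its own
# posterior samples is uniform on {0, ..., L}.
ranks = []
for _ in range(1000):
    theta_i = prior.sample()                        # (12,) draw from the prior
    x_i = simulator(theta_i)                        # (200,) simulated observation
    post_samples = posterior.sample((100,), x=x_i)  # (100, 12), so L = 100
    ranks.append((post_samples < theta_i).sum(dim=0))
ranks = torch.stack(ranks)  # (1000, 12); histogram each column, check uniformity
```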