# Generalised Variational Inference
Speaker: Willem
Date: 8 May 2019
Paper: Generalized Variational Inference (https://arxiv.org/pdf/1904.02063.pdf)
1. Some axioms
i) $E_q[\ell(\theta,x)]$: Measure of fit, where $\theta$ is the parameter and $x$ the data.
ii) $D(q \parallel \pi)$: Uncertainty quantification, where $D$ is (any) divergence and $\pi$ is the prior.
iii) Optimize over $q(\theta) \in \Pi$, a set of distributions over $\theta$.
2. (an axiom that is not used later, so not covered)
Theorem 2 (Generalised posterior)
From 1., it follows (supposedly trivially) that
$$P(\ell,D,\Pi) = \underset{q \in \Pi}{\text{argmin}}\, f\big(E_q[\ell(\theta,x)],\, D(q \parallel \pi)\big)$$
for some function $f$.
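A helpful special case from the paper (not spelled out in the talk): standard Bayesian inference is recovered as $P\big({-\log p(x \mid \theta)},\, \text{KL},\, \mathcal{P}(\Theta)\big)$, where $\mathcal{P}(\Theta)$ is the set of all distributions over $\theta$.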
3. Penalize $D(q \parallel \pi)$: Prior regularization
4. Translation invariance: $\ell' = \ell + c \Longrightarrow P(\ell',D,\Pi) = P(\ell,D,\Pi)$, i.e. adding a constant to the loss does not change the posterior.
Theorem 3 (Generalised variational inference)
If $f(a,b) = a \circ b$, where $\circ$ is one of addition, subtraction, multiplication, or division, then it follows from 1., 3. and 4. that $\circ$ must be addition, i.e. $$P(\ell,D,\Pi) = \underset{q \in \Pi}{\text{argmin}} \, E_q[\ell(\theta,x)] + D(q \parallel \pi),$$ where $P$ is the generalised posterior.
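To make the objective of Theorem 3 concrete, here is a minimal sketch, assuming a one-dimensional Gaussian model with unit variance, a $N(0,1)$ prior, $D = \text{KL}$ (closed form between Gaussians), and a Gaussian variational family; none of these choices come from the talk, and all names are illustrative. Note that $\ell$ drops the additive constant of the negative log-likelihood, which axiom 4 permits.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=50)   # toy data from N(2, 1)
eps = rng.standard_normal(200)                # fixed base samples (reparameterisation)

def loss(theta, x):
    # Measure of fit: negative log-likelihood of N(theta, 1) up to an
    # additive constant, which axiom 4 lets us drop.
    return 0.5 * np.sum((x[None, :] - theta[:, None]) ** 2, axis=1)

def kl_gaussian(mu, sigma):
    # D(q || pi) in closed form for q = N(mu, sigma^2) and pi = N(0, 1).
    return 0.5 * (sigma ** 2 + mu ** 2 - 1.0) - np.log(sigma)

def gvi_objective(params):
    # E_q[loss] + D(q || pi), the objective of Theorem 3.
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    theta = mu + sigma * eps                  # Monte Carlo samples from q
    return loss(theta, x).mean() + kl_gaussian(mu, sigma)

res = minimize(gvi_objective, x0=np.array([0.0, 0.0]))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(f"q* = N({mu_hat:.3f}, {sigma_hat:.3f}^2)")
```

Since the exact posterior is Gaussian here, the minimiser should land close to the conjugate posterior $N\big(n\bar{x}/(n+1),\, 1/(n+1)\big)$, up to Monte Carlo error.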
## Variational Inference
Variational inference is optimal in the sense of Theorem 3. (Optimality is not defined...)
KL-divergence VI: $q(\theta) \propto \pi(\theta) \, \exp(-\ell(\theta,x))$, the Gibbs posterior (exact when $\Pi$ contains all distributions). (KL is the Rényi divergence in the limit $\alpha \to 1$.)
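The optimal form follows in one step (a standard argument, not in the notes): writing $q^*(\theta) = \pi(\theta)\, e^{-\ell(\theta,x)} / Z$ with $Z = \int \pi(\theta)\, e^{-\ell(\theta,x)}\, d\theta$,
$$E_q[\ell(\theta,x)] + \text{KL}(q \parallel \pi) = \text{KL}(q \parallel q^*) - \log Z,$$
which is minimised over all distributions exactly at $q = q^*$.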
$f$-divergence VI: $\underset{q \in \Pi}{\text{argmin}} \, D_f(q \parallel \pi(\theta \mid x))$, i.e. minimise an $f$-divergence to the exact posterior.
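For reference (standard definition, not given in the talk): $D_f(q \parallel p) = \int p(\theta)\, f\!\left(\frac{q(\theta)}{p(\theta)}\right) d\theta$ for convex $f$ with $f(1) = 0$; the choice $f(t) = t \log t$ recovers $\text{KL}(q \parallel p)$.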