# Generalised Variational Inference

Speaker: Willem
Date: 8 May 2019
Paper: Generalized Variational Inference (https://arxiv.org/pdf/1904.02063.pdf)

1. Some axioms:
   i) $E_q[\ell(\theta,x)]$: measure of fit, where $\theta$ is the parameter and $x$ is the data.
   ii) $D(q \parallel \pi)$: uncertainty quantification, where $D$ is any divergence and $\pi$ is the prior.
   iii) Optimize over $q(\theta) \in \Pi$, a set of distributions over the parameter space.
2. (something that is not used, so not mentioned)

**Theorem 2 (Generalised posterior).** From 1., it follows (supposedly trivially) that
$$P(\ell, D, \Pi) = \underset{q \in \Pi}{\text{argmin}}\, f(E_q[\ell(\theta,x)],\, D(q \parallel \pi)).$$

3. Penalize $D(q \parallel \pi)$: prior regularization.
4. $\ell' = \ell + c \Longrightarrow P(\ell', D, \Pi) = P(\ell, D, \Pi)$, i.e. the posterior is invariant to adding a constant to the loss.

**Theorem 3 (Generalised variational inference).** If $f(a,b) = a \circ b$, where $\circ$ is one of addition, subtraction, multiplication or division, then it follows from 1., 3. and 4. that $\circ$ has to be addition, i.e.
$$P(\ell, D, \Pi) = \underset{q \in \Pi}{\text{argmin}}\, E_q[\ell(\theta,x)] + D(q \parallel \pi),$$
where $P$ is a generalised posterior. (A toy numerical sketch of this objective is given at the end of these notes.)

## Variational Inference

Variational inference is optimal in the sense of Theorem 3. (Optimality is not defined...)

KL-divergence VI: $q(\theta) \propto \pi(\theta)\,\exp(-\ell(\theta,x))$. (KL is the Rényi divergence in the limit $\alpha \to 1$; see the derivation below.)

f-divergence VI: $\underset{q \in \Pi}{\text{argmin}}\, F(q \parallel \pi(\theta \mid x))$, where $F$ is an f-divergence to the exact posterior.
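The proportionality claimed for KL-divergence VI can be checked by a short completing-the-KL argument (standard, though not spelled out in the talk): take $D = \mathrm{KL}$ and let $\Pi$ be unrestricted.

```latex
% Completing the KL: with D = KL and Pi unrestricted, the Theorem 3
% objective equals a KL divergence to pi(theta) exp(-l(theta, x)) / Z
% up to the constant -log Z.
\begin{align*}
E_q[\ell] + \mathrm{KL}(q \parallel \pi)
  &= \int q(\theta)\, \ell(\theta, x)\, d\theta
   + \int q(\theta) \log \frac{q(\theta)}{\pi(\theta)}\, d\theta \\
  &= \int q(\theta) \log \frac{q(\theta)}{\pi(\theta) \exp(-\ell(\theta, x))}\, d\theta \\
  &= \mathrm{KL}\!\left( q \,\Big\Vert\, \tfrac{1}{Z}\, \pi \exp(-\ell) \right) - \log Z,
  \qquad Z = \int \pi(\theta) \exp(-\ell(\theta, x))\, d\theta,
\end{align*}
% which is minimised (KL = 0) exactly when
% q(theta) = pi(theta) exp(-l(theta, x)) / Z.
```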
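To make the Theorem 3 objective concrete, here is a minimal numerical sketch (my illustration, not code from the paper): a toy Gaussian-mean model with loss $\ell(\theta,x) = \tfrac{1}{2}\sum_i (x_i - \theta)^2$, prior $\pi = \mathcal{N}(0,1)$, $D = \mathrm{KL}$, and Gaussian family $\Pi = \{\mathcal{N}(m, s^2)\}$, so both terms of the objective are available in closed form. The model, synthetic data, and function names are assumptions made for the example.

```python
import numpy as np

# Toy GVI sketch (illustrative, not from the paper).
# Loss:   l(theta, x) = 0.5 * sum_i (x_i - theta)^2
# Prior:  pi = N(0, 1)
# Family: q = N(m, s^2), parameterised by (m, log_s) so optimisation is unconstrained.
# Both terms of the Theorem 3 objective are closed-form here:
#   E_q[l]      = 0.5 * sum_i (x_i - m)^2 + 0.5 * n * s^2
#   KL(q || pi) = 0.5 * (m^2 + s^2 - 1 - log s^2)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=50)  # synthetic data
n = x.size

def objective(m, log_s):
    s2 = np.exp(2.0 * log_s)
    expected_loss = 0.5 * np.sum((x - m) ** 2) + 0.5 * n * s2
    kl = 0.5 * (m ** 2 + s2 - 1.0 - 2.0 * log_s)
    return expected_loss + kl

def gradients(m, log_s):
    s2 = np.exp(2.0 * log_s)
    dm = -np.sum(x - m) + m      # d/dm of expected loss plus KL term
    dlog_s = n * s2 + s2 - 1.0   # chain rule: d(s^2)/d(log_s) = 2 s^2
    return dm, dlog_s

m, log_s = 0.0, 0.0
lr = 1e-2
for _ in range(2000):            # plain gradient descent on the GVI objective
    dm, dlog_s = gradients(m, log_s)
    m -= lr * dm
    log_s -= lr * dlog_s

print(f"GVI optimum: N({m:.3f}, {np.exp(2 * log_s):.4f}), objective {objective(m, log_s):.3f}")
print(f"exact Bayes: N({x.sum() / (n + 1):.3f}, {1 / (n + 1):.4f})")
```

Because $D = \mathrm{KL}$ here and the Gaussian family contains the exact conjugate posterior, the optimum matches $\mathcal{N}\!\big(\sum_i x_i/(n+1),\, 1/(n+1)\big)$, consistent with the KL-divergence VI line above.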