### KL divergence bound

Let $f$ be a convex function, and let $f^*$ be its dual (convex conjugate). For any vectors $x, y$, Fenchel's inequality says
$$f(x) + f^*(y) - x\cdot y \geq 0\ .$$
We can use this to get a lower bound on $f$:
$$f(x) \geq x\cdot y - f^*(y)$$
for any vector of dual variables $y$. If we take $f(x)$ to be the generalized negentropy,
$$\textstyle f(x) = \sum_i (x_i\ln x_i - x_i)\ ,$$
then the dual is
$$\textstyle f^*(y) = \sum_i e^{y_i}\ ,$$
and the Fenchel bound is a generalization of the fact that KL divergence is nonnegative. In this case $x$ is a probability distribution, and $y$ is a log-distribution (that is, $e^y$ is a distribution).

### Latent-variable model

We want to do inference in the latent-variable model
$$\textstyle P(W) = \sum_Z P(W\mid Z) P(Z)\ ,$$
where $W$ is an observed variable and $Z$ is its latent cause. For this purpose, we want to find an approximation $Q(Z)$ to the true posterior over the latent, $P(Z\mid W)$.

### ELBO

If we apply the Fenchel bound with $y = \ln P(Z\mid W)$ and $x = Q(Z)$, then, using $\sum_z Q(z) = 1$ and $f^*(y) = \sum_z P(z\mid W) = 1$, we get
$$\begin{align*}0 &\leq f(x) + f^*(y) - x\cdot y\\ &= \textstyle \sum_z Q(z)\ln Q(z) - 1 + 1 - \sum_z Q(z)\ln P(z\mid W) \end{align*}$$
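
As a sanity check, here is a minimal numerical sketch (assuming NumPy; the model sizes, the Dirichlet draws, and the choice of observation are arbitrary illustrations, not part of the derivation above). It builds a small discrete latent-variable model, plugs $x = Q(Z)$ and $y = \ln P(Z\mid W)$ into the Fenchel residual $f(x) + f^*(y) - x\cdot y$ with $f$ the generalized negentropy, and confirms that the residual is nonnegative and coincides with the KL divergence $\sum_z Q(z)\ln\frac{Q(z)}{P(z\mid W)}$ between $Q$ and the true posterior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary small discrete model: 3 latent states z, 4 observed values w.
P_Z = rng.dirichlet(np.ones(3))              # prior P(Z)
P_W_given_Z = rng.dirichlet(np.ones(4), 3)   # likelihood P(W | Z), one row per z

w = 2                                        # some fixed observation
joint = P_Z * P_W_given_Z[:, w]              # P(W = w, Z = z)
posterior = joint / joint.sum()              # true posterior P(Z | W = w)

Q = rng.dirichlet(np.ones(3))                # an arbitrary approximation Q(Z)

# Fenchel residual with f = generalized negentropy, x = Q, y = ln P(Z | W):
f_x = np.sum(Q * np.log(Q) - Q)              # f(x)  = sum_i x_i ln x_i - x_i
f_star_y = np.sum(posterior)                 # f*(y) = sum_i e^{y_i}  (= 1 here)
x_dot_y = np.sum(Q * np.log(posterior))      # x . y

residual = f_x + f_star_y - x_dot_y          # Fenchel says this is >= 0
kl = np.sum(Q * np.log(Q / posterior))       # KL(Q || P(Z | W))

print(residual, kl)                          # the two agree, and both are >= 0
assert residual >= 0 and np.isclose(residual, kl)
```

Setting `Q = posterior` drives the residual to zero, matching the usual statement that the bound is tight exactly when the approximation matches the true posterior.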