### KL divergence bound
Let $f$ be a convex function, and $f^*$ be its dual. For any vectors $x, y$, Fenchel's inequality says
$$f(x) + f^*(y) - x\cdot y \geq 0\ .$$
We can use this to get a lower bound on $f$:
$$f(x) \geq x\cdot y - f^*(y)$$
for any vector of dual variables $y$.
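(Where this comes from: the conjugate is defined as a supremum, so the inequality, and the condition for it to be tight, are immediate.)
$$f^*(y) = \sup_x \big(x\cdot y - f(x)\big) \geq x\cdot y - f(x)\ ,$$
with equality exactly when the supremum is attained at the given $x$, i.e. when $y$ is a (sub)gradient of $f$ at $x$.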
If we take $f(x)$ to be generalized negentropy,
$$\textstyle f(x) = \sum_i (x_i\ln x_i - x_i)$$
then the dual is
$$\textstyle f^*(y) = \sum_i e^{y_i}$$
and the Fenchel bound is a generalization of the fact that KL divergence is nonnegative: here $x$ plays the role of a probability distribution, and $y$ of a log-distribution (that is, $e^y$ is a distribution).
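Spelling this out (a short check from the definitions above): maximizing $x_i y_i - x_i\ln x_i + x_i$ over $x_i$ gives $x_i = e^{y_i}$, which is where the dual above comes from, and when $x$ and $e^y$ both sum to one,
$$\textstyle f(x) + f^*(y) - x\cdot y = \sum_i x_i\ln x_i - \sum_i x_i + \sum_i e^{y_i} - \sum_i x_i y_i = \sum_i x_i \ln\frac{x_i}{e^{y_i}}\ ,$$
which is exactly the KL divergence between the distributions $x$ and $e^y$.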
### Latent-variable model
We want to do inference for the latent-variable model
$$\textstyle P(W) = \sum_Z P(W\mid Z) P(Z)$$
where $W$ is an observed variable and $Z$ is its latent cause. To do this, we look for an approximation $Q(Z)$ to the true posterior over the latent, $P(Z\mid W)$.
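As a concrete toy instance (everything here is made up for illustration: the array names and numbers are assumptions, not from the text), here is a tiny discrete model in NumPy that computes the marginal $P(W)$ and, because the sum over $Z$ is small here, the exact posterior $P(Z\mid W)$ that $Q(Z)$ would approximate when the model is too large for this to be tractable:

```python
import numpy as np

# Toy discrete model with made-up numbers: Z has 2 states, W has 3.
prior = np.array([0.6, 0.4])                 # P(Z)
likelihood = np.array([[0.7, 0.2, 0.1],      # P(W | Z=0)
                       [0.1, 0.3, 0.6]])     # P(W | Z=1)

# Marginal likelihood of the observation: P(W) = sum_Z P(W|Z) P(Z)
marginal = prior @ likelihood                # shape (3,), sums to 1

# Exact posterior over the latent: P(Z|W) = P(W|Z) P(Z) / P(W)
posterior = likelihood * prior[:, None] / marginal[None, :]

print("P(W)   =", marginal)
print("P(Z|W) =\n", posterior)               # each column sums to 1
```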
### ELBO
If we apply the Fenchel bound with $y = \ln P(Z\mid W)$ and $x = Q(Z)$, we get
$$\begin{align*}0 &\leq f(x) + f^*(y) - x\cdot y\\
&= \textstyle \sum_z Q(z)\ln Q(z) - 1 + 1 - \sum_z Q(z)\ln P(z\mid W)
\end{align*}$$
where the $-1$ is $-\sum_z Q(z)$ coming from $f(x)$, and the $+1$ is $f^*(y) = \sum_z e^{\ln P(z\mid W)} = \sum_z P(z\mid W)$.
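As a quick numeric sanity check (with made-up distributions over three latent values; the arrays `P_post` and `Q` below are purely illustrative), the quantity on the right is nonnegative for any distribution $Q$ and is zero only when $Q$ matches the posterior:

```python
import numpy as np

# Made-up distributions over 3 latent values, for one fixed observation W = w.
P_post = np.array([0.25, 0.25, 0.5])   # true posterior P(z|W)
Q = np.array([0.2, 0.3, 0.5])          # an approximate posterior Q(z)

# f(x) + f*(y) - x.y  with  x = Q  and  y = ln P(z|W):
gap = np.sum(Q * np.log(Q)) - 1 + 1 - np.sum(Q * np.log(P_post))

print(gap)                              # nonnegative; 0 only when Q equals P_post
print(np.sum(Q * np.log(Q / P_post)))   # same number, written as a single sum
```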