# Factorization of Evidence Lower Bound
###### tags: `math` `machine-learning`
## Derivation of Evidence Lower Bound
By Bayes' rule, for any distribution $q(Z)$ over the latent variables,
$$p(X)=\frac{p(X, Z)}{p(Z|X)}\frac{q(Z)}{q(Z)}=\frac{p(X,Z)}{q(Z)}\frac{q(Z)}{p(Z|X)}$$
thus,
$$\ln p(X)=\ln \frac{p(X,Z)}{q(Z)} + \ln \frac{q(Z)}{p(Z|X)}$$
Taking the expectation of both sides with respect to $q(Z)$ (the left-hand side does not depend on $Z$), we obtain, as in the EM algorithm, the decomposition of the log marginal probability $$\ln p(X)=\mathcal{L}(q) + KL(q\|p)$$ where
- $\mathcal{L}(q)=\int q(Z)\ln\frac{p(X,Z)}{q(Z)}\mathrm{d} Z=E_q\{\ln\frac{p(X,Z)}{q(Z)}\}$
- $KL(q\|p)=\int q(Z)\ln\frac{q(Z)}{p(Z|X)}\mathrm{d} Z=E_q\{\ln\frac{q(Z)}{p(Z|X)}\}$
The objective of Variational Inference (VI) is to maximize the Evidence Lower Bound $\mathcal{L}(q)$ with respect to $q$, which is equivalent to minimizing the KL divergence, because $\ln p(X)$ is constant with respect to $q$.
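As a quick numerical sanity check of this decomposition, the sketch below (an illustration added here, using a made-up discrete toy model rather than anything from the derivation) builds a small joint $p(X,Z)$ over a discrete latent variable, picks an arbitrary $q(Z)$, and confirms that $\mathcal{L}(q) + KL(q\|p) = \ln p(X)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discrete model: the latent Z takes K values; X is a single observed outcome.
K = 4
p_joint = 0.3 * rng.dirichlet(np.ones(K))    # p(X = x_obs, Z = k) for k = 1..K
p_X = p_joint.sum()                          # p(X = x_obs), the evidence
p_post = p_joint / p_X                       # p(Z | X = x_obs), the true posterior

# An arbitrary variational distribution q(Z).
q = rng.dirichlet(np.ones(K))

# L(q)      = E_q[ln p(X,Z) - ln q(Z)]
elbo = np.sum(q * (np.log(p_joint) - np.log(q)))
# KL(q||p)  = E_q[ln q(Z) - ln p(Z|X)]
kl = np.sum(q * (np.log(q) - np.log(p_post)))

print(elbo + kl, np.log(p_X))   # identical up to floating-point rounding
```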
## The Mean-Field Method
Under the mean-field assumption, $q$ factorizes into $M$ independent distributions, one per disjoint group of latent variables $Z_i$:
$$q(Z)=\prod_i q_i(Z_i) = \prod_i q_i$$
We can then substitute this factorization into the Evidence Lower Bound $\mathcal{L}(q)$:
$$\mathcal{L}(q)=\int (\prod_i q_i)(\ln p(X, Z)-\sum_i \ln q_i)\mathrm{d} Z$$
$$=\int (\prod_i q_i)\ln p(X, Z)\mathrm{d} Z - \int (\prod_i q_i)(\sum_i \ln q_i) \mathrm{d}Z$$
Now we select one index $j \in \{1,2,\ldots,M\}$ and write $\mathcal{L}(q)$ as an integral over $Z_j$, collecting every term that does not depend on $q_j$ into $\text{const}$:
$$\mathcal{L}(q) = \int q_j \Big\{\int \big(\prod_{i\neq j} q_i\big)\ln p(X, Z)\,\mathrm{d} Z_{i\neq j}\Big\}\mathrm{d}Z_j - \int q_j\big(\prod_{i\neq j}q_i\big)\big(\ln q_j+\sum_{i\neq j}\ln q_i\big)\mathrm{d} Z \\ = \int q_j \Big\{\int \big(\prod_{i\neq j} q_i\big)\ln p(X, Z)\,\mathrm{d} Z_{i\neq j}\Big\}\mathrm{d}Z_j - \int q_j\big(\prod_{i\neq j}q_i\big)\ln q_j\,\mathrm{d} Z - \int q_j\big(\prod_{i\neq j}q_i\big)\big(\sum_{i\neq j}\ln q_i\big)\mathrm{d} Z \\ = \int q_j \,\mathbb{E}_{i\neq j}[\ln p(X, Z)]\,\mathrm{d}Z_j - \int q_j\ln q_j \,\mathrm{d} Z_j - \text{const} \\ = \int q_j \ln \frac{\exp\{\mathbb{E}_{i\neq j}[\ln p(X, Z)]\}}{q_j}\, \mathrm{d} Z_j - \text{const}$$
Here $\mathbb{E}_{i\neq j}[\cdot]$ denotes the expectation under $\prod_{i\neq j}q_i$; the third line uses $\int q_i\,\mathrm{d}Z_i = 1$, so the term containing $\ln q_j$ reduces to $\int q_j\ln q_j\,\mathrm{d}Z_j$ and the remaining term is constant with respect to $q_j$.
Let's define the normalizing constant $$C=\int \exp\{\mathbb{E}_{i\neq j}[\ln p(X, Z)]\}\,\mathrm{d}Z_j$$ so that the above can be rewritten as $$\mathcal{L}(q) = \int q_j \ln \frac{\frac{1}{C}\exp\{\mathbb{E}_{i\neq j}[\ln p(X, Z)]\}}{q_j}\,\mathrm{d}Z_j + \ln C - \text{const} \\ = \int q_j \ln \frac{\frac{1}{C}\exp\{\mathbb{E}_{i\neq j}[\ln p(X, Z)]\}}{q_j}\,\mathrm{d}Z_j + \text{const} \\ = \int q_j\ln\frac{q^*_{j|i\neq j}}{q_j}\,\mathrm{d}Z_j + \text{const} \\= -KL(q_j \|q^*_{j|i\neq j}) + \text{const}$$ where $\ln C$, which does not depend on $q_j$, has been absorbed into the constant, and $q^*_{j|i\neq j}$ denotes the optimal factor $q_j$ when all other factors $q_{i\neq j}$ are held fixed. It is more convenient to write this as $$\ln q^*_{j|i\neq j} = \mathbb{E}_{i\neq j}[\ln p(X,Z)] + \text{const}$$
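This closed-form optimum can also be checked numerically. The following sketch (a hypothetical toy setup: two discrete latent variables $Z_1, Z_2$ and a randomly generated joint, not part of the derivation above) holds $q_2$ fixed, computes $q_1^*$ from $\ln q_1^* = \mathbb{E}_{q_2}[\ln p(X,Z)] + \text{const}$, and verifies that no randomly drawn $q_1$ attains a larger $\mathcal{L}(q)$.

```python
import numpy as np

rng = np.random.default_rng(1)
K1, K2 = 3, 4

# Hypothetical joint p(X = x_obs, Z1, Z2): positive entries summing to p(X = x_obs) < 1.
p_joint = 0.2 * rng.dirichlet(np.ones(K1 * K2)).reshape(K1, K2)
log_p = np.log(p_joint)

def elbo(q1, q2):
    """L(q) = E_q[ln p(X,Z1,Z2)] - E_q[ln q1(Z1)] - E_q[ln q2(Z2)] for q = q1 q2."""
    joint_term = (q1[:, None] * q2[None, :] * log_p).sum()
    return joint_term - np.sum(q1 * np.log(q1)) - np.sum(q2 * np.log(q2))

# Hold q2 fixed and compute the optimal q1 via  ln q1* = E_{q2}[ln p(X, Z1, Z2)] + const.
q2 = rng.dirichlet(np.ones(K2))
log_q1_star = log_p @ q2                      # E_{q2}[ln p(X, z1, Z2)] for each value z1
q1_star = np.exp(log_q1_star - log_q1_star.max())
q1_star /= q1_star.sum()                      # normalizing handles the additive constant

best = elbo(q1_star, q2)
for _ in range(1000):                         # randomly drawn q1's never beat q1*
    q1 = rng.dirichlet(np.ones(K1))
    assert elbo(q1, q2) <= best + 1e-12
print("ELBO at the optimal factor q1*:", best)
```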
## Conclusion
From the above derivation we see that, with the other factors $q_{i\neq j}$ held fixed, $\mathcal{L}(q) = -KL(q_j \|q^*_{j|i\neq j}) + \text{const}$ attains its maximum exactly when $q_j = q^*_{j|i\neq j}$.
This yields the central update rule of mean-field variational inference: **we update each approximate factor in turn by setting its logarithm to the expectation of $\ln p(X,Z)$ under the remaining, fixed factors (up to a normalizing constant).**
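Iterating this update over all factors is coordinate-ascent variational inference (CAVI). Below is a minimal sketch of that loop on the same kind of hypothetical two-factor discrete toy model used above; each sweep applies $\ln q_j^* = \mathbb{E}_{i\neq j}[\ln p(X,Z)] + \text{const}$ to one factor at a time, and the ELBO never decreases.

```python
import numpy as np

rng = np.random.default_rng(2)
K1, K2 = 3, 4

# Hypothetical joint p(X = x_obs, Z1, Z2); only its logarithm enters the updates.
log_p = np.log(0.2 * rng.dirichlet(np.ones(K1 * K2)).reshape(K1, K2))

def normalize_exp(log_unnorm):
    """Turn ln q*(z) = (...) + const into a proper distribution."""
    w = np.exp(log_unnorm - log_unnorm.max())
    return w / w.sum()

def elbo(q1, q2):
    joint_term = (q1[:, None] * q2[None, :] * log_p).sum()
    return joint_term - np.sum(q1 * np.log(q1)) - np.sum(q2 * np.log(q2))

# Initialize both factors arbitrarily, then run coordinate-ascent updates.
q1 = rng.dirichlet(np.ones(K1))
q2 = rng.dirichlet(np.ones(K2))
prev = -np.inf
for sweep in range(20):
    q1 = normalize_exp(log_p @ q2)            # ln q1* = E_{q2}[ln p(X, Z)] + const
    q2 = normalize_exp(q1 @ log_p)            # ln q2* = E_{q1}[ln p(X, Z)] + const
    cur = elbo(q1, q2)
    assert cur >= prev - 1e-12                # each sweep can only increase L(q)
    prev = cur
print("converged ELBO:", prev, " ln p(X):", np.log(np.exp(log_p).sum()))
```

In this toy run the converged ELBO stays below $\ln p(X)$; the gap is exactly the KL divergence between the factorized $q$ and the true posterior, and it vanishes only when that posterior itself factorizes over $Z_1$ and $Z_2$.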