Without auxiliary variables, we have a variational noising distribution $q(x_t \mid x_0)$. To define the generative model $p(x_0 \mid x_t)$, one often takes the reverse conditional under $q$, i.e. $q(x_0 \mid x_t)$, and sets the mean and covariance of $p(x_0 \mid x_t)$ to those of $q(x_0 \mid x_t)$.

With auxiliary variables, the variational distribution over $(x_0, v_0, u_\epsilon = [x_\epsilon, v_\epsilon])$, with $u_0 = [x_0, v_0]$, is specified as $q(x_0)\, q(v_0)\, q(u_\epsilon \mid u_0)$. We now want to set the mean and covariance of $p_\theta(x_0 \mid u_\epsilon)$ to those of $q(x_0 \mid u_\epsilon)$ (rather than those of $q(u_0 \mid u_\epsilon)$). The corresponding bound is

$$\log p(x) \geq \mathbb{E}_{q(u_\epsilon \mid x)}\big[\log p(x \mid u_\epsilon) + \log p(u_\epsilon) - \log q(u_\epsilon \mid x)\big]$$
$$\geq \mathbb{E}_{q(u_\epsilon \mid x)}\big[\log p(x \mid u_\epsilon) + \log p_{\text{LB}}(u_\epsilon) - \log q(u_\epsilon \mid x)\big],$$

where $p_{\text{LB}}(u_\epsilon)$ denotes any lower bound on $\log p(u_\epsilon)$.

--

Reconstructing $u_0$ instead of $x$, and averaging over the data distribution:

$$\mathbb{E}_{q(x_0)}\big[\log p(x_0)\big] \geq \mathbb{E}_{q(x_0)\, q(v_0)\, q(u_\epsilon \mid u_0)}\Big[\big(\log p(u_0 \mid u_\epsilon) - \log q(u_\epsilon \mid u_0)\big) + \log p(u_\epsilon)\Big]$$

---

Putting these together, the objective becomes a prior term, a score-matching term (ISM/DSM, implicit/denoising score matching), and the entropy of $q(v_0)$:

$$\mathbb{E}_{q(u_T \mid x_0, v_0)}\big[\log \pi(u_T)\big] + \text{ISM/DSM} - \mathbb{E}_{q(v_0)}\big[\log q(v_0)\big]$$
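As a sanity check of the "match the mean and covariance of $q(x_0 \mid u_\epsilon)$" step: when $q$ is jointly Gaussian over $u_0 = [x_0, v_0]$ and $u_\epsilon$, those moments follow from standard Gaussian conditioning. The sketch below is a minimal toy instance, assuming a hypothetical linear-Gaussian noising map `A`, noise covariance `S`, and velocity variance `gamma` (none of these come from a specific parameterization).

```python
import numpy as np

# Toy linear-Gaussian setup (assumed, for illustration only):
# q(x0) = N(0, 1), q(v0) = N(0, gamma) independent,
# q(u_eps | u0) = N(A @ u0, S), with u0 = [x0, v0].
gamma = 0.25                        # assumed velocity variance
A = np.array([[0.9, 0.1],           # assumed noising map u0 -> u_eps
              [-0.1, 0.9]])
S = 0.05 * np.eye(2)                # assumed noising covariance

Sigma_u0 = np.diag([1.0, gamma])    # Cov[u0]
Sigma_ue = A @ Sigma_u0 @ A.T + S   # Cov[u_eps]
Cross = Sigma_u0 @ A.T              # Cov[u0, u_eps]

# Gaussian conditioning: condition all of u0 on u_eps, then keep the x0 block.
K = Cross @ np.linalg.inv(Sigma_ue)             # gain Cov[u0,ue] Cov[ue]^{-1}
u_eps = np.array([0.5, -0.3])                   # an observed noised state
mean_u0_given_ue = K @ u_eps                    # prior means are zero
cov_u0_given_ue = Sigma_u0 - K @ Cross.T

# Moments of q(x0 | u_eps): x0 is the first coordinate of u0.
mean_x0 = mean_u0_given_ue[0]
var_x0 = cov_u0_given_ue[0, 0]
```

Conditioning on $u_\epsilon$ can only shrink the marginal variance of $x_0$, which is what makes $q(x_0 \mid u_\epsilon)$ a sensible target for $p_\theta(x_0 \mid u_\epsilon)$.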