Constrained Hamiltonian dynamic in parameter space

# Constrained Hamiltonian dynamic in parameter space $\newcommand{\reals}{\mathbb{R}} \newcommand{\idmtx}{\mathbb{I}} \newcommand{\nrm}{\mathcal{N}} \newcommand{\d}{\mathop{}\!\mathrm{d}} \newcommand{\jacob}{\mathop{}\!\partial} \newcommand{\grad}{\mathop{}\!\nabla} \newcommand{\tr}{^{\mkern-1.5mu\mathsf{T}}}$ ## Generative model Assume latent state $\theta \in \reals^P$ with Gaussian prior $\nrm(0,C)$, observation $y \in \reals^N$ and observation function $g: \reals^P \to \reals^N$ subject to additive Gaussian noise with covariance $R = SS\tr \succ 0$. This corresponds to a generative model $$ y = g(\theta) + S \eta, \quad \theta \sim \nrm(0, C), \quad \eta \sim \nrm(0, \idmtx_N).$$ ## Constrained Hamiltonian dynamic Denote the overall unknown state $q = [\theta; \eta]$ and define a constraint function $c : \mathbb{R}^{P+N} \to \mathbb{R}^N$ which is zero at states consistent with the observations $$c([\theta; \eta]) = g(\theta) + S \eta - y.$$ The posterior distribution has support on an implicitly defined manifold $\mathcal{M} = \lbrace q \in \reals^{P+N} : c(q) = 0\rbrace$ with unnormalised density $\pi_{\mathcal{M}}$ on $\mathcal{M}$ with respect to $P$-dimensional Hausdorff measure on ambient space $\reals^{P+N}$ equipped with the Euclidean metric $d(q, q') = q\tr q$ with $\pi_{\mathcal{M}}$ defined by $$\log\pi_{\mathcal{M}}(q) = -\frac{1}{2}q\tr \begin{bmatrix} C^{-1} & 0 \\ 0 & \idmtx_N \end{bmatrix} q - \frac{1}{2}\log\det(\jacob c(q) \jacob c(q)\tr).$$ Augmenting the space with a momentum variable $p\in\reals^{P+N}$ with covariance $\idmtx_{P+N}$, the Hamiltonian for the constrained system is $$ h(q, p) = -\log\pi_{\mathcal{M}}(q) + \frac{1}{2}p\tr p + c(q)\tr \lambda(p,q), $$ with $\lambda$ implictly defined by the requirement $c(q) = 0$. The corresponding constrained Hamiltonian dynamic is defined by the system of differential algebraic equations \begin{align} \frac{\d q}{\d t} &= \jacob_2 h(q, p) = p, \\ \frac{\d p}{\d t} &= -\jacob_1 h(q, p) = \grad\log\pi_{\mathcal{M}}(q) - \jacob c(q)\tr \lambda(p,q),\\ c(q)&= 0. \end{align} ## Riemannian manifold in parameter space The manifold $\mathcal{M}$ can also be defined via an explicit parameterisation $$\phi(\theta) = \left[\theta; S^{-1}(y-g(\theta)) \right] \implies \mathcal{M} = \phi(\reals^P).$$ Following Leimkuhler and Reich (2004) §7.6.2, if we consider the transformation $q = \phi(\theta)$ then the canonical momentum $\nu$ associated with $\theta$ is defined by $\nu = \jacob\phi(\theta)\tr p$ and the Hamiltonian $h'$ in the new variables $(\theta, \nu)$ is \begin{align}\require{cancel} h'(\theta,\nu) &= h(\phi(\theta), (\jacob\phi(\theta)\tr)^+\nu)\\ &= \frac{1}{2}\phi(\theta)\tr\begin{bmatrix} C^{-1} & 0 \\ 0 & \idmtx_N \end{bmatrix}\phi(\theta) + \frac{1}{2}\log\det(\jacob c(\phi(\theta))\, \jacob c(\phi(\theta))\tr) + \,\\ & \phantom{=~} \frac{1}{2}\nu\tr (\jacob\phi(\theta))^+(\jacob\phi(\theta)\tr)^+ \nu + c(\phi(\theta))\tr\lambda(\phi(\theta), (\jacob\phi(\theta)\tr)^+\nu). \end{align} By construction $c(\phi(\theta)) = 0 ~\forall \theta \in \reals^P$ so the last term can be ignored. By the properties of the pseudo-inverse we have that $$ (\jacob\phi(\theta))^+ (\jacob\phi(\theta)\tr)^+ = (\jacob\phi(\theta)\tr \jacob\phi(\theta))^+ = (\jacob\phi(\theta)\tr \jacob\phi(\theta))^{-1}. $$ The last equality arises as from the definition of $\phi$, the Jacobian $\jacob\phi$ has the block structure $$ \jacob\phi(\theta) = \begin{bmatrix}\idmtx_P \\ -S^{-1} \jacob g(\theta)\end{bmatrix}, $$ and so $\jacob\phi(\theta)$ has full-column rank for all $\theta \in \reals^P$. We therefore have that $$ (\jacob\phi(\theta))^+ (\jacob\phi(\theta)\tr)^+ = (\idmtx_P + \jacob g(\theta)\tr S^{-T} S^{-1} \jacob g(\theta))^{-1}. $$ From the definition of the constraint function $c$, the Jacobian $\jacob c$ has a block structure $$\jacob c(\phi(\theta)) = \begin{bmatrix} \jacob g(\theta) & S \end{bmatrix}.$$ Therefore we have that $$ \det(\jacob c(\phi(\theta))\, \jacob c(\phi(\theta))\tr) = \det(\jacob g(\theta)\jacob g(\theta)\tr + SS^{T} ). $$ Applying the matrix determinant lemma gives $$ \det(\jacob c(\phi(\theta))\, \jacob c(\phi(\theta))\tr) = \det(SS\tr) \det(\idmtx_P + \jacob g(\theta)\tr S^{-T} S^{-1} \jacob g(\theta)). $$ Putting everything together we therefore have that \begin{align} h'(\theta,\nu) &= \frac{1}{2}\theta\tr C^{-1}\theta + \frac{1}{2\sigma^2}(y-g(\theta))\tr R^{-1} (y-g(\theta)) + \frac{1}{2} \log\det(R) + \,\\ & \phantom{=~} \frac{1}{2}\nu\tr (\idmtx_P + \jacob g(\theta)\tr R^{-1} \jacob g(\theta))^{-1} \nu + \frac{1}{2}\log\det(\idmtx_P + \jacob g(\theta)\tr R^{-1} \jacob g(\theta)). \end{align} This corresponds to a Riemannian-manifold Hamiltonian $$ h'(\theta, \nu) = -\log\pi(\theta) + \frac{1}{2}\nu\tr M(\theta)^{-1} \nu + \frac{1}{2}\log\det(M(\theta)) $$ for a target (posterior) distribution in the parameter space with unnormalised density $$ \pi(\theta) = \frac{1}{\sqrt{\det(R)}} \exp\left(-\frac{1}{2}(y-g(\theta))\tr R^{-1} (y-g(\theta))\right) \exp\left(-\frac{1}{2}\theta\tr C^{-1}\theta\right) $$ and metric (momentum covariance) $$ M(\theta) = \idmtx_P + \jacob g(\theta)\tr R^{-1} \jacob g(\theta). $$ ## Generalised No-U-Turn criteria in parameter space A heuristic 'No-U-Turn' criterion which terminates the expansion of a Hamiltonian trajectory when it begins to 'go back on itself', was proposed for Hamiltonian systems defined on Riemannian manifolds by Betancourt (2013). For a system with metric $M(\theta)$, if the position and momentum pairs along a trajectory of $I$ integrator steps are denoted $\theta_{1:I}$ and $\nu_{1:I}$ respectively, then the integration is terminated when the following condition is first satisfied $$ \nu_I\tr M^{-1}(\theta_I) \sum_{i=1}^I \nu_i < 0. $$ This condition is expressed in terms of the position and conjugate momenta of a Hamiltonian trajectory in the *parameter space*. Using the relations $\nu_i = \jacob\phi(\theta)\tr p_i$ and $q_i = \phi(\theta_i)$ and the definition of $M(\theta)$ in the previous section, we can derive an equivalent condition in terms of the positions $q_{1:I}$ and momenta $p_{1:I}$ of a Hamiltonian trajectory on the manifold embedded in the extended space. We first decompose each $P+N$ dimensional momentum vector $p_i$ as $p_i = [p^\theta_i; p^\eta_i]$ with $p^\theta_i$ a $P$ dimensional subvector and $p^\eta_i$ a $N$ dimensional subvector. Then we have that $$ \nu_i = \jacob\phi(\theta_i)\tr p_i = \begin{bmatrix}\idmtx_P & -\jacob g(\theta)\tr S^{-1}\end{bmatrix} \begin{bmatrix}p^\theta_i \\ p^\eta_i\end{bmatrix} = p^\theta_i - \jacob g(\theta_i)\tr S^{-1} p^\eta_i. $$ Further from the constraint that the momenta lie in the cotangent space to the manifold we have that $$ \jacob c(q_i) p_i = 0 \implies \jacob c(\phi(\theta_i))p_i = \begin{bmatrix} \jacob g(\theta_i) & S \end{bmatrix} \begin{bmatrix}p^\theta_i \\ p^\eta_i\end{bmatrix}= \jacob g(\theta_i) p^\theta_i + S p^\eta_i = 0. $$ Rearranging we therefore have that $$ p^\eta_i = -S^{-1}\jacob g(\theta_i) p^\theta_i, $$ and so $$ \nu_i = p^\theta_i + \jacob g(\theta_i)\tr S^{-1} S^{-1}\jacob g(\theta_i) p^\theta_i = (\idmtx_P + \jacob g(\theta_i)\tr R^{-1}\jacob g(\theta_i)) p^\theta_i = M(\theta_i) p^\theta_i. $$ Substituting into the termination criterion we get the condition $$ p^\theta_I \sum_{i=1}^I M(\theta_i) p^\theta_i < 0. $$ ## References 1. Leimkuhler, B. and Reich, S., 2004. Simulating Hamiltonian dynamics (Vol. 14). *Cambridge University Press*. 2. Betancourt, M.J., 2013. Generalizing the no-U-turn sampler to Riemannian manifolds. *[arXiv:1304.1920](https://arxiv.org/abs/1304.1920)*.