# Reparametrising HMC
We have a variable $q$ whose distribution has covariance $\Sigma$. You can imagine a Gaussian for simplicity if you'd like. In HMC, the usual choice for the distribution of the auxiliary variable $p$ would be the isotropic Gaussian (with mass matrix $M = I$). To sample efficiently, we would like to either
1. sample a whitened $\tilde q$ with $\tilde p \sim N(0,I)$, then transform the samples back to $q$, or
2. sample $q$ directly, with a well-chosen $M$ so that $p \sim N(0,M)$ to produce efficient proposals.
In this note, we show that approach 1 gives the same result as approach 2.
## Position variable
Let us first establish the relationship between these transformed and non-transformed variables. We can transform our original variable with the following relation
$$q = L \tilde q \quad \text{or} \quad \tilde q = L^{-1} q,$$
where $L L^T = \Sigma$ from a Cholesky decomposition or otherwise. The whitened $\tilde q$ should then resemble an isotropic Gaussian.
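To make this concrete, here is a minimal numpy sketch (the covariance values are illustrative, not from any real model): whitened isotropic draws pushed through $L$ recover the target covariance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-d covariance with strong correlation.
Sigma = np.array([[4.0, 1.9],
                  [1.9, 1.0]])
L = np.linalg.cholesky(Sigma)  # lower triangular, L @ L.T == Sigma

# Whitened draws are isotropic; q = L @ q_tilde recovers the target scale.
q_tilde = rng.standard_normal((100_000, 2))
q = q_tilde @ L.T  # row-wise application of q = L @ q_tilde

print(np.cov(q, rowvar=False))  # approximately Sigma
```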
## Momentum variable
Given knowledge of $\Sigma$, or some approximation of it, from Neal's paper on HMC (page 22) we know the best choice of mass matrix is $M = \Sigma^{-1}$. The intuition for this choice is that a small mass in the direction of large variance allows for better exploration (see Stuart's Hilbert space HMC paper).
From our previous decomposition of $\Sigma$, we have $\Sigma^{-1} = (LL^T)^{-1} = L^{-T} L^{-1} = L^{-T} (L^{-T})^T$. If you'd like, you can call $C = L^{-T}$ and say that $M = C C^T$. Here, I will stick to just using $L$. Then, the momentum transform reads
$$ p = L^{-T} \tilde p \quad \text{or} \quad \tilde p = L^T p.$$
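A short numpy check of this transform (again with an illustrative $\Sigma$): pushing isotropic $\tilde p$ through $L^{-T}$ gives momenta whose covariance is approximately $\Sigma^{-1} = M$.

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[4.0, 1.9],
                  [1.9, 1.0]])
L = np.linalg.cholesky(Sigma)

# p = L^{-T} p_tilde, so p ~ N(0, Sigma^{-1}) when p_tilde ~ N(0, I).
p_tilde = rng.standard_normal((100_000, 2))
p = np.linalg.solve(L.T, p_tilde.T).T  # row-wise p = L^{-T} p_tilde

print(np.cov(p, rowvar=False))  # approximately Sigma^{-1} = M
print(np.linalg.inv(Sigma))
```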
## Target density and gradient
We want $q$ and $\tilde q$ to have proportional values under their respective target densities, that is to say
$$ \pi(q) =\pi(L \tilde q) \propto \tilde \pi (\tilde q).$$
The constant factor can be ignored, as it cancels out in the Hamiltonian (and hence the acceptance probability) calculations. For a Gaussian target, we have
$$-\log \pi(q) = \frac{1}{2} q^T \Sigma^{-1} q = \frac{1}{2} (L^{-1} q)^T (L^{-1}q) = \frac{1}{2} \tilde q^T \tilde q = -\log \tilde \pi(\tilde q).$$
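This identity is easy to verify numerically; a small sketch with the same illustrative $\Sigma$ (normalising constants dropped, as above):

```python
import numpy as np

rng = np.random.default_rng(2)
Sigma = np.array([[4.0, 1.9],
                  [1.9, 1.0]])
L = np.linalg.cholesky(Sigma)

q = rng.standard_normal(2)
q_tilde = np.linalg.solve(L, q)  # q_tilde = L^{-1} q

neg_log_pi = 0.5 * q @ np.linalg.solve(Sigma, q)  # (1/2) q^T Sigma^{-1} q
neg_log_pi_tilde = 0.5 * q_tilde @ q_tilde        # (1/2) q_tilde^T q_tilde

print(np.isclose(neg_log_pi, neg_log_pi_tilde))   # True
```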
Similarly, we want both gradients to check out. Since the Jacobian of the transform is $\nabla_q \tilde q = L^{-1}$, the chain rule gives
$$- \nabla_q \log \pi(q) = - \nabla_q \log \tilde \pi(\tilde q) = (\nabla_q \tilde q)^T \left(-\nabla_{\tilde q} \log \tilde \pi(\tilde q)\right) = L^{-T} \left(-\nabla_{\tilde q} \log \tilde \pi(\tilde q)\right).$$
Note the extra $L^{-T}$ factor in front of the whitened target gradient. This means we should not expect the gradient values to match if we naively compare the gradients of the two targets above. Nevertheless, we do not require them to be equal, as gradients do not enter the Metropolis-Hastings decision directly. We will see that this all works out nicely in the leapfrog integrator.
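For the Gaussian target, both gradients are available in closed form ($\nabla_q \log \pi(q) = -\Sigma^{-1} q$ and $\nabla_{\tilde q} \log \tilde \pi(\tilde q) = -\tilde q$), so we can check the extra $L^{-T}$ factor directly; a sketch with the same illustrative setup:

```python
import numpy as np

rng = np.random.default_rng(3)
Sigma = np.array([[4.0, 1.9],
                  [1.9, 1.0]])
L = np.linalg.cholesky(Sigma)

q = rng.standard_normal(2)
q_tilde = np.linalg.solve(L, q)

grad = np.linalg.solve(Sigma, q)  # -grad_q log pi(q) = Sigma^{-1} q
grad_tilde = q_tilde              # -grad_{q_tilde} log pi_tilde(q_tilde) = q_tilde

# The two gradients differ, but exactly by the L^{-T} factor.
print(np.allclose(grad, np.linalg.solve(L.T, grad_tilde)))  # True
```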
## Leapfrog Integrator
Let us check that in each step of the leapfrog integrator, both approaches agree with each other. We first look at the momentum half-step update
\begin{align}
p_{1/2} &= p_0 + \frac{\epsilon}{2} \nabla_q \log \pi (q_0) \\
\Rightarrow
p_{1/2} &= p_0 + \frac{\epsilon}{2} (L^{-T}) \nabla_{\tilde q} \log \tilde \pi (\tilde q_0) \\
\Rightarrow
L^T p_{1/2} &= L^T p_0 + \frac{\epsilon}{2} \nabla_{\tilde q} \log \tilde \pi (\tilde q_0)\\
\Rightarrow
\tilde p_{1/2} &= \tilde p_0 + \frac{\epsilon}{2} \nabla_{\tilde q} \log \tilde \pi (\tilde q_0)
\end{align}
And so, we see that the momentum half-step update for $(q, p)$ (approach 2) implies the momentum half-step update for $(\tilde q, \tilde p)$ (approach 1). The derivation can be read in reverse, so the two are equivalent. Next, we consider the position full-step update
\begin{align}
q_1 &= q_0 + \epsilon \, \Sigma \, p_{1/2} \\
\Rightarrow
q_1 &= q_0 + \epsilon \, \Sigma \, L^{-T} \tilde p_{1/2} \\
\Rightarrow
L^{-1} q_1 &= L^{-1} q_0 + \epsilon \, L^{-1} \Sigma \, L^{-T} \tilde p_{1/2} \\
\Rightarrow
L^{-1} q_1 &= L^{-1} q_0 + \epsilon \, L^{-1}(L L^T) \, L^{-T} \tilde p_{1/2} \\
\Rightarrow
\tilde q_1 &= \tilde q_0 + \epsilon \, \tilde p_{1/2}. \\
\end{align}
The last momentum half-step works out in the same way as the first. Thus, we have shown that the two leapfrog integrators agree at every step, in the sense that we can transform between approaches and recover either parametrisation.
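As a sanity check, here is one full leapfrog step in both parametrisations for the illustrative Gaussian target used above (the step size is arbitrary); transforming the whitened trajectory back recovers the direct one.

```python
import numpy as np

rng = np.random.default_rng(4)
Sigma = np.array([[4.0, 1.9],
                  [1.9, 1.0]])
L = np.linalg.cholesky(Sigma)
eps = 0.1

q0 = rng.standard_normal(2)
p_tilde0 = rng.standard_normal(2)
q_tilde0 = np.linalg.solve(L, q0)    # q_tilde = L^{-1} q
p0 = np.linalg.solve(L.T, p_tilde0)  # p = L^{-T} p_tilde

def grad_log_pi(q):        # closed-form gradient of the Gaussian target
    return -np.linalg.solve(Sigma, q)

def grad_log_pi_tilde(qt): # gradient of the whitened (isotropic) target
    return -qt

# Approach 2: leapfrog in q with mass matrix M = Sigma^{-1} (so M^{-1} = Sigma).
p_half = p0 + 0.5 * eps * grad_log_pi(q0)
q1 = q0 + eps * Sigma @ p_half
p1 = p_half + 0.5 * eps * grad_log_pi(q1)

# Approach 1: leapfrog in the whitened variables with identity mass matrix.
pt_half = p_tilde0 + 0.5 * eps * grad_log_pi_tilde(q_tilde0)
qt1 = q_tilde0 + eps * pt_half
pt1 = pt_half + 0.5 * eps * grad_log_pi_tilde(qt1)

print(np.allclose(q1, L @ qt1))                    # True
print(np.allclose(p1, np.linalg.solve(L.T, pt1)))  # True
```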
## Bounce operator
In the bouncy sampler, there is an additional bounce operator. We label $\lambda := -\nabla_q \log \pi (q)$ and $\tilde \lambda := -\nabla_{\tilde q} \log \tilde \pi (\tilde q)$. From before, we know that $L^T \lambda = \tilde \lambda$. We start from the whitened parametrisation (which I find easier). The reflection operator reads
\begin{align}
\tilde R(\tilde \lambda, \tilde p) &= \tilde p - 2\frac{\langle \tilde \lambda, \tilde p \rangle}{ \langle \tilde \lambda, \tilde \lambda \rangle} \tilde \lambda \\
&= L^{T} p - 2\frac{\langle L^{T} \lambda, L^{T} p \rangle}{ \langle L^{T} \lambda, L^{T} \lambda \rangle} L^{T} \lambda \\
&= L^{T} \Bigg\{ p - 2\frac{\langle L^{T} \lambda, L^{T} p \rangle}{ \langle L^{T} \lambda, L^{T} \lambda \rangle} \lambda \Bigg\} \\
&= L^{T} \Bigg\{ p - 2\frac{ \lambda^T L L^T p }{ \lambda^T L L^T \lambda} \lambda \Bigg\} \\
&= L^{T} \Bigg\{ p - 2\frac{ \lambda^T \Sigma p }{ \lambda^T \Sigma \lambda} \lambda \Bigg\} \\
&= L^{T} \Bigg\{ p - 2\frac{ \langle \lambda, \Sigma p \rangle}{ \langle \Sigma \lambda, \lambda \rangle} \lambda \Bigg\} \\
&= L^{T} \Bigg\{ p - 2\frac{ \langle \Sigma \lambda, p \rangle}{ \langle \Sigma \lambda, \lambda \rangle} \lambda \Bigg\} \\
&=: L^{T} R(\lambda, p).
\end{align}
Note that the penultimate equality holds since $\Sigma$ is symmetric. The reflection operator produces a momentum (or direction), and we see that $\tilde R = L^T R$ does obey the momentum transformation $\tilde p = L^T p$.
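A quick numerical check of this identity, using the closed-form Gaussian gradient for $\lambda$ and the same illustrative $\Sigma$ (the helper names are mine, not from any particular sampler implementation):

```python
import numpy as np

rng = np.random.default_rng(5)
Sigma = np.array([[4.0, 1.9],
                  [1.9, 1.0]])
L = np.linalg.cholesky(Sigma)

q = rng.standard_normal(2)
lam = np.linalg.solve(Sigma, q)    # lambda = -grad_q log pi(q) for the Gaussian
lam_tilde = L.T @ lam              # lambda_tilde = L^T lambda
p_tilde = rng.standard_normal(2)
p = np.linalg.solve(L.T, p_tilde)  # p = L^{-T} p_tilde

def reflect_whitened(lt, pt):
    # R~: ordinary Euclidean reflection in the whitened space.
    return pt - 2 * (lt @ pt) / (lt @ lt) * lt

def reflect(lam, p):
    # R: Sigma-weighted reflection in the original space (last line above).
    Sl = Sigma @ lam
    return p - 2 * (Sl @ p) / (Sl @ lam) * lam

print(np.allclose(reflect_whitened(lam_tilde, p_tilde),
                  L.T @ reflect(lam, p)))  # True
```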