Control Variates and Controlled Diffusions

###### tags: `sampling` `monte carlo` `variance reduction` `one-offs` `expository` # Control Variates and Controlled Diffusions **Overview**: In this note, I will outline a method for constructing diffusions which admit a desired invariant measure, and have certain connections to control variates of Stein-Langevin type. ## The 'Complete Recipe' for Sampling with Dynamical Systems Consider the task of drawing samples from a probability measure with density given as $\pi(dz) = \pi(z) dz$. It has been shown that any time-homogeneous stochastic process with continuous sample paths which leaves $\pi$ invariant can be written as \begin{align} dz = \left( D(z) + Q(z) \right) \nabla \log \pi (z) dt + \left( \Gamma^D(z) + \Gamma^Q(z) \right) dt + \sqrt{2 D(z)} dW, \end{align} where * $D$ is a positive-semidefinite-matrix-valued function * $Q$ is a skew-symmetric-matrix-valued function * Given a matrix-valued function $M$, $\Gamma^M$ is the vector-valued function with coordinates given by \begin{align} \Gamma^M_i (z) &= \text{div}_z \left( M_{i\cdot} (z) \right) \\ &= \sum_{j = 1}^d \frac{\partial M_{ij}}{\partial z_j}(z). \end{align} In this setup, $D$ prescribes the 'reversible' / 'equilibrium' component of the dynamics, and $Q$ prescribes the 'non-reversible' / 'non-equilibrium' component. I first came across this formulation through a [2015 work](https://arxiv.org/abs/1506.04696) of Ma, Chen, and Fox. After publication, it has been remarked multiple times that decompositions of this form have long been known in the physics literature. A more refined presentation and extension of these ideas was given in this [2021 work](https://arxiv.org/abs/2105.02845) of Barp, Takao, Betancourt, Arnaudon, and Girolami. A takeaway from this formulation is that if one wants to draw samples from some probability measure $\mu$, one can proceed as follows: * Specify an extended target measure $\pi$ which admits $\mu$ as a marginal distribution. * Specify matrix-valued functions $D$ and $Q$, as above. * Simulate the corresponding diffusion. Note that the above formulation is essentially algebraic: the claim is that these processes leave $\pi$ invariant, and there is no claim as to the rate at which they converge to $\pi$ over time (if they indeed do). ## Dynamical Systems with Control Variables Consider sampling from $\pi(x) \propto \exp \left( - V(x) \right)$ by embedding it into an extended target measure $\Pi (x, \xi)$, with log-density specified by \begin{align} -\log \Pi (x, \xi) &=: \mathcal{V} (x, \xi) \\ &= V(x) + \sum_{j = 1}^M S_j \left( \xi_j \right), \end{align} that is, we introduce $M$ additional 'control variables' $\xi_{1:M}$, which are independent from one another and from $x$ at stationarity. It is common to take $S_j (\xi_j) = \frac{1}{2} | \xi_j |^2$, i.e. to give each of the control variables an asymptotically Gaussian law. Now, specify also $M$ 'control potentials' $f_{1:M}$, each mapping $\mathbf{R}^d$ into $\mathbf{R}$, and aggregate them into a vector-valued function as \begin{align} F(x) &= \begin{bmatrix} f_1 (x) \\ f_2 (x) \\ \cdots \\ f_M (x) \end{bmatrix} \in \mathbf{R}^M. \end{align} Write also \begin{align} F' (x) &= \begin{bmatrix} \nabla_x f_1 (x) \\ \nabla_x f_2 (x) \\ \cdots \\ \nabla_x f_M (x) \end{bmatrix} \in \mathbf{R}^{M \times D} \end{align} for the Jacobian of this map. With this specification made, we are now in a position to define $D$ and $Q$ matrices, which we choose to take as \begin{align} D(x, \xi) &= \begin{bmatrix} I_D & 0_{D \times M}\\ 0_{M \times D} & 0_{M \times M} \end{bmatrix} \\ Q(x, \xi) &= \begin{bmatrix} 0_{D \times D} & -F' (x)^T \\ F'(x) & 0_{M \times M} \end{bmatrix}. \end{align} Working through the equations, this gives us the joint dynamics for $(x, \xi)$ as \begin{cases} dx &= - \nabla_x V(x) dt + F'(x)^T \nabla_\xi S (\xi) dt + \sqrt{2} W_t \\ d\xi &= \left( \mathcal{L} F \right) (x) dt, \end{cases} where $\mathcal{L}$ acts on scalar-valued functions by \begin{align} \left( \mathcal{L} f \right) (x) = \langle - \nabla_x V(x), \nabla_x f (x) \rangle + \Delta_x f (x), \end{align} and is extended to vector-valued functions by acting coordinatewise. As such, these dynamics can equally be written out in coordinates as \begin{cases} dx_t [i] &= \left( - \frac{\partial V}{\partial x[i]} (x_t) + \sum_{j = 1}^M \frac{\partial f_j}{\partial x[i]} (x_t) \cdot \frac{\partial S_j}{\partial \xi[j]} (\xi_t [j]) \right) dt + \sqrt{2} W_t[i] \\ d\xi_t [j] &= \left( \mathcal{L} f_j \right) (x_t) dt. \end{cases} I will refer to this system as the $F$-controlled dynamics. Note that when $F = 0$, this is just the standard overdamped Langevin dynamics. ## Interpretation of the $F$-controlled Dynamics One somewhat-heuristic interpretation of these dynamics is as follows: the standard theory of infinitesimal generators of stochastic processes tells us that at stationarity, for each $j$, the quantity $\mathcal{L} f_j$ should have mean value $0$. The corresponding 'control variable' $\xi_j$ can be seen as a thermostat which monitors how close to $0$ the quantity $\mathcal{L} f_j$ is, and feeds this information back into the $x$ dynamics, in order to encourage this quantity to average out to $0$. By the same token, there seems to be some connection here to control variates. Essentially, given the knowledge that a certain quantity $c$ should average out to $0$ at equilibrium, one can consider a sort of moment-matching approach to construct a set of particles such that the empirical average of $c$ across the particles is $0$. This bears some resemblance to strategies like (Kernel) Stein Discrepancy minimisation. The $F$-Controlled dynamics might then be viewed as a hybrid of pure moment-matching with more conventional Monte Carlo sampling strategies. ## $F$-Controlled Dynamics from Control Variates Suppose that we have a quantity $c$ which we know has mean $0$ under $\pi$. Under appropriate regularity conditions (related to the Poisson / Stein equation), it holds that there exists a corresponding potential $f$ such that $\mathcal{L} f = c$. As such, one can (in principle) construct such dynamics directly from a set of control variates. However, it should be noted that the numerical construction of these potentials $f$ is usually quite challenging, limiting the practical applicability of such strategies. ## Conclusion In this note, I have provided a brisk recap of the many ways in which continuous dynamics can be harnessed for the purposes of Monte Carlo sampling, and then applied this theory to describe a particular dynamical system. I have then drawn some links between the proposed system, the method of control variates, and the method of moments. As foreshadowed earlier in the note, it is not so clear how useful these methods will be for accelerating the convergence of sampling methods, either provably or otherwise. The wealth of opportunities provided by the 'complete recipe' is unfortunately relatively silent on the question of which opportunities ought to actually be taken. It is fair to say that this remains a difficult question.