###### tags: `one-offs` `monte carlo` `variance reduction` `markov chains`

# Fitting Control Variates in MCMC

**Overview**: In this note, I describe two approaches to fitting control variates in Markov Chain Monte Carlo (MCMC) and discuss their relative merits. I also comment briefly on the analogous problem for Sequential Monte Carlo.

## The Options

When computing integrals using MCMC, it is often possible to reduce the variance of an estimator by the use of control variates. Perhaps surprisingly, the control variates for a given expectand can be fitted in a couple of different ways, and this choice is not often discussed.

Suppose that the task is to estimate $\int\pi\left(\mathrm{d}x\right)f\left(x\right)$ by simulating a $\pi$-reversible Markov chain $P$, and that one knows a priori that $\int\pi\left(\mathrm{d}x\right)h\left(x\right)=0$. One then fits a scalar $\beta$ such that $\tilde{f}:=f-\beta\cdot h$ is less variable than $f$ in a suitable sense; since $\pi\left(h\right)=0$, we still have $\pi\left(\tilde{f}\right)=\pi\left(f\right)$. The setting of vector-valued $h$ and $\beta$ is a relatively straightforward extension which, for simplicity, I will not discuss here.

Two main strategies for fitting $\beta$ are:

1. Minimising the empirical variance of $\tilde{f}$, i.e. defining
   \begin{align}
   m\left(\beta\right) &=\frac{1}{T}\sum_{t\in\left[T\right]}\left\{ f\left(x_{t}\right)-\beta\cdot h\left(x_{t}\right)\right\} \\
   s^{2}\left(\beta\right) &=\frac{1}{T-1}\sum_{t\in\left[T\right]}\left(f\left(x_{t}\right)-\beta\cdot h\left(x_{t}\right)-m\left(\beta\right)\right)^{2},
   \end{align}
   find the value of $\beta$ which minimises $s^{2}\left(\beta\right)$. This is a simple least-squares problem (the minimiser is the empirical covariance of $f$ and $h$ divided by the empirical variance of $h$), but it ignores the Markovian structure of the algorithm.
2. Minimising an estimate of the *asymptotic* variance of $\tilde{f}$, e.g.
   \begin{align}
   \pi\left(\tilde{f}\right)\approx\hat{m} &= \frac{1}{T}\sum_{t\in\left[T\right]}\tilde{f}\left(x_{t}\right) \\
   \pi\left(\left(\tilde{f}-\pi\left(\tilde{f}\right)\right)\cdot P^{s}\left(\tilde{f}-\pi\left(\tilde{f}\right)\right)\right)\approx\hat{\rho}\left(s\right) &:= \frac{1}{T}\sum_{0<t\leqslant T-s}\left(\tilde{f}\left(x_{t}\right)-\hat{m}\right)\cdot\left(\tilde{f}\left(x_{t+s}\right)-\hat{m}\right) \\
   \sigma^{2}\left(\tilde{f}\right)\approx\hat{\sigma}^{2}\left(\tilde{f}\right) &:= \hat{\rho}\left(0\right)+2\cdot\sum_{0<s\leqslant s_{*}}w\left(s\right)\cdot\hat{\rho}\left(s\right),
   \end{align}
   where $s_{*}$ is a truncation lag and $w$ is some 'windowing' or 'weight' function, typically taking values in $\left[0,1\right]$. Since $\hat{\sigma}^{2}\left(\tilde{f}\right)$ is a quadratic function of $\beta$, this is again a least-squares problem, albeit a slightly more complicated one to form. Still, it has the benefit of acknowledging the autocorrelation structure of the underlying MCMC algorithm, and thus should, asymptotically, find values of $\beta$ which give lower-variance estimators.

The papers which I have seen indicate that the second strategy is preferable, sometimes substantially, so there is something to be gained from harnessing the structure of the sampler.

A parting comment: it seems to be rare to use control variates in Sequential Monte Carlo (SMC). The first strategy adapts to SMC reasonably easily, but an analogue of the second (i.e. a structured variance estimator) is more complex; until recently, it was not known how to form efficient 'internal' variance estimators for SMC. It would be interesting to see how much benefit these refined estimators bring in this context.
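To make the comparison between the two strategies concrete, here is a minimal sketch of both in Python (standard library only). Everything specific in it is an illustrative assumption rather than part of the note: the chain is a Gaussian AR(1) kernel, which is reversible with respect to $\pi=\mathcal{N}(0,1)$; the expectand is $f(x)=x^{2}$ (so $\pi(f)=1$); the control variate is $h(x)=x+x^{2}-1$ (so $\pi(h)=0$); $w$ is taken to be the Bartlett window $w(s)=1-s/(s_{*}+1)$ with $s_{*}=50$; and the function names are hypothetical. For strategy 2, the sketch uses the fact that $\hat{\sigma}^{2}(\tilde{f})$ expands as a quadratic $a-2b\beta+c\beta^{2}$, whose minimiser is $b/c$ when $c>0$.

```python
import math
import operator
import random


def simulate_ar1(n, phi, seed=0):
    """Simulate x' = phi*x + sqrt(1 - phi^2)*eps, a chain reversible w.r.t. N(0, 1).

    The chain starts from N(0, 1) itself, so no burn-in is discarded.
    """
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - phi * phi)
    x = rng.gauss(0.0, 1.0)
    xs = []
    for _ in range(n):
        xs.append(x)
        x = phi * x + scale * rng.gauss(0.0, 1.0)
    return xs


def centre(values):
    """Subtract the sample mean from each value."""
    mean = sum(values) / len(values)
    return [val - mean for val in values]


def lagged_cov(a, b, s):
    """(1/T) * sum over t of a[t] * b[t + s], for already-centred sequences."""
    n = len(a)
    return sum(map(operator.mul, a[: n - s], b[s:])) / n


def bartlett(s, s_max):
    """Bartlett window: one common (assumed) choice for w(s)."""
    return 1.0 - s / (s_max + 1.0)


def fit_beta_empirical(u, v):
    """Strategy 1: the least-squares minimiser of s^2(beta)."""
    return lagged_cov(u, v, 0) / lagged_cov(v, v, 0)


def asymptotic_quadratic(u, v, s_max, w=bartlett):
    """Coefficients (a, b, c) with sigma_hat^2(f - beta*h) = a - 2*b*beta + c*beta**2.

    u and v are the centred values of f and h along the chain; expanding
    rho_hat(s) for u - beta*v gives these windowed (cross-)covariance sums.
    """
    a, b, c = lagged_cov(u, u, 0), lagged_cov(u, v, 0), lagged_cov(v, v, 0)
    for s in range(1, s_max + 1):
        ws = w(s, s_max)
        a += 2.0 * ws * lagged_cov(u, u, s)
        b += ws * (lagged_cov(u, v, s) + lagged_cov(v, u, s))
        c += 2.0 * ws * lagged_cov(v, v, s)
    return a, b, c


phi, T, s_max = 0.8, 100_000, 50
xs = simulate_ar1(T, phi, seed=1)
fs = [x * x for x in xs]            # f(x) = x^2, so pi(f) = 1
hs = [x + x * x - 1.0 for x in xs]  # h(x) = x + x^2 - 1, so pi(h) = 0
u, v = centre(fs), centre(hs)

a, b, c = asymptotic_quadratic(u, v, s_max)


def sigma2(beta):
    """Windowed estimate of the asymptotic variance of f - beta*h."""
    return a - 2.0 * b * beta + c * beta * beta


beta_emp = fit_beta_empirical(u, v)
beta_asy = b / c  # Strategy 2: minimiser of sigma2, valid when c > 0

for name, beta in [("none", 0.0), ("empirical", beta_emp), ("asymptotic", beta_asy)]:
    estimate = sum(f - beta * h for f, h in zip(fs, hs)) / T
    print(f"{name:>10}: beta={beta:.3f}  pi(f) estimate={estimate:.4f}  "
          f"sigma2_hat={sigma2(beta):.2f}")
```

In this toy setting, the empirical-variance fit should land near $\beta=2/3$, which is $\mathrm{Cov}_{\pi}(f,h)/\mathrm{Var}_{\pi}(h)$, whereas the asymptotic-variance fit should land nearer $0.5$, because the two components of $h$ decorrelate at different rates along this chain ($x$ at rate $\phi^{s}$, $x^{2}-1$ at rate $\phi^{2s}$). By construction, the asymptotic-variance fit also attains the smallest value of $\hat{\sigma}^{2}$ among the three printed choices.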