###### tags: `mcnf` `monte carlo` `expository`
# Monte Carlo Estimation of Nonlinear Functionals: Polynomials
**Overview**: In this note, I will sketch the task of estimating nonlinear functions of expectations, and the difficulties which this poses for conventional Monte Carlo methods. I will then explain how these difficulties can be addressed when the nonlinearity in question is polynomial, and hint at extensions to the general case.
## Principles of Linearity in Monte Carlo
As has hopefully been well-detailed in earlier posts, the Monte Carlo method benefits greatly from linear-algebraic structure: one can estimate sums of integrals unbiasedly by estimating each integral unbiasedly, and then taking sums of the estimators. Moreover, studying the variance of these sum-based estimators is typically quite straightforward.
Challenges arise when we want to compute nonlinear functions of integrals. For general nonlinear functions $\Phi$, it will not hold that $\mathbf{E} \left[ \Phi \left( X \right)\right] = \Phi \left( \mathbf{E} \left[ X \right] \right)$, and so naive modifications of standard estimators will incur some bias. It bears mentioning that *this may be fine*. Arguably, the 'real' mark of estimator quality might be mean squared error, or the narrowness of confidence intervals, or other more holistic measures of concentration around the truth. Nevertheless, the presence of bias is often symptomatic of other undesirable problems with an estimator, and so it is useful to understand how bias can be mitigated.
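To make the issue concrete, here is a small numerical sketch (in NumPy, with the purely illustrative choices $\Phi(x) = \exp(x)$ and $X \sim \mathcal{N}(0, 1)$) of the bias which arises from simply plugging a sample mean into a nonlinearity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: X ~ N(0, 1) and Phi(x) = exp(x), so Phi(E[X]) = exp(0) = 1.
N, reps = 25, 20_000

# Plug-in estimator: apply Phi to the sample mean of each replication.
sample_means = rng.standard_normal((reps, N)).mean(axis=1)
plug_in = np.exp(sample_means)

print("true value            :", 1.0)
print("average plug-in value :", plug_in.mean())  # noticeably above 1
```

Since the sample mean here is distributed as $\mathcal{N}(0, 1/N)$, the plug-in estimator has expectation $\exp(1/(2N)) \approx 1.02$, rather than the true value of $1$; the bias shrinks as $N$ grows, but never vanishes for finite $N$.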
## A Simple Example: The Square of an Expectation
A useful starting point is to consider one of the simplest nonlinear functions, namely $\Phi(x) = x^2$. One reason for this choice is that it is already familiar: a conventional representation for the variance of a random variable is
\begin{align}
\text{Var} \left[ X \right] &= \mathbf{E} \left[ X^2 \right] - \mathbf{E} \left[ X \right]^2.
\end{align}
Standard Monte Carlo provides a route to unbiased estimation of $\mathbf{E} \left[ X^2 \right]$, and a first course in Statistics will often teach that the same is possible for $\text{Var} \left[ X \right]$. Hence, since $\mathbf{E} \left[ X \right]^2 = \mathbf{E} \left[ X^2 \right] - \text{Var} \left[ X \right]$, by the aforementioned virtues of linearity in Monte Carlo, we must accept that an unbiased estimator for $\mathbf{E} \left[ X \right]^2$ exists!
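As a quick sanity check on this existence claim, here is a minimal sketch (in NumPy, with an arbitrary illustrative choice of distribution) which combines the two standard unbiased estimators:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative choice: X ~ Exponential(1), so that E[X] = 1 and E[X]^2 = 1.
N, reps = 10, 50_000
x = rng.exponential(scale=1.0, size=(reps, N))

# Unbiased estimator of E[X]^2 = E[X^2] - Var[X]:
# the mean of squares, minus the Bessel-corrected (ddof=1) sample variance.
est = (x**2).mean(axis=1) - x.var(axis=1, ddof=1)

print("average over replications:", est.mean())  # hovers around 1.0
```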
Now, let us try to derive such an estimator by ourselves. Begin by writing
\begin{align}
\mathbf{E} \left[ X \right]^2 &= \mathbf{E} \left[ X \right] \cdot \mathbf{E} \left[ X \right] \\
&= \left( \int p(dx_1) \, x_1 \right) \cdot \left( \int p(dx_2) \, x_2 \right) \\
&= \int p(dx_1) p(dx_2) \, x_1 x_2 \\
&= \mathbf{E}_{p \otimes p} [ X_1 X_2].
\end{align}
This is good news: we can again write our quantity of interest as the integral of a function against a probability measure. The difference is that the probability measure in question is now constructed through considering *multiple copies* of our 'base' probability measure $p$.
Now, consider the following estimator
1. Draw samples $(x_1^{(1)}, x_2^{(1)}), \ldots, (x_1^{(N)}, x_2^{(N)}) \sim p \otimes p$.
2. Form the estimator
\begin{align}
\hat{I}_N = \frac{1}{N} \sum_{i = 1}^N x_1^{(i)} x_2^{(i)}.
\end{align}
The good news is that this estimator is unbiased for the quantity of interest. The slightly strange news is that it is also very wasteful. Looking closely at the generative process, we have drawn a total of $2N$ samples from $p$, but only used them in a very limited way. Note that any quantity of the form $x_a^{(i)} x_b^{(j)}$ with $(a, i) \neq (b, j)$ is also an unbiased estimator for our quantity of interest, and we have ignored many such terms.
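To make the construction concrete, here is a minimal sketch of this pair-based estimator, under an illustrative choice of $p$ for which $\mathbf{E}[X]^2 = 1$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative choice: p = Exponential(1), so E[X]^2 = 1.
N, reps = 100, 10_000

# Step 1: draw N independent pairs (x1, x2) ~ p ⊗ p, i.e. 2N draws from p in total.
x1 = rng.exponential(size=(reps, N))
x2 = rng.exponential(size=(reps, N))

# Step 2: average the products x1 * x2 within each replication.
I_hat = (x1 * x2).mean(axis=1)

print("average over replications:", I_hat.mean())  # hovers around 1.0
```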
In fact, we can improve the quality of our estimator by recycling all of the same information more carefully, along the lines of the following estimator:
1. Draw samples $x^1, \cdots, x^N \sim p$, with $N \geqslant 2$.
2. Form the estimator
\begin{align}
\hat{I}_N = \frac{1}{N(N-1)} \sum_{i = 1}^N \sum_{j = 1}^N \mathbf{I} \left[ i \neq j \right] x^i x^j.
\end{align}
The premise of this estimator is that for any $i \neq j$, the quantity $x^i x^j$ is an unbiased estimator of $\mathbf{E} \left[ X \right]^2$, and so we should include all of these terms in our estimator. With some work, one can show that this new estimator has lower variance than our first attempt. It is worth noting that this estimator is distinct from the 'naive' estimator obtained by squaring the sample mean of the $x^i$, which exhibits a systematic upwards bias of $\mathcal{O}(N^{-1})$.
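The following sketch compares this 'all pairs' estimator (implemented with the literal double sum over ordered pairs) against the naive squared sample mean, under the same illustrative choice of $p$; the upwards bias of the latter is visible even at modest $N$:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative choice: p = Exponential(1), so E[X]^2 = 1 and Var[X] = 1.
N, reps = 10, 50_000
x = rng.exponential(size=(reps, N))

# Naive estimator: square of the sample mean (biased upwards by Var[X] / N).
naive = x.mean(axis=1) ** 2

# 'All pairs' estimator: average of x_i * x_j over all ordered pairs with i != j.
all_products = x[:, :, None] * x[:, None, :]
off_diagonal = all_products.sum(axis=(1, 2)) - (x**2).sum(axis=1)
all_pairs = off_diagonal / (N * (N - 1))

print("naive, average     :", naive.mean())      # roughly 1 + 1/N = 1.1
print("all pairs, average :", all_pairs.mean())  # roughly 1.0
```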
A slight drawback of this new estimator is that it now appears to cost $\mathcal{O} \left( N^2 \right)$ to actually assemble the estimator once the samples are drawn. In fact, this is an illusion. Consider the following manipulations:
\begin{align}
\hat{I}_N &= \frac{1}{N(N-1)} \left( \sum_{i = 1}^N \sum_{j = 1}^N x^i x^j - \sum_{i = 1}^N \left( x^i \right)^2 \right) \\
&= \frac{1}{N(N-1)} \left( \left( \sum_{i = 1}^N x^i\right)^2 - \sum_{i = 1}^N \left( x^i \right)^2 \right) \\
&= \frac{N}{N - 1} \left(\frac{1}{N} \sum_{i = 1}^N x^i\right)^2 - \frac{1}{N-1} \left( \frac{1}{N} \sum_{i = 1}^N \left(x^i\right)^2 \right),
\end{align}
where both terms can be formed in time $\mathcal{O}(N)$. This is representative of what one can expect in the polynomial case.
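The following sketch checks numerically that the rearranged expression agrees with the literal double sum on a single batch of samples:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 500
x = rng.exponential(size=N)

# Literal double sum over ordered pairs with i != j: O(N^2) work.
slow = sum(x[i] * x[j]
           for i in range(N) for j in range(N) if i != j) / (N * (N - 1))

# Rearranged form: ((sum x)^2 - sum x^2) / (N (N - 1)), only O(N) work.
fast = (x.sum() ** 2 - (x**2).sum()) / (N * (N - 1))

print(slow, fast)  # agree up to floating-point rounding
```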
Note also that in order to form this unbiased estimator of $\mathbf{E} \left[ X \right]^2$, we need to draw at least $2$ samples from $p$. This will be mirrored in the forthcoming extensions.
## Towards General Polynomials of Expectations
Consider now estimating a quantity of the form
\begin{align}
I \left( \mathbf{f}, \alpha \right) = \prod_{m = 1}^M \mathbf{E}[f_m(X)]^{\alpha_m},
\end{align}
where all expectations are taken with respect to the same probability measure over $X$, and all $\alpha_m$ are nonnegative integers. A similar derivation to earlier allows us to write that
\begin{align}
I \left( \mathbf{f}, \alpha \right) = \int \prod_{m = 1}^M \prod_{a = 1}^{\alpha_m} f_m (x_m^a) p (dx_m^a),
\end{align}
where now the integral is being taken against $|\alpha| := \sum_{m = 1}^M \alpha_m$ copies of $p$. This suggests the following estimator:
1. Fix $N \geqslant |\alpha|$.
2. Draw samples $x^1, \cdots, x^N \sim p$.
3. For each ordered tuple $S = (a_1, \ldots, a_{|\alpha|})$ of distinct elements of $[N]$, partition the positions $\{1, \ldots, |\alpha|\}$ into $S_1 \sqcup \cdots \sqcup S_M$ with $|S_m| = \alpha_m$ (say, the first $\alpha_1$ positions into $S_1$, the next $\alpha_2$ into $S_2$, and so on), and form the sub-estimator
\begin{align}
\hat{I} (x^S) &= \prod_{m = 1}^M \prod_{i \in S_m} f_m (x^{a_i}).
\end{align}
4. Form the estimator
\begin{align}
\hat{I}_N = \frac{1}{N(N-1)\cdots(N - |\alpha|+1)} \sum_{S} \hat{I} (x^S).
\end{align}
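To make the recipe concrete, here is a deliberately naive sketch of the procedure, enumerating ordered tuples of distinct indices via `itertools.permutations`; the particular functions, exponents, and distribution below are illustrative only:

```python
import itertools
import numpy as np

def poly_estimator(x, fs, alpha):
    """Unbiased estimator of prod_m E[f_m(X)]^alpha_m from samples x,
    averaging over all ordered tuples of distinct indices.
    Naive implementation: cost grows like N^{|alpha|}."""
    N, total = len(x), sum(alpha)
    assert N >= total
    F = [f(x) for f in fs]  # F[m][i] = f_m(x_i), each f_m evaluated once per sample
    # Assign the first alpha_1 positions of each tuple to f_1, the next alpha_2 to f_2, etc.
    which = [m for m, a in enumerate(alpha) for _ in range(a)]
    acc, count = 0.0, 0
    for S in itertools.permutations(range(N), total):
        term = 1.0
        for pos, idx in enumerate(S):
            term *= F[which[pos]][idx]
        acc += term
        count += 1  # count ends up equal to N (N - 1) ... (N - |alpha| + 1)
    return acc / count

rng = np.random.default_rng(5)

# Illustrative target: E[X]^2 * E[X^2], which equals 1 * 2 = 2 for Exponential(1).
fs = [lambda t: t, lambda t: t**2]
alpha = [2, 1]
ests = [poly_estimator(rng.exponential(size=8), fs, alpha) for _ in range(2_000)]
print(np.mean(ests))  # hovers around 2.0
```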
As before, while the naive implementation of this estimator would have a computational cost scaling like $N^{|\alpha|}$, there are typically combinatorial simplifications available which reduce the cost to $\mathcal{O}(N)$ (though with a constant prefactor which grows with $|\alpha|$).
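To illustrate the sort of simplification available, consider the particular (illustrative) target $\mathbf{E}[f(X)]^2 \, \mathbf{E}[g(X)]$: by inclusion-exclusion over the ways in which indices can coincide, the sum over distinct index triples can be written in terms of a handful of simple sums, each computable in a single pass over the samples.

```python
import numpy as np

def fast_f2g(f, g):
    """O(N) evaluation of the unbiased estimator of E[f(X)]^2 E[g(X)]:
    the sum of f_i f_j g_k over distinct (i, j, k), divided by N(N-1)(N-2),
    rewritten via inclusion-exclusion over coinciding indices."""
    N = len(f)
    Sf, Sg = f.sum(), g.sum()
    distinct_sum = (Sf**2 * Sg
                    - (f**2).sum() * Sg       # remove terms with i = j
                    - 2 * Sf * (f * g).sum()  # remove terms with i = k, and with j = k
                    + 2 * (f**2 * g).sum())   # i = j = k was removed three times but
                                              # appears only once, so restore two copies
    return distinct_sum / (N * (N - 1) * (N - 2))

rng = np.random.default_rng(6)
x = rng.exponential(size=2_000)

# Illustrative choice f(t) = t, g(t) = t^2: unbiased for E[X]^2 E[X^2] = 2 under Exponential(1).
print(fast_f2g(x, x**2))

# Consistency check against the literal triple sum, on a small sample.
y = x[:25]
fy, gy = y, y**2
slow = sum(fy[i] * fy[j] * gy[k]
           for i in range(25) for j in range(25) for k in range(25)
           if len({i, j, k}) == 3) / (25 * 24 * 23)
print(slow, fast_f2g(fy, gy))  # agree up to floating-point rounding
```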
## Conclusion
While the unbiased estimation of nonlinear functions of expectations is typically a challenging endeavour, when the nonlinearity in question is polynomial, there is a systematic solution available, which has been detailed above. Moreover, the procedure is in some sense optimal, in terms of making maximal use of the information contained in the samples. In essence, the enabling factor is that products of expectations can be unbiasedly estimated, provided that the expectations themselves can be.
It is then somewhat disappointing to learn that for general nonlinearities and expectands, unbiased estimation from a fixed number of samples cannot generally be accomplished. This is related to the fact that from a fixed number of independent $\text{Bernoulli}(p)$ coin flips, the only functions of $p$ which can be unbiasedly estimated are polynomials, which the eager reader is invited to prove for themselves.
There are two main strategies for resolving this difficulty which we will consider in upcoming posts. The first is to accept that unbiasedness is a big ask, and to simply aspire to bias mitigation, e.g. to ensure that the bias is of lower order than the Monte Carlo variance. The second is to persist in the pursuit of unbiasedness, but at the cost of drawing a random (potentially unbounded!) number of samples.
---