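Let
$$f : \{1, \ldots, K\} \to \mathbb{R}$$
be a scalar function over $K$ values.
Let $\alpha = (\alpha_1, \ldots, \alpha_K)$ be the parameters of a $K$-dimensional Dirichlet distribution, and $\pi$ a draw from it. (The sample from the Dirichlet, $\pi$, is therefore a probability distribution over $K$ outcomes, such that $\sum_{k=1}^{K} \pi_k = 1$.)
Let $x$ be a sample from $\pi$, that is
$$P(x = k \mid \pi) = \pi_k.$$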
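It's not difficult to see that the marginal distribution of $x$, which I will write as $p_\alpha$, is
$$p_\alpha(x = k) = \mathbb{E}_{\pi \sim \mathrm{Dirichlet}(\alpha)}[\pi_k] = \frac{\alpha_k}{\sum_{j=1}^{K} \alpha_j}.$$
I'm interested in estimating
$$\mathbb{E}_{x \sim p_\alpha}[f(x)] = \sum_{k=1}^{K} p_\alpha(x = k)\, f(k).$$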
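If I could sample from $p_\alpha$, this would be easy to estimate as a Monte Carlo average:
$$\mathbb{E}_{x \sim p_\alpha}[f(x)] \approx \frac{1}{N} \sum_{n=1}^{N} f(x_n),$$
where $x_n \sim p_\alpha$.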
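The plain Monte Carlo average can be sketched in a few lines of NumPy. This is only an illustration under assumed values: the parameters `alpha`, the table `f_values`, the sample size, and the helper names are all hypothetical and not taken from the text. It draws $x$ in two stages from the target and compares the average with the exact value $\sum_k p_\alpha(x = k)\, f(k)$.

```python
import numpy as np

def sample_two_stage(params, n, rng):
    """Draw pi_n ~ Dirichlet(params), then x_n ~ Categorical(pi_n)."""
    pis = rng.dirichlet(params, size=n)            # shape (n, K)
    u = rng.random(n)                              # one uniform per sample
    # Inverse-CDF draw: count how many cumulative sums lie below u.
    xs = (pis.cumsum(axis=1) < u[:, None]).sum(axis=1)
    xs = np.minimum(xs, len(params) - 1)           # guard against rounding
    return pis, xs

def exact_expectation(f_values, params):
    """Sum_k p(x = k) f(k) with p(x = k) = params_k / sum(params)."""
    params = np.asarray(params, dtype=float)
    return float((params / params.sum()) @ f_values)

def monte_carlo_estimate(f_values, xs):
    """Unweighted average of f over the samples."""
    return float(f_values[xs].mean())

rng = np.random.default_rng(0)
alpha = np.array([1.0, 2.0, 3.0])       # assumed target parameters
f_values = np.array([0.5, -1.0, 2.0])   # assumed f(0), f(1), f(2)

_, xs = sample_two_stage(alpha, 200_000, rng)
mc = monte_carlo_estimate(f_values, xs)
exact = exact_expectation(f_values, alpha)
print(mc, exact)
```

Outcomes are indexed from 0 here, so `f_values[k]` plays the role of $f$ evaluated at the $(k+1)$-th outcome.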
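However, I will assume that I can't sample from $p_\alpha$, only from $p_\beta$, the same two-stage model with different Dirichlet parameters $\beta$. So my samples are:
$$\pi_n \sim \mathrm{Dirichlet}(\beta), \qquad x_n \sim \pi_n, \qquad n = 1, \ldots, N.$$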
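Using importance sampling I can estimate my quantity of interest as follows:
$$\mathbb{E}_{x \sim p_\alpha}[f(x)] = \sum_{k=1}^{K} p_\beta(x = k)\, \frac{p_\alpha(x = k)}{p_\beta(x = k)}\, f(k) = \mathbb{E}_{x \sim p_\beta}\left[\frac{p_\alpha(x)}{p_\beta(x)}\, f(x)\right].$$
Now I have expressed my quantity of interest as an expectation over the proposal $p_\beta$, so I can use my samples $x_n \sim p_\beta$, as follows:
$$\mathbb{E}_{x \sim p_\alpha}[f(x)] \approx \frac{1}{N} \sum_{n=1}^{N} \frac{p_\alpha(x_n)}{p_\beta(x_n)}\, f(x_n).$$
So, instead of simply averaging $f$ over the samples, I weight each sample by the importance weight $w_n = p_\alpha(x_n) / p_\beta(x_n)$.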
This is a widely used method for estimating expectations under one distribution when you only have samples from another. The problem with it is that it is a high-variance estimator.
Let's call this estimator $\hat{\mu}_{x}$.
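A minimal sketch of the importance sampling estimator with weights that depend on $x$ only, assuming the proposal is the same Dirichlet-categorical model with its own parameters. The values of `alpha`, `beta`, and `f_values`, and the helper names, are hypothetical choices for illustration. Both marginals are available in closed form as normalised parameter vectors, so the weight is a simple table lookup.

```python
import numpy as np

def marginal(params):
    """Marginal of x under Dirichlet-categorical: params_k / sum(params)."""
    params = np.asarray(params, dtype=float)
    return params / params.sum()

def is_estimate(f_values, xs, alpha, beta):
    """Importance sampling with weights p_alpha(x_n) / p_beta(x_n)."""
    weights = marginal(alpha)[xs] / marginal(beta)[xs]
    return float(np.mean(weights * f_values[xs]))

rng = np.random.default_rng(1)
alpha = np.array([1.0, 2.0, 3.0])       # assumed target parameters
beta = np.array([1.0, 1.0, 1.0])        # assumed proposal parameters
f_values = np.array([0.5, -1.0, 2.0])   # assumed f(0), f(1), f(2)
n = 200_000

# Two-stage draw from the proposal: pi_n ~ Dirichlet(beta), x_n ~ pi_n.
pis = rng.dirichlet(beta, size=n)
u = rng.random(n)
xs = np.minimum((pis.cumsum(axis=1) < u[:, None]).sum(axis=1), len(beta) - 1)

estimate = is_estimate(f_values, xs, alpha, beta)
exact = float(marginal(alpha) @ f_values)
print(estimate, exact)
```

Note that `pis` is sampled but never used in the weights, which is exactly the sense in which this estimator ignores the first sampling stage.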
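In this example, we sample $x$ in a two-stage process: first we sample $\pi$ and then we sample $x$. The IS estimator above doesn't exploit this; it completely ignores $\pi$. We can construct a similar estimator that does take $\pi$ into account, writing $p_\alpha(x, \pi) = \mathrm{Dirichlet}(\pi; \alpha)\, \pi_x$ for the joint density under the target and $p_\beta(x, \pi)$ for the same joint under the proposal:
$$\mathbb{E}_{x \sim p_\alpha}[f(x)] = \mathbb{E}_{(x, \pi) \sim p_\beta}\left[\frac{p_\alpha(x, \pi)}{p_\beta(x, \pi)}\, f(x)\right].$$
Based on this, an empirical estimator can be constructed:
$$\mathbb{E}_{x \sim p_\alpha}[f(x)] \approx \frac{1}{N} \sum_{n=1}^{N} \frac{p_\alpha(x_n, \pi_n)}{p_\beta(x_n, \pi_n)}\, f(x_n).$$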
Let's call this estimator $\hat{\mu}_{x,\pi}$.
Note that this estimator is very similar to the previous one, except that the importance weights now depend not just on $x$ but on $\pi$ also.
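The estimator with joint weights can be sketched the same way, again with hypothetical parameters `alpha`, `beta`, `f_values` and hypothetical helper names. One assumption worth stating: if both the target and the proposal draw $x$ from $\pi$ in the same way, the factor $\pi_x$ appears in both joint densities and cancels, so the joint weight is computed here as a ratio of two Dirichlet densities evaluated at $\pi_n$. The weights are computed in log space using `math.lgamma` for the normalising constants.

```python
import math
import numpy as np

def log_norm_const(params):
    """Log of the Dirichlet normaliser: sum_k lgamma(a_k) - lgamma(sum_k a_k)."""
    return sum(math.lgamma(a) for a in params) - math.lgamma(sum(params))

def joint_weights(pis, alpha, beta):
    """p_alpha(x, pi) / p_beta(x, pi); the shared factor pi_x cancels."""
    log_w = np.log(pis) @ (alpha - beta) - log_norm_const(alpha) + log_norm_const(beta)
    return np.exp(log_w)

def joint_is_estimate(f_values, pis, xs, alpha, beta):
    """Importance sampling with weights that depend on pi_n as well."""
    return float(np.mean(joint_weights(pis, alpha, beta) * f_values[xs]))

rng = np.random.default_rng(2)
alpha = np.array([1.0, 2.0, 3.0])       # assumed target parameters
beta = np.array([1.0, 1.0, 1.0])        # assumed proposal parameters
f_values = np.array([0.5, -1.0, 2.0])   # assumed f(0), f(1), f(2)
n = 200_000

# Two-stage draw from the proposal: pi_n ~ Dirichlet(beta), x_n ~ pi_n.
pis = rng.dirichlet(beta, size=n)
u = rng.random(n)
xs = np.minimum((pis.cumsum(axis=1) < u[:, None]).sum(axis=1), len(beta) - 1)

weights = joint_weights(pis, alpha, beta)
joint_estimate = joint_is_estimate(f_values, pis, xs, alpha, beta)
exact = float((alpha / alpha.sum()) @ f_values)
print(joint_estimate, exact, weights.mean())
```

Running this next to the estimator that uses only $x$, over several seeds, is one way to compare the spread of the two estimators empirically.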