# L17-MonteCarloMethods
> Organization contact [name= [ierosodin](mailto:ierosodin@gmail.com)]
###### tags: `deep learning` `study notes`
==[Back to Catalog](https://hackmd.io/@ierosodin/Deep_Learning)==
* http://www.deeplearningbook.org/contents/monte_carlo.html
* Why Sampling?
* We may wish to draw samples from a probability distribution for many reasons. Sampling provides a flexible way to approximate many sums and integrals at reduced cost.
* Sometimes we use this to provide a significant speedup to a costly but tractable sum, as in the case when we subsample the full training cost with minibatches. In other cases, our learning algorithm requires us to approximate an intractable sum or integral, such as the gradient of the log partition function of an undirected model. In many other cases, sampling is actually our goal, in the sense that we want to train a model that can sample from the training distribution.
* Monte Carlo Sampling
* When a sum or an integral cannot be computed exactly, it is often possible to approximate it using Monte Carlo sampling.
* $s = \sum_x p(x) f(x) = E_p[f(x)]$ for discrete $x$, or $s = \int p(x) f(x) \, dx = E_p[f(x)]$ for continuous $x$
* We can approximate $s$ by drawing $n$ samples $x_1, ... , x_n$ from $p$ and then forming the empirical average
* $\hat s_n = \frac{1}{n} \sum_{i=1}^{n} f(x_i)$
* The estimator $\hat s_n$ is unbiased:
* $E[\hat s_n] = \frac{1}{n} \sum_{i=1}^{n} E[f(x_i)] = \frac{1}{n} \sum_{i=1}^{n} s = s$
* If the samples $x_i$ are independently and identically distributed (i.i.d.) and $Var[f(x_i)]$ is bounded, then the variance $Var[\hat s_n]$ decreases and converges to 0 as $n$ increases:
* $Var[\hat s_n] = \frac{1}{n^2} \sum_{i=1}^{n} Var[f(x)] = \frac{Var[f(x)]}{n}$
* The central limit theorem tells us that the distribution of the average, $\hat s_n$, converges to a normal distribution with mean $s$ and variance $\frac{Var[f(x)]}{n}$. This allows us to estimate confidence intervals around the estimate $\hat s_n$, using the cumulative distribution of the normal density.
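* As a concrete illustration, here is a minimal Monte Carlo sketch in Python (the choice of $p = N(0, 1)$ and $f(x) = x^2$ is an assumption for demonstration, not from the text); it reports the CLT-based confidence interval described above:
    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def f(x):
        return x ** 2

    n = 10_000
    x = rng.standard_normal(n)        # n i.i.d. samples from p = N(0, 1)
    fx = f(x)

    s_hat = fx.mean()                 # empirical average, unbiased for s = E_p[f(x)] = 1
    se = fx.std(ddof=1) / np.sqrt(n)  # CLT: std of s_hat ~ sqrt(Var[f(x)] / n)

    # 95% confidence interval from the normal approximation
    print(f"estimate: {s_hat:.4f} +/- {1.96 * se:.4f}  (true value: 1.0)")
    ```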
* Importance Sampling
* When it is not feasible to sample from $p$, an alternative is to use importance sampling
* To approximate the expectation based on a proposal distribution $q(x)$ that is easier to draw samples from than $p(x)$
* $p(x) f(x) = q(x) \frac{p(x) f(x)}{q(x)}$
* Sampling from $p$ gives the ordinary estimator $\hat s_p = \frac{1}{n} \sum_{i=1,\, x_i \sim p}^{n} f(x_i)$
* Sampling from $q$ instead gives the importance sampling estimator $\hat s_q = \frac{1}{n} \sum_{i=1,\, x_i \sim q}^{n} \frac{p(x_i) f(x_i)}{q(x_i)}$
* It is readily seen that $\hat s_q$ is unbiased irrespective of the choice of $q(x)$
* $E_q[\hat s_q] = E_p[\hat s_p] = s$
* The variance of $\hat s_q$ is however highly sensitive to the choice of $q(x)$
* $Var[\hat s_q] = \frac{1}{n} Var\left[\frac{p(x) f(x)}{q(x)}\right]$
* The variance is minimized by $q^*(x) = \frac{p(x) |f(x)|}{Z}$, where $Z$ is the normalization constant, chosen so that $q^*(x)$ sums or integrates to 1
* When $q(x) \gg p(x)|f(x)|$, importance sampling collects useless samples (summing tiny numbers or zeros). On the other hand, when $q(x) \ll p(x)|f(x)|$, which will happen more rarely, the ratio can be huge.
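* A minimal importance sampling sketch in Python (the particular $p$, $q$, and $f$ below are illustrative assumptions): samples are drawn from the proposal $q$ and reweighted by $\frac{p(x)}{q(x)}$:
    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def normal_pdf(x, mu, sigma):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    f = lambda x: x ** 2
    n = 10_000

    # target p = N(0, 1); proposal q = N(0, 2^2), heavier-tailed and easy to sample
    x = rng.normal(0.0, 2.0, size=n)                       # x_i ~ q
    w = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0)  # weights p(x_i) / q(x_i)

    s_hat_q = np.mean(w * f(x))   # unbiased whenever q > 0 wherever p(x) f(x) != 0
    print(f"importance sampling estimate: {s_hat_q:.4f}  (true value: 1.0)")
    ```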
* Biased Importance Sampling
* Another approach is to use biased importance sampling, which has the advantage of not requiring normalized $p$ or $q$.
* Suppose the unnormalized $\tilde p(x)$ is easy to evaluate but $Z_p$ is unknown:
* $p(x) = \frac{\tilde p(x)}{Z_p}$
* We may also wish to use a $q(x)$ with the same property
* $q(x) = \frac{\tilde q(x)}{Z_q}$
* The importance sampling estimator is then given by
* $s = E_p[f(x)] = \frac{Z_q}{Z_p} E_q\left[\frac{\tilde p(x)}{\tilde q(x)} f(x)\right]$
* $\approx \frac{Z_q}{Z_p} \cdot \frac{1}{n} \sum_{i=1,\, x_i \sim q}^{n} \frac{\tilde p(x_i)}{\tilde q(x_i)} f(x_i)$
* The same set of data $x_i$ can be used to approximate the ratio $\frac{Z_p}{Z_q}$
* $\frac{Z_p}{Z_q} = \frac{1}{Z_q} \int \tilde p(x) \, dx = \int \frac{\tilde p(x)}{\tilde q(x)} q(x) \, dx$
* $\approx \frac{1}{n} \sum_{i=1,\, x_i \sim q}^{n} \frac{\tilde p(x_i)}{\tilde q(x_i)}$
* We then arrive at a biased importance estimator
* $\hat s_{BIS} = \sum_{i=1}^{n} w_i f(x_i)$, where $w_i = \frac{\tilde p(x_i) / \tilde q(x_i)}{\sum_{j=1}^{n} \tilde p(x_j) / \tilde q(x_j)}$
* Because it is a ratio of two estimates, $\hat s_{BIS}$ is biased for finite $n$, but the bias vanishes as $n \to \infty$, so the estimator is asymptotically unbiased
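* A minimal sketch of the self-normalized estimator in Python (the specific unnormalized densities below are illustrative assumptions): only $\tilde p$ and $\tilde q$ are evaluated, and the unknown $Z_p$ and $Z_q$ cancel because the weights are normalized to sum to one:
    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    f = lambda x: x ** 2
    p_tilde = lambda x: np.exp(-0.5 * x ** 2)        # N(0, 1) up to Z_p
    q_tilde = lambda x: np.exp(-0.5 * (x / 2) ** 2)  # N(0, 2^2) up to Z_q

    n = 10_000
    x = rng.normal(0.0, 2.0, size=n)                 # x_i ~ q
    r = p_tilde(x) / q_tilde(x)                      # unnormalized ratios
    w = r / r.sum()                                  # self-normalized weights w_i

    s_bis = np.sum(w * f(x))                         # biased for finite n, consistent
    print(f"self-normalized estimate: {s_bis:.4f}  (true value: 1.0)")
    ```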
* Markov Chain Monte Carlo Methods
* Methods that involve drawing samples from a Markov chain to perform Monte Carlo estimation: the chain is constructed so that its stationary distribution is the target distribution $p(x)$, so running it long enough yields approximate samples from $p$
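* As one concrete example (random-walk Metropolis, chosen here for illustration; the chapter covers MCMC more generally), the sketch below builds a chain whose stationary distribution is $p(x) \propto \tilde p(x)$, using only the unnormalized density:
    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    p_tilde = lambda x: np.exp(-0.5 * x ** 2)   # unnormalized target, here N(0, 1)

    n_steps, step = 50_000, 1.0
    x = 0.0
    samples = np.empty(n_steps)
    for t in range(n_steps):
        x_prop = x + step * rng.standard_normal()  # symmetric random-walk proposal
        # accept with probability min(1, p_tilde(x_prop) / p_tilde(x))
        if rng.random() < p_tilde(x_prop) / p_tilde(x):
            x = x_prop
        samples[t] = x

    burn_in = 1_000                                # discard pre-convergence samples
    est = samples[burn_in:].mean()
    print(f"estimated E_p[x]: {est:.3f}  (true value: 0.0)")
    ```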