The idea is to find a Markov chain whose stationary distribution is the same as the conditional distribution we want to estimate. For example, suppose we want to estimate a geometric distribution:
The distribution of the above variable is the same as the stationary distribution of the Markov Chain shown here:
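The book's figure is not reproduced here, but one chain with this property can be sketched in Python (a hypothetical construction, not necessarily the chain from the book): from state n, reset to 1 with probability p, otherwise move to n + 1. Its stationary distribution is Geometric(p), i.e. P(n) = p(1 − p)^(n−1):

```python
import random
from collections import Counter

# Hypothetical chain: from state n, go back to 1 with probability p,
# otherwise step to n + 1. Stationarity: pi(1) = p, pi(n+1) = (1-p) pi(n),
# so pi(n) = p (1-p)^(n-1), a Geometric(p) distribution.
p = 0.5
random.seed(4)
counts = Counter()
n = 1
for _ in range(100000):
    n = 1 if random.random() < p else n + 1  # one transition of the chain
    counts[n] += 1
total = sum(counts.values())
print(counts[1] / total, counts[2] / total, counts[3] / total)
```

For p = 0.5 the empirical frequencies of states 1, 2, 3 approach 0.5, 0.25, 0.125, matching the geometric distribution.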
As we have already seen, each successive sample from a Markov chain is highly correlated with the prior state.
This can take a long time if the initial state was bad. In the long run, however, these local correlations disappear, and we obtain a large collection of samples. To get samples that are not strongly correlated with each other, we can keep every nth sample. WebPPL provides an option for MCMC called 'lag' for this. Fortunately, it turns out that for any given (conditional) distribution we might want to sample from, there is at least one Markov chain with a matching stationary distribution.
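The effect of thinning can be sketched outside WebPPL. In this minimal Python illustration (the AR(1) chain and its 0.9 coefficient are illustrative choices, not from the book), keeping every 20th sample largely removes the correlation between successive draws:

```python
import random

def chain(n_samples, lag=1):
    """Correlated chain (AR(1), coefficient 0.9); keep every lag-th draw."""
    x, out = 0.0, []
    for i in range(n_samples * lag):
        x = 0.9 * x + random.gauss(0, 1)   # next state depends on current one
        if (i + 1) % lag == 0:
            out.append(x)                  # keep only every lag-th sample
    return out

def autocorr1(xs):
    """Lag-1 autocorrelation of a sample sequence."""
    m = sum(xs) / len(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den

random.seed(0)
a1 = autocorr1(chain(5000, lag=1))    # successive samples: strongly correlated
a20 = autocorr1(chain(5000, lag=20))  # thinned samples: correlation mostly gone
print(a1, a20)
```

With lag=1 the lag-1 autocorrelation is around 0.9; with lag=20 it drops to roughly 0.9^20 ≈ 0.12.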
To create the necessary transition function, we first create a proposal distribution,
q(x→x′)
. A common option for continuous state spaces is to sample a new state from a multivariate Gaussian centered on the current state.
Once you have defined q and the target distribution p(x), calculate the ratio A/B, where:

A = p(x′) q(x′ → x)
B = p(x) q(x → x′)

If A/B is greater than 1, round it down to 1. Then flip a coin with probability A/B of showing heads. If heads shows up, transition from x to x′; otherwise, stay in the current state x.
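As a sketch (in Python rather than WebPPL), here is the scheme above applied to an unnormalized standard normal target with a Gaussian proposal centered on the current state. The target and step size are illustrative choices:

```python
import math, random

def p(x):
    """Unnormalized target density: a standard normal."""
    return math.exp(-x * x / 2)

def mh(n_steps, step=1.0):
    x = 0.0
    samples = []
    for _ in range(n_steps):
        xp = random.gauss(x, step)      # propose x' ~ q(x -> x')
        # The Gaussian proposal is symmetric, q(x' -> x) = q(x -> x'),
        # so the ratio A/B reduces to p(x') / p(x).
        a = min(1.0, p(xp) / p(x))
        if random.random() < a:         # flip a coin with probability A/B
            x = xp                      # heads: move to x'
        samples.append(x)               # tails: stay at x
    return samples

random.seed(1)
s = mh(50000)
mean = sum(s) / len(s)
var = sum((v - mean) ** 2 for v in s) / len(s)
print(mean, var)  # close to 0 and 1, the moments of the target
```

Because the proposal is symmetric, the two q terms cancel; for an asymmetric proposal both q(x′ → x) and q(x → x′) must be kept in the ratio.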
The balance condition holds when the Markov chain has reached its stationary distribution:

p(x′) = ∑ₓ p(x) π(x → x′)

where π is the transition distribution.
A stronger condition is the detailed balance condition:
p(x) π(x → x′) = p(x′) π(x′ → x)
It is stronger in the sense that the detailed balance condition implies the balance condition: summing both sides over x gives ∑ₓ p(x) π(x → x′) = p(x′) ∑ₓ π(x′ → x) = p(x′).
It can be shown that the MH algorithm yields a transition distribution π(x → x′) that satisfies the detailed balance equation.
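This can be checked numerically on a small discrete example (the three-state target below is an arbitrary choice): build the MH kernel π from a uniform proposal and verify p(x) π(x → x′) = p(x′) π(x′ → x) for every pair of states:

```python
p = [0.5, 0.3, 0.2]              # arbitrary target distribution
n = len(p)
q = 1.0 / n                      # uniform proposal: q(x -> y) = 1/3 for all y

def pi(x, y):
    """MH transition probability from state x to state y."""
    if x == y:
        # proposing x itself, plus every rejected move to some other state z
        return q + sum(q * (1 - min(1, p[z] / p[x])) for z in range(n) if z != x)
    return q * min(1, p[y] / p[x])  # propose y, accept with prob min(1, p(y)/p(x))

max_violation = max(abs(p[x] * pi(x, y) - p[y] * pi(y, x))
                    for x in range(n) for y in range(n))
print(max_violation)             # 0 up to floating-point error
```

Both sides equal q · min(p(x), p(x′)), which is symmetric in x and x′, so detailed balance holds for any target p.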
Let's look at an example:
Sometimes the MH chain will get stuck if it cannot find states with high probability (in other words, the ratio A/B is really small). For example, if we want to find ten numbers randomly sampled from uniform(0,1) and we use a Gaussian distribution for the transition probabilities, the chain will likely get stuck:
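A rough Python sketch of this situation (the soft sum-to-5 constraint, its width, and the step size below are illustrative assumptions, not the book's exact model):

```python
import math, random

def log_p(xs, sigma=0.01):
    """Unnormalized log density: ten values in (0,1), softly constrained to sum to 5."""
    if any(v <= 0 or v >= 1 for v in xs):
        return -math.inf
    return -(sum(xs) - 5) ** 2 / (2 * sigma ** 2)

random.seed(2)
x = [0.5] * 10                   # start exactly on the constraint
accepted, steps = 0, 2000
for _ in range(steps):
    xp = [v + random.gauss(0, 0.1) for v in x]   # Gaussian drift proposal
    d = log_p(xp) - log_p(x)
    if d >= 0 or random.random() < math.exp(d):
        x, accepted = xp, accepted + 1
rate = accepted / steps
print("acceptance rate:", rate)
```

Almost every proposal moves the sum off the constraint, so A/B is tiny and only a few percent of proposals are accepted; the chain barely moves.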
The acceptance ratio in this case is 1-2%. Hamiltonian Monte Carlo solves this problem by calculating the gradient of the posterior with respect to the random choices made by the program. The book does not explain very well how it works.
It also does not go into detail on how the particle filter works.
In contrast to the non-parametric methods mentioned above, VI is a parametric method. Mean-field variational inference tries to find an optimized set of parameters by approximating the posterior with a product of independent distributions (one for each random choice in the program).
By default, it takes the given arguments of the random choices in the program (in this case, the arguments (0, 20) and (0, 1) to the two Gaussian random choices used as priors) and replaces them with free parameters, which it then optimizes to bring the resulting distribution as close as possible to the true posterior. The mean-field approximation necessarily fails to capture correlations between variables. For example, it fails to capture the correlation between reflectance and illumination.
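A small Python sketch of this failure mode (the correlated 2-D Gaussian below is a stand-in for the reflectance/illumination posterior, not the book's model): fit one independent Gaussian per variable by matching the marginals, then compare correlations:

```python
import random

def corr(a, b):
    """Pearson correlation of two equal-length sample lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)
    va = sum((u - ma) ** 2 for u in a) / len(a)
    vb = sum((v - mb) ** 2 for v in b) / len(b)
    return cov / (va * vb) ** 0.5

def mean_std(v):
    m = sum(v) / len(v)
    return m, (sum((u - m) ** 2 for u in v) / len(v)) ** 0.5

random.seed(3)
xs, ys = [], []
for _ in range(20000):
    z = random.gauss(0, 1)                 # shared cause -> correlated "posterior"
    xs.append(z + 0.3 * random.gauss(0, 1))
    ys.append(z + 0.3 * random.gauss(0, 1))
true_corr = corr(xs, ys)

# Mean-field: one independent Gaussian per variable, moment-matched to each marginal.
mx, sx = mean_std(xs)
my, sy = mean_std(ys)
mf_x = [random.gauss(mx, sx) for _ in range(20000)]
mf_y = [random.gauss(my, sy) for _ in range(20000)]
mf_corr = corr(mf_x, mf_y)
print(true_corr, mf_corr)
```

Each mean-field marginal matches the true one, yet the product approximation's correlation is near zero while the true posterior's is strong; that lost dependence is exactly what mean-field cannot represent.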
probabilistic-models-of-cognition