
Chapter 6: Bayesian data analysis

Bayesian cognitive models and Bayesian data analysis are the same thing under the hood; they just arise in different contexts.

If the generative model is a hypothesis about a person’s model of the world, then we have a Bayesian cognitive model – the main topic of this book. If the generative model is instead the scientist’s model of how the data are generated, then we have Bayesian data analysis.

Two competing hypotheses

Consider an example of a spinning coin (as opposed to a flipping coin).

  • Scientists believe the probability of heads is uniformly distributed on [0, 1]
  • People might believe it is the same as flipping an unbiased coin, i.e., the probability of heads is 0.5

Which hypothesis is correct? How do we design an experiment? A Bayesian way of answering this question is: given some data, what is the probability that it was generated by hypothesis one?

Imagine we ran the “predict the next 10” experiment with 20 participants, and observed the following responses:

var experimentalData = [9,8,7,5,4,5,6,7,9,4,8,7,8,3,9,6,5,7,8,5]

var opts = {method: "rejection", samples: 5000}

// hypothesis one: the spinning coin's probability of heads is uniform on [0, 1]
var observerModel = Infer(opts, function(){
  var p = uniform(0, 1)
  var coinSpinner = Binomial({n: 20, p: p})
  observe(coinSpinner, 15)   // observe 15 heads out of 20 spins
  return binomial(p, 10)     // predict the number of heads in the next 10
})
viz(observerModel)

// hypothesis two: with probability 0.5 the spinner behaves like a fair coin flip
// (p = 0.5); otherwise p is uniform on [0, 1]
var skepticalModel = Infer(opts, function(){
  var sameAsFlipping = flip(0.5)
  var p = sameAsFlipping ? 0.5 : uniform(0, 1)
  var coinSpinner = Binomial({n: 20, p: p})
  observe(coinSpinner, 15)   // observe 15 heads out of 20 spins
  return binomial(p, 10)     // predict the number of heads in the next 10
})
viz(skepticalModel)

var experimentalData = [9,8,7,5,4,5,6,7,9,4,8,7,8,3,9,6,5,7,8,5]

// package the models up in an Object (for ease of reference)
var modelObject = {observerModel: observerModel, skepticalModel: skepticalModel};

var scientistModel = function(){
  // flip a fair coin over which hypothesis generated the data
  var theBetterModel_name = flip(0.5) ? "observerModel" : "skepticalModel"
  var theBetterModel = modelObject[theBetterModel_name]
  // score each participant's response under the chosen model's predictive distribution
  map(function(d){ observe(theBetterModel, d) }, experimentalData)
  return {betterModel: theBetterModel_name}
}

var modelPosterior = Infer({method: "enumerate"}, scientistModel)

viz(modelPosterior)

Note that in this case the observed data come from people whose responses are more consistent with hypothesis two. Hence, in the above program's result, hypothesis two receives more posterior support.


For a given Bayesian model (together with data), there are four conceptually distinct distributions of interest, illustrated by the sketch after this list:

  1. The prior distribution over parameters captures our initial state of knowledge (or, our beliefs) about the values that the latent parameters could have.
  2. The posterior distribution over parameters captures what we know about the latent parameters having updated our beliefs with the evidence provided by data.
  3. The prior predictive distribution tells us what data to expect, given our model and our initial beliefs about the parameters. The prior predictive is a distribution over data, and gives the relative probability of different observable outcomes before we have seen any data.
  4. The posterior predictive distribution tells us what data to expect, given the same model we started with, but with beliefs that have been updated by the observed data. The posterior predictive is a distribution over data, and gives the relative probability of different observable outcomes, after some data has been seen.
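To make these four distributions concrete, here is a minimal sketch (the variable names and inference settings are chosen for illustration) that computes all four for the spinning-coin example above, with a uniform prior over the coin weight p and 15 heads observed out of 20 spins:

// 1. prior over the coin weight p, and 3. the prior predictive over the data
var priorModel = Infer({method: "forward", samples: 10000}, function(){
  var p = uniform(0, 1)
  return {parameter: p, priorPredictive: binomial(p, 20)}
})

// 2. posterior over p, and 4. the posterior predictive, after observing 15/20 heads
var posteriorModel = Infer({method: "rejection", samples: 10000}, function(){
  var p = uniform(0, 1)
  observe(Binomial({n: 20, p: p}), 15)
  return {parameter: p, posteriorPredictive: binomial(p, 20)}
})

viz.marginals(priorModel)
viz.marginals(posteriorModel)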

Posterior prediction and model checking

The posterior predictive distribution describes what data you should expect to see, given the model you’ve assumed and the data you’ve collected so far. If the model is a good description of the data you’ve collected, then the model shouldn’t be surprised if you got the same data by running the experiment again.

It’s natural then to use the posterior predictive distribution to examine the descriptive adequacy of a model. If these predictions do not match the data already seen (i.e., the data used to arrive at the posterior distribution over parameters), the model is descriptively inadequate.

Question: we arrive at the posterior distribution using the data already seen. Under what circumstances, then, can the posterior predictive distribution differ from the distribution of the data already seen?

Here is an example of when that can happen:

You ask ten students from each of two schools whether or not they would help other students.

  • 10/10 students from school one will help other students
  • 0/10 students from school two will help other students
  • Prior belief: the proportion of students willing to help, assumed to be the same at both schools, is uniformly distributed on [0, 1]

The posterior distribution of the shared parameter will peak around 0.5. But the posterior predictive distribution generated from this posterior (roughly half the students helping at each school) looks nothing like the observed data (10/10 and 0/10). Thus, the model is descriptively inadequate.
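Here is a minimal sketch of that situation (the model name and inference settings are chosen for illustration), assuming a single helping probability p shared by both schools with a uniform prior:

var helpingModel = Infer({method: "MCMC", samples: 10000, burn: 1000}, function(){
  var p = uniform(0, 1)                  // shared helping probability
  observe(Binomial({n: 10, p: p}), 10)   // school one: 10/10 students help
  observe(Binomial({n: 10, p: p}), 0)    // school two: 0/10 students help
  return {parameter: p,
          predictiveSchoolOne: binomial(p, 10),
          predictiveSchoolTwo: binomial(p, 10)}
})
viz.marginals(helpingModel)

The posterior over p peaks near 0.5, so each school's posterior predictive is centered around 5/10 helpers, which looks nothing like the observed 10/10 and 0/10.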


Simple vs Complex Hypotheses

Let's say you toss a coin 20 times and get 7 heads. You want to know if the coin is unbiased (perfectly random) or not. Let's concretely define the terms biased and unbiased.

  • Unbiased coin: p(head) = 0.5
  • Biased coin: p(head) = Uniform(0,1)

p(head) = 0.5 is a simple model. p(head) = Uniform(0,1) is more complex: it says the coin has some fixed probability of heads, which could be any value between 0 and 1, with all values equally likely a priori. Note that p(head) = 0.5 is just a special case of this complex model.

Which model do you think is more likely to generate the observed data (7/20 heads)?

One might think it is the biased coin but it is actually the unbiased coin.

var k = 7, n = 20;

var compareModels = function() {

  // binary decision variable for which hypothesis is better
  var x = flip(0.5) ? "simple" : "complex";
  var p = (x == "simple") ? 0.5 : uniform(0, 1);

  observe(Binomial({p: p, n: n}), k);

  return {model: x}
}

var opts = {method: "rejection", samples: 2000};
print("We observed " + k + " successes out of " + n + " attempts")
var modelPosterior = Infer(opts, compareModels);
viz(modelPosterior)

Shouldn’t the more general model always be better? If we’re at the track, and you bet on horse A, and I bet on horses A and B, aren’t I strictly in a better position than you? The answer is no, and the reason has to do with our metric for winning. What matters is not just whether your horse won, but how much money you win. How much money you win depends on how much money you bet on the winning horse, and the rule is that when we go to the track, we each have the same amount of money.

In probabilistic models, our money is probabilities. Each model must allocate its probability so that it sums to 1. So my act of betting on horse A and horse B actually requires me to split my money (say, betting 50 / 50 on each). On the other hand, you put all your money on horse A (100 on A, 0 on B). If A wins, you will gain more money because you put more money down.
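To put rough numbers on this for the coin example: the simple model bets everything on p = 0.5, so it assigns the observed 7 heads out of 20 a probability of C(20, 7) × 0.5^20 ≈ 0.074. The complex model spreads its bets over every possible value of p, and its marginal probability of any particular number of heads out of 20 works out to 1/21 ≈ 0.048. Seven heads is close enough to the simple model's concentrated bet (around 10 heads) that the simple model comes out ahead.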

This idea is called the principle of parsimony, or Occam’s razor.


Bayes factor and Savage-Dickey method

The Bayes factor is simply the ratio A / B, where
A = marginal likelihood of the data under one hypothesis (here, the simple model)
B = marginal likelihood of the data under the competing hypothesis (here, the complex model)

var k = 7, n = 20;

// A: likelihood of the data under the simple hypothesis (p = 0.5)
var simpleLikelihood = Math.exp(Binomial({p: 0.5, n: n}).score(k))

// B: marginal likelihood of the data under the complex hypothesis,
// approximated by forward-sampling p from its uniform prior
var complexModel = Infer({method: "forward", samples: 10000}, function(){
  var p = uniform(0, 1);
  return binomial(p, n)
})
var complexLikelihood = Math.exp(complexModel.score(k))

var bayesFactor_01 = simpleLikelihood / complexLikelihood
bayesFactor_01

As you can see, B can be hard to compute: it requires averaging the likelihood over the entire prior on p, which we approximated above by forward sampling.

The Savage-Dickey method comes to the rescue:

The Bayes factor can also be obtained by considering only the more complex hypothesis (provided the simpler hypothesis is nested within it, as it is here). What you do is look at the distribution over the parameter of interest (here, p) at the point of interest (here, p = 0.5). Dividing the posterior density by the prior density of the parameter at that point also gives you the Bayes factor!

var k = 7, n = 20;

// prior over p under the complex hypothesis
var complexModelPrior = Infer({method: "forward", samples: 10000}, function(){
  var p = uniform(0, 1);
  return p
})

// posterior over p under the complex hypothesis, after observing k successes out of n
var complexModelPosterior = Infer({method: "rejection", samples: 10000}, function(){
  var p = uniform(0, 1);
  observe(Binomial({p: p, n: n}), k);
  return p
})

// approximate the prior and posterior densities at p = 0.5 by the probability
// that p falls within 0.05 of that value
var savageDickeyDenominator = expectation(complexModelPrior, function(x){return Math.abs(x-0.5)<0.05})
var savageDickeyNumerator = expectation(complexModelPosterior, function(x){return Math.abs(x-0.5)<0.05})
var savageDickeyRatio = savageDickeyNumerator / savageDickeyDenominator
print( savageDickeyRatio )

Note that we have approximated the densities by looking at the probability that p falls within 0.05 of the target value p = 0.5.

Here we only consider the complex model (which includes the simple model as a special case). We only look at the behavior of the complex model, before and after the data are observed, at our point of interest (the value p = 0.5 that defines the simple model).


Single Regression

The concepts and ideas explained above are used in another example: Regression using Bayesian inference.

Given data from a tug-of-war tournament, containing each player's number of wins and their (z-scored) strength rating, we build a model like so:

predicted strength = β0 + β1 * nWins

The observed strength rating is then generated from the predicted strength like so:

observed strength ~ N(predicted strength, σ)

Here β0, β1 and σ are unknowns. We define a prior over these parameters, observe the data and get a posterior.

var uniformKernel = function(prevVal) {
  return Uniform({a: prevVal - 0.2, b: prevVal + 0.2});
};

var singleRegression = function(){
  // parameters of a simple linear regression
  var b0 = sample(Uniform({a: -1, b: 1}), {driftKernel: uniformKernel})
  var b1 = sample(Uniform({a: -1, b: 1}), {driftKernel: uniformKernel})
  var sigma = sample(Uniform({a: 0, b: 2}), {driftKernel: uniformKernel})

  // towData: the tug-of-war dataset; each entry has a player's number of wins
  // (nWins) and z-scored strength rating (ratingZ)
  map(function(d){

    // linear regression formula
    var predicted_y = b0 + d.nWins*b1

    observe(Gaussian({mu: predicted_y, sigma: sigma}), d.ratingZ)

  }, towData)

  return {b0: b0, b1: b1, sigma: sigma}
}

var nSamples = 2500
var opts = { method: "MCMC", callbacks: [editor.MCMCProgress()],
             samples: nSamples, burn: nSamples/2 }

var posterior = Infer(opts, singleRegression)

As before, we want to find out whether this model is descriptively adequate. This is done by generating data from the fitted model (the posterior predictive) and comparing it with the data actually observed.
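One way to do this, sketched below under the assumption that the posterior and towData variables from the code above are still in scope, is to draw parameter values from the posterior and simulate a rating for each player:

var posteriorPredictive = Infer({method: "forward", samples: 1000}, function(){
  var params = sample(posterior)   // one posterior draw of {b0, b1, sigma}
  // simulate a z-scored strength rating for each player from the regression line
  return map(function(d){
    return gaussian(params.b0 + d.nWins * params.b1, params.sigma)
  }, towData)
})

Comparing these simulated ratings with the observed ratingZ values (for example, via their correlation) gives the check reported below.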

The generated data turn out to have a correlation of about 0.91 with the observed data.

tags: probabilistic-models-of-cognition