# Chapter 6: Bayesian data analysis

Bayesian cognitive models and Bayesian data analysis are the same machinery under the hood; they just arise in different contexts.

> If the generative model is a hypothesis about a person’s model of the world, then we have a _Bayesian cognitive model_ – the main topic of this book. If the generative model is instead the scientist’s model of how the data are generated, then we have _Bayesian data analysis_.

### Two competing hypotheses

Consider a spinning coin (as opposed to a flipping coin).

- Scientists believe the probability of landing heads is uniformly distributed on [0, 1].
- People _might_ believe it is the same as flipping an unbiased coin, i.e. p(heads) = 0.5.

Which hypothesis is correct? How do we design an experiment to tell? A Bayesian way of answering this question is: `given some data, what is the probability that it was generated by hypothesis one?`

Imagine we ran the “predict the next 10” experiment with 20 participants and observed the following responses:

`var experimentalData = [9,8,7,5,4,5,6,7,9,4,8,7,8,3,9,6,5,7,8,5]`

```javascript
var opts = {method: "rejection", samples: 5000}

// hypothesis one: people infer the spinning coin's weight from a uniform prior
var observerModel = Infer(opts, function(){
  var p = uniform(0, 1)
  var coinSpinner = Binomial({n: 20, p: p})
  observe(coinSpinner, 15)
  return binomial(p, 10)
})
viz(observerModel)

// hypothesis two: people might instead treat the spinning coin like a fair flip
var skepticalModel = Infer(opts, function(){
  var sameAsFlipping = flip(0.5)
  var p = sameAsFlipping ? 0.5 : uniform(0, 1)
  var coinSpinner = Binomial({n: 20, p: p})
  observe(coinSpinner, 15)
  return binomial(p, 10)
})
// viz(skepticalModel)

var experimentalData = [9,8,7,5,4,5,6,7,9,4,8,7,8,3,9,6,5,7,8,5]

// package the models up in an Object (for ease of reference)
var modelObject = {observerModel: observerModel, skepticalModel: skepticalModel};

var scientistModel = function(){
  var theBetterModel_name = flip(0.5) ? "observerModel" : "skepticalModel"
  var theBetterModel = modelObject[theBetterModel_name]
  map(function(d){ observe(theBetterModel, d) }, experimentalData)
  return {betterModel: theBetterModel_name}
}

var modelPosterior = Infer({method: "enumerate"}, scientistModel)
viz(modelPosterior)
```

Note that in this case the observed responses look more like what hypothesis two predicts (some people treat the spinning coin as a fair coin), so the posterior over models in the program above favors the skepticalModel.

---

For a given Bayesian model (together with data), there are four conceptually distinct distributions of interest:

1. The __prior distribution__ over parameters captures our initial state of knowledge (or beliefs) about the values that the latent parameters could have.
2. The __posterior distribution__ over parameters captures what we know about the latent parameters after updating our beliefs with the evidence provided by the data.
3. The __prior predictive distribution__ tells us what data to expect, given our model and our initial beliefs about the parameters. The prior predictive is a distribution over data, and gives the relative probability of different observable outcomes before any data have been seen.
4. The __posterior predictive distribution__ tells us what data to expect, given the same model we started with, but with beliefs that have been updated by the observed data. The posterior predictive is a distribution over data, and gives the relative probability of different observable outcomes after some data have been seen.
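To make these four distributions concrete, here is a minimal WebPPL sketch for a single coin-weight model. The counts (`k = 15` successes out of `n = 20`, echoing the spinning-coin data above) and the variable names are illustrative choices, not code from the chapter.

```javascript
// Illustrative data (hypothetical): 15 heads out of 20 spins
var k = 15, n = 20

var model = function(){
  var p = uniform(0, 1)                      // draw the coin weight from its prior
  observe(Binomial({p: p, n: n}), k)         // condition on the observed data
  var prior_p = uniform(0, 1)                // an independent, unconditioned draw from the prior
  return {
    prior: prior_p,                          // 1. prior over the parameter
    posterior: p,                            // 2. posterior over the parameter
    priorPredictive: binomial(prior_p, n),   // 3. data expected before seeing any data
    posteriorPredictive: binomial(p, n)      // 4. data expected after updating on the data
  }
}

var dists = Infer({method: "rejection", samples: 2000}, model)
viz.marginals(dists)
```

Running this shows all four marginals side by side: the prior stays flat, the posterior concentrates near k/n, and the two predictive distributions shift accordingly.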
### Posterior prediction and model checking

> The posterior predictive distribution describes what data you should expect to see, given the model you’ve assumed and the data you’ve collected so far. If the model is a good description of the data you’ve collected, then the model shouldn’t be surprised if you got the same data by running the experiment again.
>
> It’s natural then to use the posterior predictive distribution to examine the descriptive adequacy of a model. If these predictions do not match the data _already seen_ (i.e., the data used to arrive at the posterior distribution over parameters), the model is descriptively inadequate.

__Question: we arrive at the posterior distribution using the data _already seen_. Under what circumstances, then, can the posterior predictive distribution differ from the distribution of the data _already seen_?__

Here is an example of when that can happen. You collect data from ten students in each of two schools, asking whether or not they will help other students:

* 10/10 students from school one will help other students
* 0/10 students from school two will help other students

- Prior belief: the proportion of students willing to help is uniformly distributed on [0, 1], and a single shared proportion applies to both schools.

Because one helping rate has to account for both schools, the posterior distribution over that parameter will peak around 0.5. But the posterior predictive distribution generated from this posterior (roughly 5/10 helpers in each school) will look nothing like the observed data (10/10 and 0/10). Thus, the model is **descriptively inadequate**. A code sketch of this check appears at the end of this section, just before the Bayes factor discussion.

---

### Simple vs Complex Hypotheses

Let's say you toss a coin 20 times and get 7 heads. You want to know whether the coin is unbiased (perfectly random) or not. Let's define the terms biased and unbiased concretely:

- Unbiased coin: p(head) = 0.5
- Biased coin: p(head) ~ Uniform(0, 1)

`p(head) = 0.5` is a simple model. `p(head) ~ Uniform(0,1)` is more complex: the coin's weight could be any value between 0 and 1, with all values equally likely a priori (the weight is drawn once and then used for every toss). Note that `p(head) = 0.5` is just a special case of this complex model.

Which model do you think is more likely to have generated the observed data (7/20 heads)? One might think it is the biased coin, but it is actually the unbiased coin.

```javascript
var k = 7, n = 20;

var compareModels = function() {
  // binary decision variable for which hypothesis is better
  var x = flip(0.5) ? "simple" : "complex";
  var p = (x == "simple") ? 0.5 : uniform(0, 1);
  observe(Binomial({p: p, n: n}), k);
  return {model: x}
}

var opts = {method: "rejection", samples: 2000};
print("We observed " + k + " successes out of " + n + " attempts")
var modelPosterior = Infer(opts, compareModels);
viz(modelPosterior)
```

> Shouldn’t the more general model always be better? If we’re at a track, and you bet on horse A, and I bet on horses A and B, aren’t I strictly in a better position than you? The answer is no, and the reason has to do with our metric for winning. Intuitively, we don’t care just whether your horse won or not, but how much money you win. How much money you win depends on how much money you bet, and the rule is, when we go to the track, we have the same amount of money.
>
> In probabilistic models, our money is probabilities. Each model must allocate its probability so that it sums to 1. So my act of betting on horse A and horse B actually requires me to split my money (say, betting 50/50 on each). On the other hand, you put all your money on horse A (100 on A, 0 on B). If A wins, you will gain more money because you put more money down.
>
> This idea is called the principle of parsimony, or Occam’s razor.
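Returning to the two-school helping example from the model-checking discussion above, here is a minimal sketch of that posterior predictive check. It assumes the model described there: a single helping rate `p`, shared by both schools, with a uniform prior; the variable names and inference settings are illustrative.

```javascript
// Observed data from the example: 10/10 helpers in school one, 0/10 in school two
var k1 = 10, n1 = 10
var k2 = 0,  n2 = 10

var model = function(){
  var p = uniform(0, 1)                 // one helping rate shared by both schools
  observe(Binomial({p: p, n: n1}), k1)  // condition on school one
  observe(Binomial({p: p, n: n2}), k2)  // condition on school two
  return {
    posterior_p: p,                      // peaks around 0.5
    predictiveSchool1: binomial(p, n1),  // replicated data for school one
    predictiveSchool2: binomial(p, n2)   // replicated data for school two
  }
}

var dists = Infer({method: "MCMC", samples: 10000, burn: 1000}, model)
viz.marginals(dists)
```

The posterior over `p` concentrates near 0.5, but the replicated counts for each school cluster around 5 out of 10, nothing like the observed 10/10 and 0/10, which is exactly the descriptive inadequacy described above.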
---

### Bayes factor and Savage-Dickey method

The Bayes factor is simply the ratio A / B, where

- A = marginal likelihood of the data under hypothesis one (here, the simple model), and
- B = marginal likelihood of the data under hypothesis two (here, the complex model).

```javascript
var k = 7, n = 20;

// marginal likelihood of the data under the simple hypothesis (p = 0.5)
var simpleLikelihood = Math.exp(Binomial({p: 0.5, n: n}).score(k))

// marginal likelihood under the complex hypothesis, approximated by forward sampling
var complexModel = Infer({method: "forward", samples: 10000}, function(){
  var p = uniform(0, 1);
  return binomial(p, n)
})
var complexLikelihood = Math.exp(complexModel.score(k))

var bayesFactor_01 = simpleLikelihood / complexLikelihood
bayesFactor_01
```

As you can see, B can be hard to calculate, because it requires integrating (here, sampling) over the whole parameter space. The Savage-Dickey method comes to the rescue:

> The Bayes factor can also be obtained by considering only the more complex hypothesis. What you do is look at the distribution over the parameter of interest (here, p) at the point of interest (here, p = 0.5). Dividing the probability density of the posterior by the density of the prior (of the parameter at the point of interest) also gives you the Bayes Factor!

```javascript
var k = 7, n = 20;

var complexModelPrior = Infer({method: "forward", samples: 10000}, function(){
  var p = uniform(0, 1);
  return p
})

var complexModelPosterior = Infer({method: "rejection", samples: 10000}, function(){
  var p = uniform(0, 1);
  observe(Binomial({p: p, n: n}), k);
  return p
})

// approximate the prior and posterior densities at p = 0.5 by the mass within +/- 0.05
var savageDickeyDenomenator = expectation(complexModelPrior, function(x){return Math.abs(x-0.5)<0.05})
var savageDickeyNumerator = expectation(complexModelPosterior, function(x){return Math.abs(x-0.5)<0.05})
var savageDickeyRatio = savageDickeyNumerator / savageDickeyDenomenator
print( savageDickeyRatio )
```

> Note that we have approximated the densities by looking at the expectation that p is within 0.05 of the target value p = 0.5.

Here we only consider the complex model (which contains the simple model as a special case), and compare its behavior before and after the data are observed, evaluated at the point of interest p = 0.5 (the value that the simple model represented earlier).

---

### Single Regression

The concepts and ideas explained above are used in another example: regression using Bayesian inference. Given data from a tug-of-war tournament, containing each player's number of wins and strength rating, we build a model like so:

strength = β<sub>0</sub> + β<sub>1</sub> * n<sub>wins</sub>

The observed strength rating is assumed to be generated from this predicted `strength` with Gaussian noise:

observed strength ~ N(strength, σ)

Here β<sub>0</sub>, β<sub>1</sub>, and σ are unknowns. We define priors over these parameters, observe the data, and obtain a posterior.

```javascript
var uniformKernel = function(prevVal) {
  return Uniform({a: prevVal - 0.2, b: prevVal + 0.2});
};

var singleRegression = function(){
  // parameters of a simple linear regression
  var b0 = sample(Uniform({a: -1, b: 1}), {driftKernel: uniformKernel})
  var b1 = sample(Uniform({a: -1, b: 1}), {driftKernel: uniformKernel})
  var sigma = sample(Uniform({a: 0, b: 2}), {driftKernel: uniformKernel})

  map(function(d){
    // linear regression formula
    var predicted_y = b0 + d.nWins*b1
    observe(Gaussian({mu: predicted_y, sigma: sigma}), d.ratingZ)
  }, towData)

  return {b0: b0, b1: b1, sigma: sigma}
}

var nSamples = 2500
var opts = { method: "MCMC", callbacks: [editor.MCMCProgress()],
             samples: nSamples, burn: nSamples/2 }

var posterior = Infer(opts, singleRegression)
```

As before, we want to find out whether this model is descriptively adequate. This is done by generating predictions from the fitted model and comparing them with the actual data, as sketched below.
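Here is one way to sketch that comparison, assuming `towData` and the `posterior` from the block above are still in scope. The correlation helper is written out explicitly because it is not assumed to be a WebPPL built-in, and comparing posterior-mean predictions to the observed ratings is an illustrative choice, not necessarily the chapter's exact procedure.

```javascript
// Pearson correlation between two equal-length arrays of numbers
var arrMean = function(xs){ return sum(xs) / xs.length }
var correlation = function(xs, ys){
  var mx = arrMean(xs), my = arrMean(ys)
  var cov = sum(map2(function(x, y){ return (x - mx) * (y - my) }, xs, ys))
  var sx = Math.sqrt(sum(map(function(x){ return (x - mx) * (x - mx) }, xs)))
  var sy = Math.sqrt(sum(map(function(y){ return (y - my) * (y - my) }, ys)))
  return cov / (sx * sy)
}

// Model prediction for each player: posterior mean of b0 + b1 * nWins
var predictedRatings = map(function(d){
  return expectation(posterior, function(params){
    return params.b0 + d.nWins * params.b1
  })
}, towData)

var observedRatings = map(function(d){ return d.ratingZ }, towData)

print("Correlation between predicted and observed ratings: " +
      correlation(predictedRatings, observedRatings))
viz.scatter(predictedRatings, observedRatings)
```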
When this check is carried out, the generated data turn out to have roughly a 91% correlation with the observed data, suggesting that the single-regression model is descriptively adequate.

###### tags: `probabilistic-models-of-cognition`