---
tags: introduction to probability, statistics
---

# The Law of Large Numbers

([home](https://github.com/alexhkurz/introduction-to-probability/blob/master/README.md) ... [previous](https://hackmd.io/@alexhkurz/S1xs327aL))

**Prerequisites:** You need to understand what the [expected values](https://hackmd.io/@alexhkurz/S1xs327aL) are when rolling two dice.

## Introduction

We have seen in the last session that we need to roll the two dice many times for the experimental values to be almost the same as the expected values.

We now want to run more experiments in order to understand better what we mean by "almost" in the sentence above. For this we use the new program [two-dice-2.R](https://github.com/alexhkurz/introduction-to-probability/blob/master/src/two-dice-2.R).

But let us first review what we have done in the last session.

## Review from the last session

Remember that with the previous program [two-dice.R](https://github.com/alexhkurz/introduction-to-probability/blob/master/src/two-dice.R), we ran an experiment as follows.

- Assign a value, such as 30, to the variable `number_of_rolls`.
- Run the program.
- Interpret the bar chart summarising the outcomes.

Then we repeated the experiment many times. We learned that

- most of the time the experimental outcomes are different from the expected values;
- we have to replace 30 by surprisingly large numbers to get outcomes that are close to the expected values;
- we expect 7 to be the most frequent outcome.

**Activity/Experiment:**

- Open the program [two-dice.R](https://github.com/alexhkurz/introduction-to-probability/blob/master/src/two-dice.R) from the previous session and set `number_of_rolls` to 30.
- Run the program 15 times and make a table recording whether 7 is the most frequent value (TRUE) or not (FALSE). For example, if 7 is the most frequent value the first time but not the second time, your table will look as follows after the first two runs.

| Sample of 30 | Is 7 the most frequent outcome? |
|:---:|:---:|
| 1 | TRUE |
| 2 | FALSE |

- Make a bar chart with two columns showing how often you got FALSE and TRUE.
- How would you estimate the probability that 7 is the most frequent value? Is that compatible with your expectation of 7 as the most frequent outcome?

The purpose of this session is exactly to get a better understanding of the last question.

Maybe you felt that after a while it got boring to run the program [two-dice.R](https://github.com/alexhkurz/introduction-to-probability/blob/master/src/two-dice.R) "by hand" so many times, and to maintain the table by hand as well. *So what if we had a program that would run these experiments automatically, as often as we wish?* That is exactly what we are going to do now.

## The new program

The program [two-dice-2.R](https://github.com/alexhkurz/introduction-to-probability/blob/master/src/two-dice-2.R) now has two variables that we can set before running it.

    number_of_rolls <- 30
    number_of_samples <- 100

The variable `number_of_rolls` has the same meaning as before: the program will roll the two dice 30 times for us. In fact, if you look further down in the new program, you find that the old program is still part of it. But it is now inside a so-called for-loop

    for (i in 1:number_of_samples)

which tells us that the old program is now executed automatically `number_of_samples` times. To say this again, the variable `number_of_samples` tells the new program how often to run the old program.

With the old program, you could convince yourself that if you roll the two dice 30 times, it is more likely than not that 7 is not the most frequent value. But how likely is this exactly? To find out, you would have to run the old program many times and each time record whether 7 was the most frequent value or not.
This is exactly what the program does here:

    if (two_dice_sorted[[1]][1]==7) {
        list[i]<-TRUE
    } else {
        list[i]<-FALSE
    }

It checks whether $7$ was the most frequent value and records the result in a list. This list is then plotted as a bar chart.

**Activity/Experiment:** Run the program and write down in your own words an interpretation of the bar chart. Compare with the previous activity.

## The experiments

**Activity/Experiment:** Download the program [two-dice-2.R](https://github.com/alexhkurz/introduction-to-probability/blob/master/src/two-dice-2.R). Find out how often you need to roll the two dice in order to have (approximately) a 50% chance that 7 is the most frequent value.

Would you say that at 50%, 7 is reliably the most frequent value? What if we want 7 to be the most frequent value in 90% of cases?

**Activity/Experiment:** Use [two-dice-2.R](https://github.com/alexhkurz/introduction-to-probability/blob/master/src/two-dice-2.R) to find out how often you need to roll the two dice in order to have (approximately) a 90% chance that 7 is the most frequent value.

What percentage would you ask for in order to feel safe saying that 7 is reliably the most frequent value?

**Activity/Experiment:** Use [two-dice-2.R](https://github.com/alexhkurz/introduction-to-probability/blob/master/src/two-dice-2.R) to find out how often you need to roll the two dice for 7 to be (approximately) reliably the most frequent value.

... here is missing sth ...

**The Law of Large Numbers:** (Informal Version) For a large enough number of rolls, the bar charts of the experimental values and the expected values will be almost the same.

The law of large numbers is actually a theorem of mathematics, but its precise statement and proof are beyond the scope of this introduction. What matters here is to be aware of just how large the numbers have to be before we have reason to expect that experimental and expected values agree.
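
To get a feeling for "almost the same", one can measure how far the experimental bar chart is from the expected one as the number of rolls grows. The following is a minimal sketch, not part of two-dice-2.R: the helper `experimental_frequencies` and the chosen numbers of rolls are assumptions made for illustration only.

```r
# Expected probabilities of the sums 2..12 when rolling two fair dice.
expected <- c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1) / 36

# Hypothetical helper (not from two-dice-2.R): roll two dice
# `number_of_rolls` times and return the relative frequency of
# each sum 2..12.
experimental_frequencies <- function(number_of_rolls) {
  die_1 <- sample(1:6, number_of_rolls, replace = TRUE)
  die_2 <- sample(1:6, number_of_rolls, replace = TRUE)
  # factor() with fixed levels keeps sums that never occurred as zeros.
  sums <- factor(die_1 + die_2, levels = 2:12)
  as.vector(table(sums)) / number_of_rolls
}

set.seed(1)  # only to make the run repeatable
for (number_of_rolls in c(30, 300, 3000, 30000)) {
  # Largest difference between an experimental bar and its expected bar.
  gap <- max(abs(experimental_frequencies(number_of_rolls) - expected))
  cat(number_of_rolls, "rolls: largest gap =", round(gap, 4), "\n")
}
```

The printed gaps depend on the random rolls, so it is worth removing `set.seed(1)` and running the loop several times.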
As my esteemed colleague Martin Escardo once quipped:

**The Fallacy of Small Numbers:** Some people believe that the Law of Large Numbers also holds for small numbers.
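
One way to see the fallacy in numbers is to estimate, for several values of `number_of_rolls`, how often 7 comes out as the most frequent value. The sketch below is only an illustration and is not the code of two-dice-2.R. The helper names `seven_is_most_frequent` and `estimate_probability` are hypothetical, and it assumes that 7 counts as "most frequent" only when it occurs strictly more often than every other sum, which may treat ties differently from the original program.

```r
# Hypothetical helper: roll two dice `number_of_rolls` times and report
# whether 7 occurred strictly more often than every other sum.
# Assumption: a tie between 7 and another sum counts as FALSE.
seven_is_most_frequent <- function(number_of_rolls) {
  sums <- sample(1:6, number_of_rolls, replace = TRUE) +
    sample(1:6, number_of_rolls, replace = TRUE)
  counts <- tabulate(sums, nbins = 12)  # counts[k] = number of times sum k occurred
  all(counts[7] > counts[-7])
}

# Fraction of `number_of_samples` samples in which 7 was the most
# frequent value; this plays the role of the TRUE bar in the bar chart.
estimate_probability <- function(number_of_rolls, number_of_samples) {
  mean(replicate(number_of_samples, seven_is_most_frequent(number_of_rolls)))
}

set.seed(1)  # only to make the run repeatable
for (number_of_rolls in c(30, 100, 300, 1000)) {
  cat(number_of_rolls, "rolls:",
      estimate_probability(number_of_rolls, 1000), "\n")
}
```

Each estimate is itself the result of an experiment, so increasing `number_of_samples` makes the estimates more stable, at the cost of a longer running time.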