# Lecture 18 Notes: Intro to Probability
These are informal notes that form the basis of lecture 18 in Spring 2023. For more details on the material, refer to Chapter 17 of the book "Mathematics for Computer Science" by Lehman, Leighton and Meyer.
For clearer text, see the website
https://hackmd.io/@vinodnathan/HyQEisNmh
* Algorithms that toss coins to run faster (6.1210 / 6.1220)
* Information theory
* Coding theory (encoding / decoding for fault-tolerance)
* Cryptography (keys need to be random --- probabilistic analysis is its lifeblood)
* Game theory / Gambling games
* Software testing
* Machine learning: applied probability (6.3900 / 6.3720)
* Voting / Margins of error
* Clinical trials / Medical studies / Probability of disease given symptoms / how effective is a drug / what is the chance smoking causes lung cancer?
-- A field that defies intuition.
-- Mark Twain: "Lies, damned lies and statistics."
-- **Suggested solution**: Throw away intuition and fall back to rigorous step-by-step analysis. That's what we will do in this class.
-- Probabilistic intuition is best built incrementally.
**Monty Hall Problem**
*Suppose you’re on a game show, and you’re given the choice of three doors. Behind one door is a car, behind the others, goats. You pick a door, say number 1, and the host, who knows what’s behind the doors, opens another door, say number 3, which has a goat. He says to you, “Do you want to pick door number 2?” Is it to your advantage to switch your choice of doors?*
-- Craig. F. Whitaker Columbia, MD
* Based on a game show "Lets make a deal" (LMD) hosted by Monty Hall.
* Three doors (boxes), one of which has a prize and the other two have zonks ("zonk" being the LMD technical term for a worthless item.)
* Contestant (who does not know which box has the prize) picks one of them.
* Monty opens one of the **other** two boxes **which has a zonk**.
* Monty gives the contestant a choice: stick or switch!
* What should the contestant do?
<< Play the game >>
Switching has better odds!
**STEP 1**: Identify the **sample space**, the set of all possible **outcomes**.
An outcome (also called a sample point) consists of all the information about the experiment after it has been performed, including the values of all random choices.
An outcome of the Monty hall experiment consists of (box that contains the car, box that the contestant chose, the box that was revealed).
E.g. (2,1,3): 2 has the car, contestant chose 1, and 3 was revealed (and contains the goat).
(1,2,1) is not a sample point: cannot reveal the box with the car.
(2,1,1) is not a sample point: cannot reveal the box that the contestant picked.
Constructing the sample space via the tree method:

**STEP 2**: Define an **event** of interest. An event is a subset of outcomes.
The event corresponding to the switch strategy winning is $$\{ (1,2,3), (1,3,2), (2,1,3), (2,3,1), (3,1,2), (3,2,1) \}$$
This is a subset of half of all possible events.
Does this mean that switch wins exactly half the time? NO! Not all outcomes are equally probable! The missing piece is to assign probabilities to outcomes.
**STEP 3**: Determine **outcome probabilities**.
Def: A probability space is a sample space $S$ and a probability function $\Pr: S \to \mathbb{R}$ s.t.
* for all $s \in S$, $0 \leq \Pr(s)\leq 1$; and
* $\sum_{s\in S} \Pr(s) = 1$.
$\Pr(s)$ is the probability of getting the outcome $s$ as a result of the experiment.
How to compute the outcome probabilities?
First, pin down the assumptions.
* The prize is in each box with probability $1/3$.
* No matter where the prize is, the contestant picks each box with probability $1/3$.
* No matter where the prize is and which box the contestant picks, she picks a box (different from the ones that the contestant picks and the one that has the prize) at random if she has a choice.
Next, assign edge probabilities: these are determined by our assumptions. E.g. at the beginning, the prize is equally likely to be in any of the three boxes, corresponding to edge probabilities of 1/3 each to the leftmost edges; the player is equally likely to choose each box, corresponding to edge probabilities of 1/3 each to the second-level edges; and Monty Hall will reveal a box that is both different from the choice of the contestant *and* contains a zonk (this choice is either unique or is equally likely to be one of two choices, corresponding to edge probabilities of either $1$ or $1/2$).
The probability of an outcome is the product of the probabilities on the edges leading up to the outcome. For example, the probability of outcome (1,1,2) is $1/3\cdot 1/3\cdot 1/2 = 1/18$.

**STEP 4**: Determine the **event probability**.
Let $E$ be an event. The probability of $E$ is simply the sum of probabilities of the outcomes in $E$. That is, $$\Pr(E) = \sum_{s\in E} \Pr(s)$$
$\Pr[\mbox{switch wins}] = 1/9+1/9+1/9+1/9+1/9+1/9 = 6 \cdot 1/9 = 2/3$.
$\Pr[\mbox{switch loses}] = 6 \cdot 1/18 = 1/3$.
Now, before we declare switching to be a winning strategy, we have to ask: what is the probability that "stick" wins? Notice that the probability that sticking wins is the same as the probability that switching loses (i.e. one of the two strategies always wins since the prize is behind one of two doors.) So,
$$\Pr[\mbox{stick wins}] =\Pr[\mbox{switch loses}] = 1/3$$
What if we slightly change the rules of the game? Monty Hall has a preference for smaller numbers (maybe he is closer to door 1 than to door 2, and to door 2 than to door 3, and he really doesn't like walking). So, if he has a choice between doors $a$ and $b$ to open, he opens the smaller of the two.
*Exercise*: Now, is it better to switch or stick?
Turns out it's still better to switch and switch wins with the exact same probability (i.e. 2/3).
However, you will see in the recitation that this is not always the case, e.g. when there are four doors. Best to fall back to the basics--- define outcomes; define the desired event (a subset of outcomes); assign outcome probabilities; and finally, compute the event probability.
**We did not cover the gambling game in the lecture, but please go over the notes regardless. It's good practice on the four-step method!**
**Gambling game / Non-transitive Dice** (first posed by Bradley Efron)

* Player 1 picks a die
* Player 2 (after seeing which die player 1 picked) picks a different die
* both roll
* higher number wins.
**A vs. C**:

Outcomes = $\{2,6,7\} \times \{3,4,8\}$.
Event that $C$ wins = $\{(2,3),(2,4),(2,8),(6,8),(7,8)\}$.
Probability of each outcome = $1/9$.
So, $\Pr[\mbox{C wins}] = 5 \cdot 1/9 = 5/9$.
$\Pr[\mbox{A wins}] = 4 \cdot 1/9 = 4/9$.
Between $A$ and $C$, it is more likely that $C$ wins!
**C vs. B**:

Outcomes = $\{3,4,8\} \times \{1,5,9\}$.
Event that $B$ wins = $\{(3,5),(3,9),(4,5),(4,9),(8,9)\}$.
Probability of each outcome = $1/9$.
So, $\Pr[\mbox{B wins}] = 5 \cdot 1/9 = 5/9$.
$\Pr[\mbox{C wins}] = 4 \cdot 1/9 = 4/9$.
Between $C$ and $B$, it is more likely that $B$ wins!
**B vs. A**:

Outcomes = $\{1,5,9\} \times \{2,6,7\}$.
Event that $A$ wins = $\{(1,2),(1,6),(1,7),(5,6),(5,7)\}$.
Probability of each outcome = $1/9$.
So, $\Pr[\mbox{A wins}] = 5 \cdot 1/9 = 5/9$.
$\Pr[\mbox{B wins}] = 4 \cdot 1/9 = 4/9$.
Between $B$ and $A$, it is more likely that $A$ wins!
$A \succ B \succ C \succ A$. Non-transitive dice!
If you think about it a bit more deeply, there is absolutely no reason for probabilities of this nature to be {\em transitive}! Our intuition fails us all the time, so again, best to stick with the four-step method.
What is weirder is this: if one slightly changes the game so the first party throws her dice twice and adds the outcomes, the second party does the same, and whoever gets the larger sum wins, the situation reverses! That is,
$A \prec B \prec C \prec A$.
The dice continue to be non-transitive, but in the opposite way!
For more intransitive dice fun, see https://arxiv.org/abs/1311.6511
We will end with a definition:
Def: A sample space is **uniform** if every sample point (outcome) has the same probability. so, $$\forall s \in S: \Pr(s) = 1/|S|$$