# Lecture 22: Random Variables and Expectation

Readings: Sections 19.4, 19.5. A more readable version is online here: https://hackmd.io/@vinodnathan/rJ6Kx7PE2

So far, we saw:

* random variables: (total) functions from the sample space $S$ to another set (typically, the non-negative real numbers).
* indicator random variables: random variables that map to $\{0,1\}$.
* how to go from a random variable $R$ to the events $E_x := \{\omega: R(\omega) = x\}$ for each $x \in \mathsf{Range}(R)$.
* how to go from an event $E$ to the indicator random variable $I_E$, where $I_E(\omega) = 1$ if $\omega \in E$ and $0$ otherwise.
* independence of random variables: $R_1$ and $R_2$ (over a sample space $S$) are independent if, for all $x_1 \in \mathsf{Range}(R_1)$ and $x_2 \in \mathsf{Range}(R_2)$, the events $E_{x_1}$ and $E_{x_2}$ are independent.
* probability density functions and cumulative distribution functions.

This lecture is about the notion of the expectation of a random variable, and a really cool property called linearity of expectation which makes computing expected values so much easier.

**Definition** (Expected value of a random variable, also called the average or the mean)

$$\mathsf{Ex}(R) := \sum_{\omega \in S} R(\omega) \cdot \Pr[\omega]$$

From now on, we will assume that our random variables are real-valued and non-negative (the latter for technical reasons).

Example 1: A single throw of a die, with sample space $S =\{1,2,3,4,5,6\}$. $R$ is the number that comes up, so $R(\omega) = \omega$.

$$\mathsf{Ex}(R) = 1\cdot \frac{1}{6} + 2\cdot \frac{1}{6} + 3\cdot \frac{1}{6} + 4\cdot \frac{1}{6} + 5\cdot \frac{1}{6} + 6\cdot \frac{1}{6} = \frac{7}{2}$$

The expected value of the die is NOT necessarily a value you ever expect to see! Chalk it up to the weirdness of the language of mathematics. The expectation of a random variable with a uniform distribution on a set $\{a_1,\ldots,a_n\}$ is simply the average of the $a_i$'s.

Contrast this with the definition of the median of a random variable, which is a value that actually occurs. The median of a random variable $R$ is a value $m \in \mathsf{Range}(R)$ such that

$$ \Pr[R < m] \leq 1/2 \mbox{ and } \Pr[R > m] < 1/2$$

In the die example, the median is $4$. I wrote down the definition of the median only to contrast it with the definition of the expected value / mean. We will not use the notion of median much (if at all) from now on.

Example 2: The expected value of an indicator random variable is the probability that it takes on the value $1$. If $I_A$ is the indicator random variable for an event $A$,

$$\mathsf{Ex}(I_A) = 1 \cdot \Pr[I_A = 1] + 0 \cdot \Pr[I_A = 0] = \Pr[I_A = 1] = \Pr[A]$$

where the last equality is by the definition of $I_A$.

Example 3: Each of three people (student, TA, prof) places an ante (two candies each) and guesses heads/tails. A coin is tossed. If all three agree or all three disagree with the coin, everyone gets their candies back. If two agree with the coin, the pot of 6 candies is split between those two (so, one person loses two candies and the other two gain one each). If exactly one agrees with the coin, she gets the entire pot (so, she gains 4 candies and the other two lose two each).

<<play game>>

Is the game fair? Let's see. Here is the tree diagram:

![](https://hackmd.io/_uploads/SJ9lDldE2.jpg)

Let $P$ be the random variable denoting the student's payoff. $\mathsf{Ex}(P) = 0$.

What if the TA and prof decide beforehand that they are going to collude, i.e. always choose opposite guesses? This changes the probability distribution and, consequently, the expected payoff of the student.

![](https://hackmd.io/_uploads/Hy_GvgON3.jpg)

Now, $\mathsf{Ex}(P) = -1/2$.
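If you want to sanity-check these two expected payoffs, here is a minimal Monte Carlo sketch in Python (not part of the lecture). It encodes the payoff rules described above; the function names and the number of trials are arbitrary choices.

```python
import random

def student_payoff(student, ta, prof, coin):
    """Student's candy payoff for one round, under the rules described above."""
    agree = [g == coin for g in (student, ta, prof)]
    num_agree = sum(agree)
    if num_agree in (0, 3):       # all agree or all disagree: everyone gets their ante back
        return 0
    if num_agree == 2:            # the two winners split the pot: +1 each, the loser pays 2
        return 1 if agree[0] else -2
    return 4 if agree[0] else -2  # a lone winner takes the whole pot of 6: +4, others pay 2

def estimate_expected_payoff(collude, trials=200_000):
    total = 0
    for _ in range(trials):
        coin = random.randint(0, 1)
        student = random.randint(0, 1)
        ta = random.randint(0, 1)
        # collusion: the prof always guesses the opposite of the TA
        prof = 1 - ta if collude else random.randint(0, 1)
        total += student_payoff(student, ta, prof, coin)
    return total / trials

print("honest game:    Ex(P) ~", estimate_expected_payoff(collude=False))  # close to 0
print("with collusion: Ex(P) ~", estimate_expected_payoff(collude=True))   # close to -0.5
```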
What just happened? Intuitively, the effect of the prof and TA always choosing opposite guesses is that the student never has a chance to win alone and pocket the sizeable sum of 4 candies, and that shows up in her diminished expected payoff of $-1/2$.

<<Professor Herman Chernoff's story: see textbook>>

![](https://hackmd.io/_uploads/ryEQueu42.jpg)

Back to math...

*A different way to compute the expectation*

$$\mathsf{Ex}(R) = \sum_{x \in \mathsf{Range}(R)} x\cdot \Pr[R = x]$$

This is more efficient when there are lots of outcomes but only a few values that $R$ can take.

Proof:

\begin{align*} \mathsf{Ex}(R) & = \sum_{\omega\in S} R(\omega)\cdot \Pr[\omega] \\ & = \sum_{x \in \mathsf{Range}(R)} \sum_{\omega: R(\omega)=x} R(\omega) \cdot \Pr[\omega] \\ & = \sum_{x \in \mathsf{Range}(R)} \sum_{\omega: R(\omega)=x} x \cdot \Pr[\omega] \\ & = \sum_{x \in \mathsf{Range}(R)} x \cdot \sum_{\omega: R(\omega)=x} \Pr[\omega] \\ & = \sum_{x \in \mathsf{Range}(R)} x \cdot \Pr[R=x] \end{align*}

which is what we wanted. QED.

*Another trick, this time only when the range of $R$ is the natural numbers*

If $\mathsf{Range}(R) = \mathbb{N}$,

$$\mathsf{Ex}(R) = \sum_{i=0}^{\infty} i\cdot \Pr[R=i] = \sum_{i=0}^{\infty} \Pr[R > i]$$

Why is this last equality true?

Proof:

\begin{align*} \sum_{i=0}^{\infty} \Pr[R > i] & = \Pr[R >0] + \Pr[R > 1] + \Pr[R>2] + \ldots \\ & = (\Pr[R=1] + \Pr[R=2] + \Pr[R=3] + \ldots) + \\& \hspace {1in} (\Pr[R=2] + \Pr[R=3] + \ldots) + \ldots \end{align*}

The term $\Pr[R=i]$ appears $i$ times in this sum, so the sum is exactly the expectation.

Example 4 (Mean time to failure): We manufacture parts, each part *independently* defective with probability $p$. What is the expected number of parts manufactured by the time you see the first defective one (counting the defective one)?

Let $R$ be the random variable that denotes the number of parts manufactured by the time you see the first defective part. Since $R > i$ exactly when the first $i$ parts are all non-defective, $\Pr[R > i] = (1-p)^i$, and so

$$\mathsf{Ex}(R) = \sum_{i=0}^{\infty} \Pr[R > i] = \sum_{i=0}^\infty (1-p)^i = \frac{1}{1-(1-p)} = \frac{1}{p}$$

Example 5 (Infinite expectations): We want to estimate the expected latency on a communication channel. Strategy: measure the latency 100 times and take the average (the so-called "empirical mean"). How well does the empirical mean approximate the real mean / expected value?

Say the latency is $i$ ms with probability proportional to $1/i^2$. The expected latency is then proportional to

$$ \sum_{i=1}^{\infty} \frac{1}{i}$$

which diverges. The expectation is unbounded, while the empirical mean is always finite.
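Here is a small simulation sketch (not from the textbook) that checks the mean-time-to-failure answer in Example 4 numerically; the defect probability `p = 0.1` and the trial count are arbitrary choices.

```python
import random

def parts_until_first_defect(p):
    """Number of parts manufactured up to and including the first defective one."""
    count = 1
    while random.random() >= p:  # the current part is fine with probability 1 - p
        count += 1               # so we manufacture another one
    return count

p = 0.1
trials = 100_000
empirical_mean = sum(parts_until_first_defect(p) for _ in range(trials)) / trials
print("empirical mean:", empirical_mean, " vs. 1/p =", 1 / p)  # the two should be close
```

For a heavy-tailed distribution like the one in Example 5, by contrast, no such experiment helps: the empirical mean is always finite, but it keeps drifting upward as you take more measurements, because the true expectation is infinite.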
**Linearity of Expectation**

Thm: For random variables $R_1$ and $R_2$,

$$\mathsf{Ex}(R_1 + R_2) = \mathsf{Ex}(R_1) + \mathsf{Ex}(R_2)$$

Compare: for events $A$ and $B$, the identity $\Pr(A \wedge B) = \Pr(A) \cdot \Pr(B)$ is only true when $A$ and $B$ are **independent**. In contrast, linearity of expectation applies to any random variables. No need for independence! Which is what makes it a formidable tool in our arsenal.

Pf:

\begin{align*} \mathsf{Ex}(R_1 + R_2) & = \sum_{\omega \in S} (R_1+R_2)(\omega) \cdot \Pr[\omega] \\ & = \sum_{\omega \in S} (R_1(\omega)+R_2(\omega)) \cdot \Pr[\omega] \\ & = \sum_{\omega \in S} R_1(\omega) \cdot \Pr[\omega] + \sum_{\omega \in S} R_2(\omega) \cdot \Pr[\omega] \\ & = \mathsf{Ex}(R_1) + \mathsf{Ex}(R_2) \end{align*}

Note: This also works for more than two random variables, as long as the number of random variables is fixed (finite).

Example 1: $R$ is the sum of two fair dice. *Write $R$ as a sum of two simpler random variables*: $R = R_1 + R_2$, where $R_1$ is the result of the first die and $R_2$ is the result of the second die. Then,

$$\mathsf{Ex}(R) = \mathsf{Ex}(R_1 + R_2) = \mathsf{Ex}(R_1) + \mathsf{Ex}(R_2) = \frac{7}{2} + \frac{7}{2} = 7$$

The dice did not have to be independent! The two dice could be perfectly correlated, i.e. whatever number the first die shows, the second shows the same number, and the calculation we did above applies just as well!

Example 2 (Cellphone check problem): Before entering the final exam room, you check your cellphone by putting it into a bag (together with everyone else's phones). When leaving, you pick one of the phones from the bag at random. What is the expected number of people who get their own cellphone back?

Let $R$ be the number of people who get their phones back.

$$\mathsf{Ex}(R) = \sum_{k=0}^n k \cdot \Pr[R=k]$$

But what is $\Pr[R=k]$? How do we compute it? Fortunately, linearity of expectation comes to the rescue. Let $I_i$ be the indicator random variable

$$ I_i = \left\{ \begin{array}{cl} 1 & \mbox{if the $i$-th person gets their phone back} \\ 0 & \mbox{otherwise} \end{array} \right.$$

Now, and it's important that you make sure you understand this:

$$R = I_1 + I_2 + \ldots + I_n$$

So,

$$\mathsf{Ex}(R) = \sum_{i=1}^n \mathsf{Ex}(I_i) = \sum_{i=1}^{n} \Pr[I_i = 1]$$

where the second equality holds because the expected value of an *indicator* random variable is just the probability that it is $1$.

What is $\Pr[I_i = 1]$? It's the probability that the $i$-th person gets their cellphone back, which is $1/n$. (Why is this the case? Imagine the cellphones being permuted uniformly at random, with the $i$-th person getting the $i$-th cellphone in the permuted sequence. The $i$-th position of a random permutation is equally likely to hold any of the $n$ phones, so the chance that any given person gets their own phone back is $1/n$.) So,

$$\mathsf{Ex}(R) = n\cdot \frac{1}{n} = 1$$

which is our answer. QED.

=========

Incidentally, it turns out that

$$\Pr[R=k] = \frac{1}{k!} \sum_{i=0}^{n-k} \frac{(-1)^i}{i!}$$

which is a nasty expression. Using this, we could compute the expectation, but it's a rather unpleasant calculation. Linearity of expectation makes it all so much easier. Plus, it's much more general, in that it handles random variables which are correlated in arbitrary ways.

=========

Now, let's change the game a bit. The $n$ of us go to [MuLan](https://mulantaiwancambridge.com/), place our cellphones on the lazy Susan in the middle of the table and give it an energetic spin. What's the expected number of us who get their own phone back in front of them?

Either we all get our phones back (which happens with probability $1/n$), or none of us do (which happens with probability $1-1/n$). So, if $R$ is the random variable as above, counting the number of us who get our phones back in front of us, then

$$\mathsf{Ex}(R) = n\cdot \frac{1}{n} + 0 \cdot \left(1-\frac{1}{n}\right) = n\cdot \frac{1}{n} = 1$$

That was simple enough, but the point is that the exact same calculation we did above would have given us the answer here as well, even though the random variables $I_i$ are *very* far from independent. (Note that in the cellphones-in-a-bag game the $I_i$ were not independent either, but "less so" than in the lazy Susan game.)
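To see the two phone games side by side, here is a minimal simulation sketch (not part of the lecture); `n`, the trial count, and the helper names are arbitrary choices. It models the bag as a uniformly random permutation of the phones and the lazy Susan as a uniformly random rotation.

```python
import random

def matches(assignment):
    """Number of people i who get their own phone back, i.e. assignment[i] == i."""
    return sum(1 for i, phone in enumerate(assignment) if phone == i)

def random_permutation(n):
    return random.sample(range(n), n)   # phones handed back in a uniformly random order

def random_rotation(n):
    r = random.randrange(n)             # one random spin of the lazy Susan
    return [(i + r) % n for i in range(n)]

n, trials = 10, 100_000
bag   = sum(matches(random_permutation(n)) for _ in range(trials)) / trials
susan = sum(matches(random_rotation(n))    for _ in range(trials)) / trials
print("bag:        average number of matches ~", bag)    # close to 1
print("lazy Susan: average number of matches ~", susan)  # close to 1
```

Both averages come out near 1, exactly as linearity of expectation predicts, even though the joint behavior of the indicators $I_i$ is very different in the two games.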