---
tags: CSE
---

## random variables

real-valued functions defined on the sample space are known as __random variables__.

__EXAMPLE__: suppose that our experiment consists of tossing $3$ fair coins. if we let $Y$ denote the number of heads that appear, then $Y$ is a random variable taking on one of the values $0, 1, 2$, and $3$ with respective probabilities
- $P\{Y=0\}=P(\{(T,T,T)\})=\frac{1}{8}$
- $P\{Y=1\}=P(\{(H,T,T),(T,H,T),(T,T,H)\})=\frac{3}{8}$
- $P\{Y=2\}=P(\{(H,H,T),(T,H,H),(H,T,H)\})=\frac{3}{8}$
- $P\{Y=3\}=P(\{(H,H,H)\})=\frac{1}{8}$

__EXAMPLE__: three balls are to be randomly selected without replacement from an urn containing $20$ balls numbered $1$ through $20$. if we bet that at least one of the balls drawn has a number as large as or larger than $17$, what is the probability that we win the bet?
_ans:_ let $X$ be the largest number selected. then the answer is
$P\{X=17\}+P\{X=18\}+P\{X=19\}+P\{X=20\}=\frac{\begin{pmatrix}16\\2\end{pmatrix}}{\begin{pmatrix}20\\3\end{pmatrix}}+\frac{\begin{pmatrix}17\\2\end{pmatrix}}{\begin{pmatrix}20\\3\end{pmatrix}}+\frac{\begin{pmatrix}18\\2\end{pmatrix}}{\begin{pmatrix}20\\3\end{pmatrix}}+\frac{\begin{pmatrix}19\\2\end{pmatrix}}{\begin{pmatrix}20\\3\end{pmatrix}}\approx0.508$

## discrete random variables

a random variable that can take on at most a countable number of possible values is said to be discrete. for a discrete random variable $X$, we define the probability mass function $p(a)$ of $X$ by $p(a)=P\{X=a\}$

since $X$ must take on one of the values $x_i$, we have $\sum^\infty_{i=1}p(x_i)=1$

__EXAMPLE__: the probability mass function of a random variable $X$ is given by $p(i)=c\frac{\lambda^i}{i!},i=0,1,2,...$, where $\lambda$ is some positive value. find $P\{X=0\}$ and $P\{X>2\}$.
_ans:_ $p(0)=P\{X=0\}=c$. we know that $\sum^\infty_{i=0}p(i)=1\iff ce^\lambda=1\iff c=e^{-\lambda}$, so $p(0)=P\{X=0\}=e^{-\lambda}$
$P\{X>2\}=1-P\{X\leq2\}=1-P(X=0)-P(X=1)-P(X=2)=1-p(0)-p(1)-p(2)\\=1-e^{-\lambda}-\lambda e^{-\lambda}-\frac{\lambda^2e^{-\lambda}}{2}$

### probability mass function

the probability mass function of a random variable $X$ is denoted $p_{X}(x)=P(\{X=x\})$
- the bernoulli random variable: $X=\begin{cases}1&,\text{success}\\0&,\text{fail}\end{cases}$ with $p_X(k)=\begin{cases}p&,k=1\\1-p&,k=0\end{cases}$
- the binomial random variable: $p_X(k)=P(\{X=k\})=\begin{pmatrix}n\\k\end{pmatrix}p^k(1-p)^{n-k},k=0,\cdots, n$ and $\sum_{k=0}^np_X(k)=1$
- the geometric random variable: $p_X(k)=(1-p)^{k-1}p,k=1,2,\cdots$ and $\sum^\infty_{k=1}p_X(k)=\sum^\infty_{k=1}(1-p)^{k-1}p=p\frac{1}{1-(1-p)}=1$
- the poisson random variable: $p_X(k)=e^{-\lambda}\frac{\lambda^k}{k!}$ and $\sum^\infty_{k=0}p_X(k)=\sum^\infty_{k=0}e^{-\lambda}\frac{\lambda^k}{k!}=e^{-\lambda}e^{\lambda}=1$

### functions of random variables

given a random variable $X$, let $Y$ be another random variable generated from $X$: $Y=g(X)$. then
$p_Y(y)=\sum_{\{x\mid g(x)=y\}}p_X(x)$

## expected value

if $X$ is a discrete random variable having a probability mass function $p(x)$, then the __expectation__, or the __expected value__, of $X$, denoted by $E[X]$, is defined by
$E[X]=\sum_{x:p(x)>0}xp(x)=\sum_{x:p(x)>0}xP\{X=x\}$

__EXAMPLE__: find $E[X]$, where $X$ is the outcome when we roll a fair die.
_ans:_ since $p(1)=p(2)=...=p(6)=\frac16$, $E[X]=\frac16\sum^6_{x=1}x=\frac72$
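the defining sum is mechanical to evaluate; below is a minimal Python sketch (the `pmf` dictionary and the helper name `expectation` are just for illustration) that computes $E[X]$ for the fair-die PMF above and cross-checks it with a Monte Carlo estimate.

```python
import random

def expectation(pmf):
    """E[X] = sum over x of x * p(x), for a PMF given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

# fair six-sided die: p(1) = ... = p(6) = 1/6
die = {x: 1/6 for x in range(1, 7)}
print(expectation(die))  # 3.5, i.e. 7/2

# Monte Carlo cross-check: the sample mean approaches E[X]
samples = [random.choice(range(1, 7)) for _ in range(100_000)]
print(sum(samples) / len(samples))  # approximately 3.5
```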
__EXAMPLE__: let $I$ be an indicator variable for the event $A$, i.e. $I=\begin{cases}1, \text{ if }A \text{ occurs}\\0, \text{ if }A^c \text{ occurs}\end{cases}$ find $E[I]$.
_ans:_ $p(1)=P\{I=1\}=P(A)$, $p(0)=P\{I=0\}=P(A^c)$
$E[I]=1\cdot p(1)+0\cdot p(0)=P(A)$

## expectation of a function of a random variable

given a discrete random variable $X$ along with its probability mass function $p(x)$, suppose we want to compute the expected value of some function of $X$, say $g(X)$. how do we accomplish this?

__proposition__: if $X$ is a discrete random variable that takes on one of the values $x_i,i\geq1$, with respective probabilities $p(x_i)$, then, for any real-valued function $g$,
$$E[g(X)]=\sum_i g(x_i)p(x_i)$$
_proof:_ suppose that $y_j,j\geq 1$, represent the different values of $g(x_i),i\geq1$. then grouping all the $g(x_i)$ having the same value gives
$\sum_i g(x_i)p(x_i)=\sum_j\sum_{i:g(x_i)=y_j}g(x_i)p(x_i)=\sum_j\sum_{i:g(x_i)=y_j}y_jp(x_i)=\sum_jy_j\sum_{i:g(x_i)=y_j}p(x_i)\\=\sum_jy_jP\{g(X)=y_j\}=E[g(X)]$

another way is to let $Y=g(X)$. since $g(X)$ is also a discrete random variable, it has a probability mass function, which can be determined from the probability mass function of $X$. once we have that, we can compute $E[Y]=E[g(X)]$.

__EXAMPLE__: let $X$ denote a random variable that takes on any of the values $-1,0,1$ with respective probabilities $P\{X=-1\}=0.2,P\{X=0\}=0.5,P\{X=1\}=0.3$. compute $E[X^2]$.
_ans:_ $Y=X^2,P\{Y=0\}=0.5,P\{Y=1\}=0.5$
$E[X^2]=E[Y]=0\cdot 0.5+1\cdot 0.5=0.5$
note that the result shows $E[g(X)]\neq g(E[X])$ in general (here $g(E[X])=(0.1)^2=0.01\neq0.5$).

__EXAMPLE__: find the expected value of a geometric random variable.
_ans:_ $E[X]=\sum^\infty_{k=1}kp(1-p)^{k-1}=\sum^\infty_{k=1}(k-1+1)p(1-p)^{k-1}\\=\sum^\infty_{k=1}(k-1)p(1-p)^{k-1}+\sum^\infty_{k=1}p(1-p)^{k-1}\\=\sum^\infty_{j=0}jp(1-p)^{j}+\sum^\infty_{k=1}p(1-p)^{k-1}\\=\sum^\infty_{j=0}jp(1-p)^{j}+1\\=(1-p)\sum^\infty_{j=0}jp(1-p)^{j-1}+1\\=(1-p)E[X]+1$
hence $pE[X]=1\iff E[X]=\frac1p$
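as a quick sanity check of the geometric mean, the sketch below (Python; the function names and the choice $p=0.3$ are just for illustration) sums the series $\sum_{k\geq1} k\,p(1-p)^{k-1}$ up to a large cutoff, compares it against $1/p$, and also estimates it by simulating geometric trials.

```python
import random

def geometric_mean_series(p, cutoff=10_000):
    """Truncated sum of k * p * (1-p)^(k-1); it converges to 1/p."""
    return sum(k * p * (1 - p) ** (k - 1) for k in range(1, cutoff + 1))

def sample_geometric(p):
    """Number of independent Bernoulli(p) trials until the first success."""
    k = 1
    while random.random() >= p:
        k += 1
    return k

p = 0.3
print(geometric_mean_series(p))     # ~3.3333
print(1 / p)                        # 3.3333...
sims = [sample_geometric(p) for _ in range(100_000)]
print(sum(sims) / len(sims))        # Monte Carlo estimate, ~3.33
```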
__EXAMPLE__: a product that is sold seasonally yields a net profit of $b$ dollars for each unit sold and a net loss of $l$ dollars for each unit left unsold when the season ends. the number of units of the product that are ordered at a specific department store during any season is a random variable having probability mass function $p(i), i \geq 0$. if the store must stock this product in advance, determine the number of units the store should stock so as to maximize its expected profit.
_ans:_ let $X$ be the number of units ordered and $s$ be the number of units the store stocks. then the profit will be
$Pt(s)=\begin{cases}bX-(s-X)l,\text{ if } X\leq s\\sb,\text{ if } X>s\end{cases}$
$E[Pt(s)]=\sum_{i=0}^{\infty} Pt(s)|_{X=i}\text{ }p(i)=\sum^s_{i=0}(bi-(s-i)l)p(i)+\sum^\infty_{i=s+1}sbp(i)\\=\sum^s_{i=0}(bip(i)-slp(i)+ilp(i))+sb(1-\sum^s_{i=0}p(i))\\=\sum^s_{i=0}((b+l)i-(b+l)s)p(i)+sb\\=(b+l)\sum^s_{i=0}ip(i)-(b+l)s\sum^s_{i=0}p(i)+sb\\=sb+(b+l)\sum^s_{i=0}(i-s)p(i)$
$E[Pt(s+1)]-E[Pt(s)]=(s+1)b+(b+l)\sum^{s+1}_{i=0}(i-s-1)p(i)-sb-(b+l)\sum^s_{i=0}(i-s)p(i)\\=b-(b+l)\sum^s_{i=0}p(i)$
thus, it is better to stock $s+1$ units than $s$ units exactly when $\sum^s_{i=0}p(i)<\frac{b}{b+l}$. since $\sum^s_{i=0}p(i)$ is nondecreasing in $s$, there exists an $s^*$ such that
$E[Pt(0)]<\cdots<E[Pt(s^*)]>E[Pt(s^*+1)]>E[Pt(s^*+2)]>\cdots$
so the optimal stock is $s^*$, the smallest $s$ for which $\sum^s_{i=0}p(i)\geq\frac{b}{b+l}$.

__corollary__: if $a,b$ are constants, then $E[aX+b]=aE[X]+b$
_proof:_ $E[aX+b]=\sum_{x:p(x)>0}(ax+b)p(x)=a\sum_{x:p(x)>0}xp(x)+b\sum_{x:p(x)>0}p(x)=aE[X]+b$

## variance

if $X$ is a random variable with mean $E[X]=\mu$, then the variance of $X$, denoted by $\text{Var}(X)$, is defined by $\text{Var}(X)=E[(X-\mu)^2]$
$\text{Var}(X)=E[(X-\mu)^2]=\sum_x(x-\mu)^2p(x)=\sum_xx^2p(x)-2\mu\sum_xxp(x)+\mu^2\sum_x p(x)\\=E[X^2]-2\mu^2+\mu^2=E[X^2]-\mu^2=E[X^2]-(E[X])^2$

__EXAMPLE__: calculate $\text{Var}(X)$ if $X$ represents the outcome when a fair die is rolled.
_ans:_ $E[X^2]-E[X]^2=\sum_{i=1}^6i^2\cdot \frac16-(\sum_{i=1}^6i\cdot\frac16)^2=\frac{91}6-\frac{49}4=\frac{35}{12}$

__proposition__: $\text{Var}(aX+b)=a^2\text{Var}(X)$
_proof:_ let $Y=aX+b$, so $E[Y]=\mu'=a\mu+b$. then
$\text{Var}(Y)=E[(Y-\mu')^2]=E[(aX+b-a\mu-b)^2]=E[a^2(X-\mu)^2]=a^2E[(X-\mu)^2]=a^2\text{Var}(X)$

__EXAMPLE__: find the variance of a geometric random variable.
_ans:_ $\text{Var}(X)=E[X^2]-E[X]^2$
$E[X^2]=\sum^\infty_{k=1}k^2p(1-p)^{k-1}=\sum^\infty_{k=1}(k-1+1)^2p(1-p)^{k-1}\\=\sum^\infty_{k=1}(k-1)^2p(1-p)^{k-1}+\sum^\infty_{k=1}2(k-1)p(1-p)^{k-1}+\sum^\infty_{k=1}p(1-p)^{k-1}\\=\sum^\infty_{j=0}j^2p(1-p)^{j}+2\sum^\infty_{j=0}jp(1-p)^{j}+1\\=(1-p)E[X^2]+2(1-p)E[X]+1$
using $E[X]=\frac1p$, we get $pE[X^2]=\frac{2(1-p)}p+1=\frac{2-p}p$, so
$E[X^2]=\frac{2-p}{p^2}$ and $\text{Var}(X)=\frac{2-p}{p^2}-\frac1{p^2}=\frac{1-p}{p^2}$

__standard deviation__: $\text{SD}(X)=\sqrt{\text{Var}(X)}$
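the variance formulas above are mechanical to evaluate from a PMF; here is a minimal Python sketch (helper names chosen for illustration) that checks the fair-die result $\frac{35}{12}$ using both $E[X^2]-(E[X])^2$ and the definition $E[(X-\mu)^2]$.

```python
from fractions import Fraction

def mean(pmf):
    """E[X] = sum of x * p(x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) computed as E[X^2] - (E[X])^2."""
    mu = mean(pmf)
    return sum(x * x * p for x, p in pmf.items()) - mu * mu

# fair die, using exact fractions so the result is exactly 35/12
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(variance(die))  # 35/12

# the definition form E[(X - mu)^2] agrees, as derived above
mu = mean(die)
print(sum((x - mu) ** 2 * p for x, p in die.items()))  # 35/12
```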
## the Bernoulli and binomial random variables

suppose that a trial whose outcome can be classified as either a success or a failure is performed. let $X=1$ when the outcome is a success and $X=0$ when it is a failure. then the probability mass function of $X$ is
- $p(0)=P\{X=0\}=1-p$
- $p(1)=P\{X=1\}=p$

a random variable $X$ is said to be a __Bernoulli random variable__ if its probability mass function is given as above. if $n$ independent such trials are performed, each a success with probability $p$, then the number of successes is said to be a __binomial random variable__ with parameters $(n,p)$.

__EXAMPLE__: a communication system consists of $n$ components, each of which will, independently, function with probability $p$. the total system will be able to operate effectively if at least one-half of its components function.
- for what values of $p$ is a $5$-component system more likely to operate effectively than a $3$-component system?
- in general, when is a $(2k + 1)$-component system better than a $(2k − 1)$-component system?

_ans:_
1. a $5$-component system operates with probability $\begin{pmatrix}5\\3\end{pmatrix}p^3(1-p)^2+\begin{pmatrix}5\\4\end{pmatrix}p^4(1-p)+p^5$, and a $3$-component system operates with probability $\begin{pmatrix}3\\2\end{pmatrix}p^2(1-p)+p^3$. hence, the $5$-component system is better when $\begin{pmatrix}5\\3\end{pmatrix}p^3(1-p)^2+\begin{pmatrix}5\\4\end{pmatrix}p^4(1-p)+p^5>\begin{pmatrix}3\\2\end{pmatrix}p^2(1-p)+p^3\iff3p^2(p-1)^2(2p-1)>0\iff p>\frac12$
2. let $X$ denote the number of the first $2k-1$ components that function. then
$P_{2k+1}(\text{effective})=P\{X\geq k+1\}+P\{X=k\}(1-(1-p)^2)+P\{X=k-1\}p^2$
$P_{2k-1}(\text{effective})=P\{X\geq k\}=P\{X=k\}+P\{X\geq k+1\}$
so
$\begin{align}P_{2k+1}(\text{effective})-P_{2k-1}(\text{effective})&=P\{X=k-1\}p^2-(1-p)^2P\{X=k\}\\&=\begin{pmatrix}2k-1\\k-1\end{pmatrix}p^{k-1}(1-p)^kp^2-(1-p)^2\begin{pmatrix}2k-1\\k\end{pmatrix}p^k(1-p)^{k-1}\\&=\begin{pmatrix}2k-1\\k\end{pmatrix}p^k(1-p)^k[p-(1-p)]\\&>0\iff p>\frac12\end{align}$
note that $\begin{pmatrix}2k-1\\k\end{pmatrix}=\begin{pmatrix}2k-1\\k-1\end{pmatrix}$.

### properties of binomial random variables

__proposition__: if $X$ is a binomial random variable with parameters $n,p$, then $E[X]=np$ and $\text{Var}(X)=np(1-p)$
_proof:_ $E[X^k]=\sum^{n}_{i=0}i^k\begin{pmatrix}n\\i\end{pmatrix}p^i(1-p)^{n-i}$
since $i\begin{pmatrix}n\\i\end{pmatrix}=\frac{n!}{(i-1)!(n-i)!}=\frac{(n-1)!}{(i-1)!(n-i)!}n=n\begin{pmatrix}n-1\\i-1\end{pmatrix}$ and the $i=0$ term vanishes,
$E[X^k]=np\sum^{n}_{i=1}i^{k-1}\begin{pmatrix}n-1\\i-1\end{pmatrix}p^{i-1}(1-p)^{n-i}$
letting $j=i-1$,
$\begin{align}E[X^k]&=np\sum^{n-1}_{j=0}(j+1)^{k-1}\begin{pmatrix}n-1\\j\end{pmatrix}p^{j}(1-p)^{n-1-j}\\&=npE[(Y+1)^{k-1}]\end{align}$
where $Y$ is a binomial random variable with parameters $n-1,p$. taking $k=1$ gives $E[X]=np$.
$\text{Var}(X)=E[X^2]-(E[X])^2=npE[Y+1]-(np)^2\\=np(E[Y]+1)-(np)^2=np[(n-1)p+1]-(np)^2=np(1-p)$

__proposition__: if $X$ is a binomial random variable with parameters $(n,p)$, then as $k$ goes from $0$ to $n$, $P\{X=k\}$ first increases monotonically and then decreases monotonically, reaching its largest value when $k$ is the largest integer less than or equal to $(n+1)p$.
_proof:_ $P\{X=k+1\}/P\{X=k\}=\frac{\begin{pmatrix}n\\{k+1}\end{pmatrix}p^{k+1}(1-p)^{n-k-1}}{\begin{pmatrix}n\\k\end{pmatrix}p^{k}(1-p)^{n-k}}=\frac{\begin{pmatrix}n\\k+1\end{pmatrix}p}{\begin{pmatrix}n\\k\end{pmatrix}(1-p)}=\frac{\frac{1}{(k+1)!(n-k-1)!}}{\frac{1}{(n-k)!k!}}\frac{p}{1-p}=\frac{n-k}{k+1}\frac{p}{1-p}$
$P\{X=k+1\}\geq P\{X=k\}\iff\frac{(n-k)p}{(k+1)(1-p)}\geq1\iff(n-k)p\geq(k+1)(1-p)\iff np-kp\geq k-kp+1-p\\\iff (n+1)p\geq k+1$
thus, $P\{X=k\}$ keeps increasing as long as $k+1\leq(n+1)p$ and decreases afterwards, so the peak is at $k=\lfloor(n+1)p\rfloor$
![](https://i.imgur.com/xdRe72w.png)

### computing the binomial distribution function

$P\{X\leq i\}=\sum^i_{k=0}\begin{pmatrix}n\\k\end{pmatrix}p^k(1-p)^{n-k}$
starting from $P\{X=0\}=(1-p)^n$, the recursion
$P\{X=k+1\}=\frac{n-k}{k+1}\frac{p}{1-p}P\{X=k\}$
lets us compute each term of the sum from the previous one.

## the Poisson random variable

__Poisson random variable__: a random variable $X$ taking values $0,1,2,...$ is a __Poisson random variable__ with parameter $\lambda>0$ if $p(i)=P\{X=i\}=e^{-\lambda}\frac{\lambda^i}{i!},i=0,1,2,...$
$\sum^\infty_{i=0}p(i)=\sum^\infty_{i=0}e^{-\lambda}\frac{\lambda^i}{i!}=e^{-\lambda}e^\lambda=1$

the Poisson random variable may be used as an approximation for a binomial random variable with parameters $(n,p)$ when $n$ is large and $p$ is small, so that $np$ is of moderate size. to see this, let $\lambda=np$:
$\begin{align}P\{X=i\}&=\frac{n!}{(n-i)!i!}p^i(1-p)^{n-i}\\&=\frac{n!}{(n-i)!i!}(\frac\lambda n)^i(1-\frac\lambda n)^{n-i}\\&=\frac{n(n-1)\cdots(n-i+1)}{n^i}\frac{\lambda^i}{i!}\frac{(1-\lambda/n)^n}{(1-\lambda/n)^i}\end{align}$
for large $n$ and moderate $np$, $\frac{n(n-1)\cdots(n-i+1)}{n^i}\approx1,(1-\frac\lambda n)^n\approx e^{-\lambda},(1-\frac\lambda n)^i\approx1$, so
$P\{X=i\}\approx \frac{\lambda^i}{i!}e^{-\lambda}$
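the approximation is easy to see numerically. a short Python sketch (the values $n=1000$, $p=0.005$ are chosen arbitrarily for illustration) comparing the binomial PMF with the Poisson PMF at $\lambda=np$:

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P{X = k} for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P{X = k} for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

# large n, small p, moderate np: the two PMFs should be close
n, p = 1000, 0.005
lam = n * p  # lambda = np = 5
for k in range(8):
    print(k, round(binomial_pmf(k, n, p), 5), round(poisson_pmf(k, lam), 5))
```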
__EXAMPLE__: consider an experiment that consists of counting the number of $\alpha$ particles given off in a $1$-second interval. if we know from past experience that, on average, $3.2$ such $\alpha$ particles are given off, what is a good approximation to the probability that no more than $2$ $\alpha$ particles are given off?
_ans:_ assume $n$ is the number of radioactive atoms and $p$ is the probability that an atom gives off an $\alpha$ particle.
- $n$ is extremely large
- $np=\lambda=3.2$

by the Poisson approximation:
- $P\{X=0\}=\frac{3.2^0}{0!}e^{-\lambda}=e^{-\lambda}$
- $P\{X=1\}=\frac{3.2^1}{1!}e^{-\lambda}=3.2e^{-\lambda}$
- $P\{X=2\}=\frac{3.2^2}{2!}e^{-\lambda}=\frac{3.2^2}{2}e^{-\lambda}$
- $P\{X\leq2\}=P\{X=0\}+P\{X=1\}+P\{X=2\}=e^{-\lambda}+3.2e^{-\lambda}+\frac{3.2^2}{2}e^{-\lambda}\approx0.38$

### expected value and variance

let $X$ be a Poisson random variable with parameter $\lambda$. recalling that a Poisson random variable approximates a binomial random variable with large $n$ and small $p$, where $\lambda=np$,
- $E[X]=np=\lambda$
- $\text{Var}(X)=np(1-p)=\lambda(1-p)\approx\lambda$

### what other problems can the Poisson distribution apply to

the Poisson distribution is a good approximation to the distribution of the number of successes in $n$ independent trials when $n$ is large and each trial's success probability $p$ is small. in fact, it remains a good approximation even when the trials are dependent, provided that their dependence is weak.

__EXAMPLE__: suppose that each of $n$ people is equally likely to have any of the $365$ days of the year as their birthday. what is the probability that all $n$ people have different birthdays? (the probability drops below $0.5$ when $n=23$.)
_ans:_ for each pair $i\neq j$, call trial $E_{ij}$ a success if persons $i$ and $j$ have the same birthday. then
$np=\begin{pmatrix}n\\2\end{pmatrix}P\{E_{ij}\}=\begin{pmatrix}n\\2\end{pmatrix}\frac1{365}=\frac{n(n-1)}{730}$
no two people share a birthday $\iff$ no trial is a success: $P\{X=0\}\approx e^{-np}=e^{-n(n-1)/730}$

__EXAMPLE__: calculate the probability that, among the $n$ people, no $3$ of them have the same birthday. obtain a good approximation.
_ans:_ for each triple $i,j,k$, call trial $E_{ijk}$ a success if persons $i,j,k$ all have the same birthday. then
$np=\begin{pmatrix}n\\3\end{pmatrix}P\{E_{ijk}\}=\begin{pmatrix}n\\3\end{pmatrix}\frac{1}{365^2}$
no trial is a success: $P\{X=0\}\approx e^{-np}$

for the number of events that occur to approximately have a Poisson distribution, it is __not essential that all the events have the same probability of occurrence__, but only that __all of these probabilities be small__. the following is referred to as the __Poisson paradigm__.

__Poisson paradigm__: consider $n$ events, with $p_i$ equal to the probability that event $i$ occurs, $i=1,...,n$. if all the $p_i$ are __small__ and the trials are either independent or only __weakly dependent__, then the number of these events that occur approximately has a Poisson distribution with mean $\sum^n_{i=1}p_i$
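the birthday approximation above is easy to check against the exact answer $\prod_{i=0}^{n-1}\frac{365-i}{365}$. a minimal Python sketch (function names are just for illustration):

```python
from math import exp

def exact_all_different(n, days=365):
    """Exact P(all n birthdays differ) = prod_{i=0}^{n-1} (days - i) / days."""
    prob = 1.0
    for i in range(n):
        prob *= (days - i) / days
    return prob

def poisson_all_different(n, days=365):
    """Poisson-paradigm approximation e^{-C(n,2)/days} = e^{-n(n-1)/(2*days)}."""
    return exp(-n * (n - 1) / (2 * days))

for n in (10, 23, 50):
    print(n, round(exact_all_different(n), 4), round(poisson_all_different(n), 4))
# at n = 23 both values are close to 0.5, matching the note above
```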
### computing the Poisson distribution function

let $X$ be a Poisson random variable.
$\frac{P\{X=i+1\}}{P\{X=i\}}=\frac{\frac{\lambda^{i+1}}{(i+1)!}e^{-\lambda}}{\frac{\lambda^i}{i!}e^{-\lambda}}=\frac\lambda{i+1}$
starting with $P\{X=0\}=e^{-\lambda}$:
- $P\{X=0\}=e^{-\lambda}$
- $P\{X=1\}=\frac\lambda{1} P\{X=0\}$
- $P\{X=2\}=\frac\lambda{2} P\{X=1\}$
- ...
- $P\{X=i+1\}=\frac\lambda{i+1} P\{X=i\}$

## conclusion of expected value and variance

- $U\sim \text{Uni}[a,b],E[U]=\frac{a+b}2,\text{Var}(U)=\frac{(b-a)^2}{12}$
- $X\sim \text{Ber}(p),E[X]=p,\text{Var}(X)=p(1-p)$
- $Y\sim \text{Bin}(n,p),E[Y]=np,\text{Var}(Y)=np(1-p)$
- $G\sim \text{Geo}(p),E[G]=\frac1p,\text{Var}(G)=\frac{1-p}{p^2}$
- $Z\sim \text{Poi}(\lambda),E[Z]=\lambda,\text{Var}(Z)=\lambda$

## joint PMFs of multiple random variables

$p_{X,Y}(x,y)=P(\{X=x,Y=y\})=P(\{X=x\}\cap\{Y=y\})$
$p_X(x)=\sum_{y}p_{X,Y}(x,y),\quad p_Y(y)=\sum_{x}p_{X,Y}(x,y)$

### functions of multiple random variables

given two random variables $X,Y$, let $Z=g(X,Y)$. then
$p_Z(z)=\sum_{(x,y)\mid g(x,y)=z}p_{X,Y}(x,y)$
$E[Z]=\sum_x\sum_yg(x,y)p_{X,Y}(x,y)$
more generally, given $n$ random variables $X_1,\cdots,X_n$, let $Z=g(X_1,\cdots,X_n)$. then
$p_Z(z)=\sum_{(x_1,\cdots,x_n)\mid g(x_1,\cdots,x_n)=z}p_{X_1,\cdots,X_n}(x_1,\cdots,x_n)$
$E[Z]=\sum_{x_1}\cdots \sum_{x_n}g(x_1,\cdots,x_n)p_{X_1,\cdots,X_n}(x_1,\cdots,x_n)$
furthermore, $E[\sum_i a_iX_i]=\sum_i a_iE[X_i]$
_proof:_ for a random variable $X$, let $X(s)$ denote the value of $X$ when $s\in S$ is the outcome of the experiment, so that $E[X]=\sum_{s\in S}X(s)p(s)$. let $Z=\sum_i a_iX_i$. then
$E[Z]=\sum_{s\in S}Z(s)p(s)=\sum_{s\in S}(\sum_i a_iX_i(s))p(s)=\sum_{s\in S}\sum_i a_iX_i(s)p(s)\\=\sum_i a_i\sum_{s\in S}X_i(s)p(s)=\sum_i a_iE[X_i]$

__EXAMPLE__: your probability class has $300$ students and each student has probability $1/3$ of getting an A, independent of any other student. what is the mean of $X$, the number of students that get an A?
_ans:_ let $X_i=\begin{cases}1&,\text{if the i-th student gets an A}\\0&,\text{otherwise}\end{cases}$ and $X=\sum_{i=1}^{300}X_i$. then
$E[X]=\sum_{i=1}^{300}E[X_i]=\sum^{300}_{i=1}\frac13=100=np$

__EXAMPLE__: suppose that $n$ people throw their hats in a box and then each picks one hat at random. (each hat can be picked by only one person, and each assignment of hats to persons is equally likely.) what is the expected value of $X$, the number of people that get back their own hat?
_ans:_ $P(\{X_i=1\})=1/n,P(\{X_i=0\})=1-1/n$
$E[X]=\sum E[X_i]=n\cdot1/n=1$
note that the $X_i$ here are not independent, but linearity of expectation does not require independence.

__EXAMPLE__: suppose that trial $i$ is, independently of the other trials, a success with probability $p_i$, $i=1,\cdots,n$. derive an expression for the variance of the number of successful trials.
_ans:_ let $X_i$ be the indicator of trial $i$, with $P(\{X_i=1\})=p_i,P(\{X_i=0\})=1-p_i$, and let $X=\sum_i X_i$. then $\text{Var}(X)=E[X^2]-(E[X])^2$ with
$E[X^2]=E[(\sum_i X_i)(\sum_j X_j)]=E[\sum_i X_i(X_i+\sum_{j\neq i} X_j)]=E[\sum_i (X_i^2+\sum_{j\neq i} X_iX_j)]\\=\sum_iE[ X_i^2]+\sum_i\sum_{j\neq i}E[X_iX_j]$
since $X_iX_j=\begin{cases}1&,X_i=1\land X_j=1\\0&,\text{otherwise}\end{cases}$ and the trials are independent, $E[X_iX_j]=p_ip_j$ for $i\neq j$, while $E[X_i^2]=E[X_i]=p_i$. thus
$E[X^2]=\sum_ip_i+\sum_i\sum_{j\neq i}p_ip_j$
$\text{Var}(X)=\sum_ip_i+\sum_i\sum_{j\neq i}p_ip_j-(\sum_ip_i)^2=\sum_ip_i-\sum_ip_i^2=\sum_ip_i(1-p_i)$
in particular, if all $p_i=p$ (the binomial case), $E[X^2]=np+n(n-1)p^2$ and $\text{Var}(X)=np+n(n-1)p^2-n^2p^2=np(1-p)$
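the identity just derived is easy to check numerically. below is a small Monte Carlo sketch, assuming independent trials; the probabilities in `probs` are arbitrary values chosen for illustration.

```python
import random

# arbitrary success probabilities for n = 5 independent trials (illustrative values)
probs = [0.1, 0.25, 0.5, 0.7, 0.9]

def num_successes(probs):
    """Simulate one round of independent trials; return the number of successes."""
    return sum(1 for p in probs if random.random() < p)

trials = 200_000
counts = [num_successes(probs) for _ in range(trials)]
mean = sum(counts) / trials
var = sum((c - mean) ** 2 for c in counts) / trials

print(mean, sum(probs))                       # ~2.45 vs sum p_i = 2.45
print(var, sum(p * (1 - p) for p in probs))   # ~0.83 vs sum p_i (1 - p_i) = 0.8275
```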
### conditioning

__conditional PMF__: the conditional PMF of a random variable $X$, conditioned on a particular event $A$ with $P(A)>0$, is defined by
$p_{X\mid A}(x)=P(\{X=x\}\mid A)=P(\{X=x\}\cap A)/P(A)$
since the events $\{X=x\}$ are disjoint for different values of $x$, $P(A)=\sum_x P(\{X=x\}\cap A)$, and therefore $\sum_x p_{X\mid A}(x)=1$

furthermore, if $A_i$ are disjoint events that form a partition of the sample space, then
$p_X(x)=\sum^n_{i=1}P(A_i)p_{X\mid A_i}(x)$
also, for any event $B$,
$p_{X\mid B}(x)=\sum^n_{i=1}P(A_i\mid B)p_{X\mid A_i\cap B}(x)$

__EXAMPLE__: let $X$ be the roll of a fair six-sided die and let $A$ be the event that the roll is an even number. then, by applying the preceding formula, we obtain
$p_{X\mid A}(k)=P(\{X=k\}\mid\text{roll is even})=P(\{X=k\}\cap \text{roll is even})/P(\text{roll is even})\\=\begin{cases}(1/6)/(1/2)=1/3&,k=2,4,6\\0&,\text{otherwise}\end{cases}$

__EXAMPLE__: a student will take a certain test repeatedly, up to a maximum of $n$ times, each time with a probability $p$ of passing, independent of the number of previous attempts. what is the PMF of the number of attempts, given that the student passes the test?
_ans:_ let $A_m$ be the event that the student passes the test on the $m$-th attempt, so $P(A_m)=(1-p)^{m-1}p$. let $A$ be the event that the student passes the test within the $n$ allowed attempts, so $P(A)=\sum_{m=1}^n P(A_m)$. introduce the random variable $X$, the number of attempts that would be needed if there were no limit (a geometric random variable). then
$p_{X\mid A}(k)=\begin{cases}\frac{(1-p)^{k-1}p}{\sum^n_{m=1}(1-p)^{m-1}p}&, k=1,\cdots,n\\0&,\text{otherwise}\end{cases}$

#### conditioning one random variable on another

let $X,Y$ be two random variables associated with the same experiment. if we know that the value of $Y$ is some particular $y$, this provides partial knowledge about the value of $X$:
$p_{X\mid Y}(x\mid y)=P(X=x\mid Y=y)=\frac{P(X=x, Y=y)}{P(Y=y)}=\frac{p_{X,Y}(x, y)}{p_Y(y)}$
$\sum_xp_{X\mid Y}(x\mid y)=1$
at the same time, the multiplication rule applies:
$p_{X,Y}(x,y)=p_Y(y)p_{X\mid Y}(x\mid y)$
or its counterpart:
$p_{X,Y}(x,y)=p_X(x)p_{Y\mid X}(y\mid x)$

__EXAMPLE__: consider a transmitter that is sending messages over a computer network. let us define the following two random variables: $X$, the travel time of a given message, and $Y$, the length of the given message. we know the PMF of the travel time of a message that has a given length, and we know the PMF of the message length. we want to find the (unconditional) PMF of the travel time of a message.
assume that $p_Y(y)=\begin{cases}5/6&,y=100\\1/6&,y=10000\end{cases}$
and assume that the travel time is $10^{-4}Y$ seconds with probability $1/2$, $10^{-3}Y$ seconds with probability $1/3$, and $10^{-2}Y$ seconds with probability $1/6$.
thus, $p_X(x)=\sum_y p_Y(y)p_{X\mid Y}(x\mid y)$, and we obtain
$p_X(0.01)=(5/6)\cdot(1/2),\ p_X(0.1)=(5/6)\cdot(1/3),\ p_X(1)=(5/6)\cdot(1/6)+(1/6)\cdot(1/2),\\p_X(10)=(1/6)\cdot(1/3),\ p_X(100)=(1/6)\cdot(1/6)$

### conditional expectation

- $E[X\mid A]=\sum_x xp_{X\mid A}(x)$
- $E[g(X)\mid A]=\sum_x g(x)p_{X\mid A}(x)$
- $E[X\mid Y=y]=\sum_x xp_{X\mid Y}(x\mid y)$

if $A_i$ are disjoint events that form a partition of the sample space, with $P(A_i)>0,\forall i$, then
- $\begin{align}E[X]&=\sum_x xp_X(x)\\&=\sum_xx\sum_iP(A_i)p_{X\mid A_i}(x)\\&=\sum_iP(A_i)\sum_xxp_{X\mid A_i}(x)\\&=\sum^n_{i=1}P(A_i)E[X\mid A_i]\end{align}$

furthermore, for any event $B$ with $P(A_i\cap B)>0,\forall i$,
- $E[X\mid B]=\sum^n_{i=1}P(A_i\mid B)E[X\mid A_i\cap B]$
- $E[X]=\sum_y p_Y(y)E[X\mid Y=y]$
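a small Python sketch of this machinery, using the transmitter example above (the dictionary names are just for illustration): it recovers the marginal $p_X$ from $p_Y$ and $p_{X\mid Y}$ and then checks the total expectation theorem.

```python
# message-length PMF and conditional travel-time PMFs from the transmitter example
p_Y = {100: 5/6, 10000: 1/6}
p_X_given_Y = {                      # p_{X|Y}(x | y), travel times in seconds
    100:   {0.01: 1/2, 0.1: 1/3, 1.0: 1/6},
    10000: {1.0: 1/2, 10.0: 1/3, 100.0: 1/6},
}

# marginal PMF: p_X(x) = sum over y of p_Y(y) * p_{X|Y}(x | y)
p_X = {}
for y, py in p_Y.items():
    for x, p in p_X_given_Y[y].items():
        p_X[x] = p_X.get(x, 0) + py * p
print(p_X)  # matches the five values computed above; the probabilities sum to 1

# total expectation theorem: E[X] = sum over y of p_Y(y) * E[X | Y = y]
e_direct = sum(x * p for x, p in p_X.items())
e_total = sum(py * sum(x * p for x, p in p_X_given_Y[y].items()) for y, py in p_Y.items())
print(e_direct, e_total)  # both are approximately 3.59 seconds
```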
__EXAMPLE__: messages transmitted by a computer in Boston through a data network are destined for New York with probability $0.5$, for Chicago with probability $0.3$, and for San Francisco with probability $0.2$. the transit time $X$ of a message is random: its mean is $0.05$ seconds if it is destined for New York, $0.1$ seconds if it is destined for Chicago, and $0.3$ seconds if it is destined for San Francisco. then $E[X]$ is easily calculated using the total expectation theorem as
$E[X]=0.5\cdot0.05+0.3\cdot0.1+0.2\cdot0.3=0.115$

## independence

the random variable $X$ is independent of the event $A$ if
$P(\{X=x\}\cap A)=P(\{X=x\})P(A)=p_X(x)P(A),\forall x$
when $P(A)>0$, dividing both sides by $P(A)$ shows this is the same as requiring $p_{X\mid A}(x)=p_X(x),\forall x$

### independence of random variables

two random variables $X$ and $Y$ are independent if
$p_{X,Y}(x,y)=p_X(x)p_Y(y),\forall x,\forall y$
$X$ and $Y$ are said to be conditionally independent, given a positive probability event $A$, if
$P(X=x, Y=y\mid A)=P(X=x\mid A)P(Y=y\mid A)=p_{X\mid A}(x)p_{Y\mid A}(y),\forall x,\forall y$
equivalently, $p_{X\mid Y,A}(x\mid y)=p_{X\mid A}(x)$ whenever $p_{Y\mid A}(y)>0$

if two random variables $X$ and $Y$ are independent, then $E[XY]=E[X]E[Y]$, and even more, $E[g(X)h(Y)]=E[g(X)]E[h(Y)]$
let $X'=X-E[X],Y'=Y-E[Y]$, so that $E[X']=E[Y']=0$. then
$\text{Var}(X+Y)=\text{Var}(X'+Y')=E[(X'+Y')^2]-(E[X'+Y'])^2=E[(X'+Y')^2]\\=E[X'^2+Y'^2+2X'Y']=E[X'^2]+E[Y'^2]+2E[X']E[Y']=\text{Var}(X)+\text{Var}(Y)$
the idea of independence extends to more than two random variables: $p_{X_1,X_2,\cdots,X_n}(x_1,x_2,\cdots,x_n)=p_{X_1}(x_1)p_{X_2}(x_2)\cdots p_{X_n}(x_n)$, and in that case
$\text{Var}(\sum_i X_i)=\sum_i\text{Var}(X_i)$
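to close, a small sketch (the two marginal PMFs are arbitrary values chosen for illustration) that builds the joint PMF of two independent random variables as a product of marginals and numerically checks $E[XY]=E[X]E[Y]$ and $\text{Var}(X+Y)=\text{Var}(X)+\text{Var}(Y)$.

```python
from itertools import product

# two independent discrete random variables, specified by arbitrary marginal PMFs
p_X = {0: 0.2, 1: 0.5, 2: 0.3}
p_Y = {-1: 0.4, 1: 0.6}

# independence: the joint PMF is the product of the marginals
p_XY = {(x, y): p_X[x] * p_Y[y] for x, y in product(p_X, p_Y)}

def E(f, joint):
    """E[f(X, Y)] = sum over (x, y) of f(x, y) * p_{X,Y}(x, y)."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

# E[XY] equals E[X] E[Y]
print(E(lambda x, y: x * y, p_XY), E(lambda x, y: x, p_XY) * E(lambda x, y: y, p_XY))

# Var(X + Y) equals Var(X) + Var(Y)
var = lambda f: E(lambda x, y: f(x, y) ** 2, p_XY) - E(f, p_XY) ** 2
print(var(lambda x, y: x + y), var(lambda x, y: x) + var(lambda x, y: y))
```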