---
title: MS4
tags: teach:MS
---

# Chapter 4

[Back home](https://hackmd.io/myN1AJZMRxWuw6xw0VnTeQ)

### Key ideas in this chapter
- Definitions of expectation, variance, and covariance.
- Calculation of expectation using its definition.
- Expectation is a linear operator; this property allows simpler calculations.
- The Markov and Chebyshev inequalities.
- Moment generating functions.

## 4.1 The expected value of a random variable

### Definition
If $X$ is a discrete random variable with frequency function $p(x)$, the expected value (or simply the expectation) of $X$, denoted by $E(X)$, is
$$E(X) = \sum_i x_i p(x_i),$$
provided that $\sum_i |x_i|p(x_i)<\infty$. If the sum diverges, the expectation is undefined.

### Example
Throw a fair die. If $x$ points appear, you receive $x$ dollars. What is the expected number of dollars you will receive?

#### Sol.
Let $X$ denote the number of dollars received. The frequency function of $X$ is $P(X=x)=1/6$, $x=1,2,3,4,5,6$.
$$E[X] = 1\times \frac{1}{6}+\cdots+ 6\times \frac{1}{6}=\frac{21}{6} = 3.5.$$
The expected number of dollars received is 3.5.

### Example
Find the expected value of $X\sim Bernoulli(p)$.

#### Sol.
Because $P(X=0)=1-p$ and $P(X=1)=p$, we have
$$E(X) = 1\times p + 0\times(1-p) = p.$$

:dog: Homework in p 116: Examples A, B, C.

### Definition
If $X$ is a continuous random variable with density $f(x)$, then
$$E(X) = \int_{-\infty}^\infty xf(x)dx,$$
provided that $\int |x|f(x)dx<\infty$. If the integral diverges, the expectation is undefined.

:dog: Homework in p 118: Examples E, F, G.

### Markov Inequality
If $X$ is a random variable with $P(X\geq 0)=1$ and for which $E(X)$ exists, then for any $t>0$,
$$P(X\geq t)\leq \frac{E(X)}{t}.$$

#### Proof.
In the continuous case,
\begin{eqnarray*}
E(X) = \int_{-\infty}^\infty xf(x)dx & = & \int_0^t xf(x)dx + \int_{t}^\infty xf(x)dx \\
& \geq & \int_0^t 0\cdot f(x)dx + \int_{t}^\infty t f(x)dx\\
& = & 0 + t\int_{t}^\infty f(x)dx\\
&=& tP(X\geq t).
\end{eqnarray*}
Hence, $P(X \geq t )\leq \frac{E(X)}{t}$.

### Example
Let $X\sim Exp(1)$. Calculate $P(X>3)$ exactly and bound it using the Markov inequality.

#### Sol.
$P(X>3) = e^{-3\times 1} = e^{-3} \approx 0.0498.$
By the Markov inequality,
$$P(X>3) \leq \frac{E(X)}{3} = \frac{1}{3} \approx 0.33.$$

### Expectations of functions of random variables
Suppose $Y=g(X)$.
- If $X$ is discrete with frequency function $p(x)$, then
$$E(Y) = \sum_x g(x)p(x),$$
provided that $\sum |g(x)|p(x) < \infty$.
- If $X$ is continuous with density function $f(x)$, then
$$E(Y) = \int_{-\infty}^{\infty} g(x)f(x)dx,$$
provided that $\int |g(x)|f(x)dx < \infty$.

Suppose that $X_1,\ldots,X_n$ are jointly distributed random variables and $Y= g(X_1,\ldots,X_n)$.
- If the $X_i$ are discrete with frequency function $p(x_1,\ldots,x_n)$, then
$$E(Y) = \sum_{x_1,\ldots,x_n} g(x_1,\ldots,x_n)p(x_1,\ldots,x_n),$$
provided that $\sum |g(x_1,\ldots,x_n)|p(x_1,\ldots,x_n) < \infty$.
- If the $X_i$ are continuous with joint density function $f(x_1,\ldots,x_n)$, then
$$E(Y) = \int_{-\infty}^{\infty}\cdots \int_{-\infty}^{\infty} g(x_1,\ldots,x_n)f(x_1,\ldots,x_n)dx_1\cdots dx_n,$$
provided that $\int |g(x_1,\ldots,x_n)|f(x_1,\ldots,x_n)dx_1\cdots dx_n < \infty$.
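As a quick numerical sanity check of the exponential example above (not from the textbook; it assumes `numpy` is available and uses a sample size and seed of my own choosing), the following sketch compares a Monte Carlo estimate of $P(X>3)$ with the exact value $e^{-3}$ and with the Markov bound $E(X)/3$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)  # X ~ Exp(1), so E(X) = 1

tail_mc = np.mean(x > 3)        # Monte Carlo estimate of P(X > 3)
tail_exact = np.exp(-3)         # exact value, about 0.0498
markov_bound = x.mean() / 3     # Markov: P(X >= 3) <= E(X)/3, about 0.33

print(tail_mc, tail_exact, markov_bound)
```
The bound is valid but loose here, which is typical of the Markov inequality.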
### Corollary in page 124.
If $X$ and $Y$ are independent random variables and $g$ and $h$ are fixed functions, then
$$E[g(X)h(Y)] = E[g(X)]E[h(Y)],$$
provided that the expectations on the right-hand side exist.

#### Proof.
Without loss of generality, assume that $X$ and $Y$ are discrete with joint frequency function $p_{X,Y}(x,y)$ and marginal frequency functions $p_X(x)$ and $p_Y(y)$.
Because $X$ and $Y$ are independent, $p_{X,Y}(x,y)=p_X(x)p_Y(y)$. Therefore, we have
\begin{eqnarray*}
E(g(X)h(Y)) &=& \sum_y \sum_x g(x)h(y) p_{X,Y}(x,y)\\
&=& \sum_y \sum_x g(x)h(y) p_{X}(x)p_Y(y)\\
&=& \sum_y \left(\sum_x g(x)h(y) p_{X}(x)p_Y(y)\right)\\
&=& \sum_y h(y)p_Y(y) \left(\sum_x g(x) p_{X}(x)\right)\\
&=& \sum_y h(y)p_Y(y) E(g(X))\\
&=& E(h(Y)) E(g(X)).
\end{eqnarray*}

### Theorem in page 125.
If $X_1,\ldots,X_n$ are jointly distributed random variables with expectations $E(X_i)$ and $Y$ is a linear function of the $X_i$, $Y = a+\sum_{i=1}^n b_iX_i$, then
$$E(Y) = a + \sum_{i=1}^n b_i E(X_i).$$

#### Proof.
Assume the $X_i$ are continuous with joint density $f(x_1,\ldots,x_n)$. Then, we have
\begin{eqnarray*}
E(Y) &=& \int\cdots\int (a+\sum b_i x_i)f(x_1,\ldots,x_n)dx_1\cdots dx_n\\
&=& a\int\cdots\int f(x_1,\ldots,x_n)dx_1\cdots dx_n\\
&&+ \sum b_i \int\cdots\int x_if(x_1,\ldots,x_n)dx_1\cdots dx_n\\
&=& a+\sum b_i E(X_i).
\end{eqnarray*}

### Linear Operator
An operator $L$ is said to be linear if, for every pair of functions $f$ and $g$ and every scalar $a$, we have
- $L(f+g) = L(f)+L(g)$,
- $L(a f)= aL(f)$.

The expectation is a linear operator:
- $E[X+Y]= E[X]+E[Y]$.
- $E[cX]=cE[X]$ for a scalar $c$.
- $E[a+X]=a+E[X]$ for a scalar $a$.

Exam 1 covers all materials above this line.

:sunflower::sunflower::sunflower::sunflower::sunflower::sunflower::sunflower::sunflower::sunflower::sunflower::sunflower::sunflower::sunflower::sunflower:

Exam 2 covers all materials below.

## 4.2 Variance and Standard Deviation

### Variance
- If $X$ is a random variable with expected value $E(X)$, the variance of $X$ is
$$Var(X)= E\left[(X-E(X))^2\right],$$
provided that the expectation exists.
- The standard deviation of $X$ is the square root of the variance,
$$SD(X)=\sqrt{Var(X)}.$$

### Example
Find the variance of the Bernoulli distribution.

#### Sol.
If $X\sim Bernoulli(p)$, we already know that $E(X)=p$. Thus, the variance of $X$ is
\begin{eqnarray*}
Var(X)& = & E(X-p)^2\\
& = & (1-p)^2 \times p + (0-p)^2\times (1-p)\\
& = & p(1-p)((1-p) + p) \\
&= & p(1-p).
\end{eqnarray*}

### :bear: Readings in p 132: Example B.

### :+1: Variance is not a linear operator
If $Var(X)$ exists and $Y = a+bX$, then $Var(Y) = b^2 Var(X)$.

#### Proof.
\begin{eqnarray*}
Var(Y) = E(Y-\mu_Y)^2 &=& E\left( (a+bX)-(a+b\mu_X)\right)^2\\
&=& E\left(b(X-\mu_X)\right)^2\\
&=& E(b^2 (X-\mu_X)^2)\\
&=& b^2 E(X-\mu_X)^2\\
&=& b^2 Var(X).
\end{eqnarray*}
Therefore, variance is not a linear operator.

### :+1: Theorem in p 132.
The variance of $X$, if it exists, may also be calculated as follows:
$$Var(X) = E(X^2)-E(X)^2.$$

#### Proof
Let $\mu$ denote $E(X)$. This follows directly from
\begin{eqnarray*}
Var(X)& = &E(X-\mu)^2 \\
&= &E(X^2 - 2X \mu + \mu^2)\\
& =& E(X^2) - 2\mu E(X) + \mu^2\\
& =& E(X^2) - 2\mu^2 + \mu^2 \\
&=& E(X^2) -\mu^2.
\end{eqnarray*}

### :+1: Example
Find the variance of the uniform distribution, $U(0,1)$.

#### Solution
Because
$$E(X^2) = \int_0^1 x^2 \times 1 dx = \frac{1}{3}$$
and
$$E(X)=\int_{0}^1 x\times 1 dx = \frac{1}{2},$$
we have
$$Var(X) = E(X^2)-E(X)^2 = \frac{1}{3}-\left(\frac{1}{2}\right)^2=\frac{1}{12}.$$
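The variance facts above are easy to check by simulation. Below is a minimal sketch (my own illustration, not from the text, assuming `numpy`): it estimates $Var(X)$ for $X\sim U(0,1)$ and confirms that $Var(a+bX)=b^2Var(X)$ for arbitrarily chosen $a$ and $b$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=1_000_000)   # X ~ U(0, 1)

print(x.var())               # close to 1/12 ~ 0.0833
a, b = 2.0, -3.0             # arbitrary constants for the check
print((a + b * x).var())     # Var(a + bX) = b^2 Var(X) ~ 9/12 = 0.75
```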
### :+1: Chebyshev's Inequality
Let $X$ be a random variable with mean $\mu$ and variance $\sigma^2$. Then, for any $t>0$,
$$P(|X-\mu|\geq t)\leq \frac{\sigma^2}{t^2}.$$

#### Proof
Let $Y=(X-\mu)^2$. Then $P(Y\geq 0)=1$, so we can apply the Markov inequality. In addition, it is easy to see that
$$E(Y) =E[(X-\mu)^2] = \sigma^2.$$
By the Markov inequality, we have
$$P(|X-\mu|\geq t)=P(|X-\mu|^2 \geq t^2) = P(Y\geq t^2)\leq \frac{E(Y)}{t^2}=\frac{\sigma^2}{t^2}.$$

### Corollary A (p 134)
If $Var(X)=0$, then $P(X=\mu)=1$.

#### Proof
For any given $\varepsilon> 0$, Chebyshev's inequality gives
$$P(|X-\mu|\geq \varepsilon)\leq \frac{\sigma^2}{\varepsilon^2}=0.$$
Therefore, $P(|X-\mu|\geq \varepsilon)=0$ for every $\varepsilon>0$, and hence $P(X=\mu)=1$.

### Theorem A (p 136)
Let the mean squared error between $X$ and a constant $x_0$ be $MSE = E[(X-x_0)^2]$. Then,
$$MSE =\beta^2+\sigma^2,$$
where $\beta = E(X-x_0)$ is the bias between $X$ and $x_0$, and $\sigma^2$ is the variance of $X$.

#### Proof
Let $\mu=E(X)$; then $\beta =E(X-x_0)=\mu-x_0$. We have
\begin{eqnarray*}
MSE=E(X-x_0)^2 &=& E\left((X-\mu)+(\mu-x_0)\right)^2\\
&=& E((X-\mu)^2 +2(X-\mu)(\mu-x_0)+(\mu-x_0)^2)\\
&=& E(X-\mu)^2 + 2(\mu-x_0)E(X-\mu)+(\mu-x_0)^2\\
&=& \sigma^2 + 0 + (\mu-x_0)^2\\
&=& \sigma^2 + \beta^2.
\end{eqnarray*}

### :apple: Exercise C4.2: 48

## 4.3 Covariance and Correlation

### Definition in p 138.
If $X$ and $Y$ are jointly distributed random variables with expectations $\mu_X$ and $\mu_Y$, respectively, the covariance of $X$ and $Y$ is
$$Cov(X,Y) = E[(X-\mu_X)(Y-\mu_Y)],$$
provided that the expectation exists.

### A useful formula for calculating the covariance
Show that
$$Cov(X,Y) = E(XY)-\mu_X \mu_Y.$$

#### Proof
\begin{eqnarray*}
Cov(X,Y)&=& E((X-\mu_X)(Y-\mu_Y))\\
&=& E(XY - \mu_X Y - \mu_Y X +\mu_X\mu_Y)\\
&=& E(XY) -\mu_X E(Y) -\mu_Y E(X) + \mu_X \mu_Y \\
&=& E(XY)- \mu_X\mu_Y.
\end{eqnarray*}

### Example
The joint density of $X$ and $Y$ is $f(x,y)=2x+2y-4xy$, where $0\leq x\leq 1$ and $0\leq y \leq 1$. Find the covariance and correlation of $X$ and $Y$.

#### Solution
To find the covariance, note that $Cov(X,Y) = E(XY)-E(X)E(Y)$. Hence, we first calculate $E(XY)$:
\begin{eqnarray*}
E(XY) &=& \int_0^1 \int_0^1 xy(2x+2y-4xy)dxdy\\
&=& \int_0^1 \int_0^1 (2x^2y +2xy^2 -4x^2y^2)dxdy\\
&=& \int_{0}^1 \left[ \frac{2}{3}x^3y + x^2y^2 -\frac{4}{3}x^3y^2\right]_{x=0}^{x=1} dy\\
&=& \int_{0}^1 \frac{2}{3}y +y^2 - \frac{4}{3}y^2 dy \\
&=& \int_{0}^1 \frac{2}{3}y - \frac{1}{3}y^2 dy \\
&=& \left[\frac{1}{3}y^2 - \frac{1}{9}y^3\right]^{y=1}_{y=0}= \frac{2}{9}.
\end{eqnarray*}
Now, to calculate $E(X)$, we find the marginal density of $X$:
\begin{eqnarray*}
f_X(x) &=& \int_{0}^1 (2x+2y-4xy) dy \\
&=& \left[2xy + y^2 - 2xy^2\right]_{y=0}^{y=1}\\
&=& 2x +1 -2x = 1,\quad \mbox{for}\quad 0<x<1.
\end{eqnarray*}
Hence, $X\sim Unif(0,1)$, so $E(X)=\frac{1}{2}$ and $Var(X) = \frac{1}{12}$. Similarly, the marginal pdf of $Y$ is
\begin{eqnarray*}
f_Y(y) &=& \int_{0}^1 (2x+2y-4xy) dx \\
&=& \left[x^2 + 2xy - 2x^2y\right]_{x=0}^{x=1}\\
&=& 1 +2y -2y = 1,\quad \mbox{for}\quad 0<y<1.
\end{eqnarray*}
Hence, $Y\sim Unif(0,1)$, so $E(Y)=\frac{1}{2}$ and $Var(Y) = \frac{1}{12}$. As a result, we have
$$Cov(X,Y) = E(XY) - E(X)E(Y) = \frac{2}{9}-\frac{1}{2}\times\frac{1}{2} = -\frac{1}{36}.$$
The correlation is
$$Corr(X,Y)=\frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}}=\frac{-\frac{1}{36}}{\frac{1}{12}} = -\frac{1}{3}.$$

### A generalization of the covariance formula
Let $W,X,Y,Z$ be random variables and $a,b,c,d$ be scalars. Then, we have
\begin{eqnarray*}
&&Cov(aW+bX, cY+dZ)\\
&=&E[(aW+bX)(cY+dZ)]-E(aW+bX)E(cY+dZ)\\
&=&E(acWY+adWZ+bcXY+bdXZ)-(aE(W)+bE(X))(cE(Y)+dE(Z))\\
&=&ac(E(WY)-E(W)E(Y))+bc(E(XY)-E(X)E(Y))+ad(E(WZ)-E(W)E(Z)) + bd(E(XZ)-E(X)E(Z))\\
&=& ac Cov(W,Y)+bcCov(X,Y)+adCov(W,Z)+bdCov(X,Z).
\end{eqnarray*}
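Because the sample covariance satisfies the same algebraic identity, the bilinearity formula above can be checked on simulated data to floating-point accuracy. Here is a small illustrative sketch (not from the text; the variables, coefficients, and use of `numpy` are my own choices).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
base = rng.standard_normal(n)            # shared component to make the variables correlated
w = base + rng.standard_normal(n)
x = 2 * base + rng.standard_normal(n)
y = -base + rng.standard_normal(n)
z = rng.standard_normal(n)
a, b, c, d = 1.5, -2.0, 0.5, 3.0

def cov(u, v):
    """Sample covariance of two equal-length arrays."""
    return np.cov(u, v)[0, 1]

lhs = cov(a * w + b * x, c * y + d * z)
rhs = (a * c * cov(w, y) + b * c * cov(x, y)
       + a * d * cov(w, z) + b * d * cov(x, z))
print(lhs, rhs)                          # agree up to floating-point error
```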
### Exercise 4.3.44
If $X$ and $Y$ are independent random variables with equal variances, find $Cov(X+Y, X-Y)$.

#### Solution
$$Cov(X+Y, X-Y) = Cov(X,X)-Cov(X,Y)+Cov(Y,X)-Cov(Y,Y)=Var(X)-Var(Y) =0.$$

### Theorem A (p 140)
Suppose $U = a+\sum_{i=1}^n b_i X_i$ and $V = c+\sum_{j=1}^m d_j Y_j$. Then
$$Cov(U,V) = \sum_{i=1}^n \sum_{j=1}^m b_id_j Cov(X_i,Y_j).$$

#### Proof
By the linearity of the expectation, we know that
\begin{eqnarray*}
E(U) & = & a+ \sum_{i=1}^n b_i \mu_{X_i},\\
E(V) & = & c+\sum_{j=1}^m d_j\mu_{Y_j}.
\end{eqnarray*}
Therefore,
\begin{eqnarray*}
Cov(U,V) & = & E\left[(U-E(U) )(V-E(V))\right]\\
&=& E\left[ \left(\sum_{i=1}^n b_i(X_i-\mu_{X_i}) \right)\left(\sum_{j=1}^m d_j(Y_j-\mu_{Y_j})\right)\right]\\
& = & E\left[ \sum_{i=1}^n\sum_{j=1}^m b_id_j(X_i-\mu_{X_i})(Y_j-\mu_{Y_j})\right]\\
&=& \sum_{i=1}^n\sum_{j=1}^m b_id_j E\left[(X_i-\mu_{X_i})(Y_j-\mu_{Y_j})\right]\\
&=& \sum_{i=1}^n\sum_{j=1}^m b_id_j Cov(X_i,Y_j).
\end{eqnarray*}

### Corollary A.
Show that
$$Var(a+\sum_{i=1}^n b_iX_i) = \sum_{i=1}^n \sum_{j=1}^n b_i b_j Cov(X_i, X_j).$$

#### Proof.
\begin{eqnarray*}
Var(a+\sum_{i=1}^n b_iX_i) & = &Cov(a+\sum_{i=1}^n b_iX_i, a+\sum_{i=1}^n b_iX_i)\\
&=& \sum_{i=1}^n\sum_{j=1}^n b_i b_j Cov(X_i,X_j)\\
&=& \sum_{i=1}^n b_i^2 Var(X_i)+2\sum_{i<j }b_i b_j Cov(X_i,X_j).
\end{eqnarray*}

### A simplification of the variance formula
If the $X_i$ are independent, then $Var(\sum_{i=1}^n X_i) = \sum_{i=1}^{n}Var(X_i)$.

#### Proof
Note that if $X$ and $Y$ are independent,
$$Cov(X,Y)=E(XY)-E(X)E(Y) = E(X)E(Y)-E(X)E(Y)=0.$$
Therefore, if the $X_i$ are independent,
\begin{eqnarray*}
Var(\sum_{i=1}^n X_i) &=& \sum_{i=1}^n Var(X_i)+2\sum_{i<j }Cov(X_i,X_j)\\
&=& \sum_{i=1}^n Var(X_i).
\end{eqnarray*}

### Example
Find the variance of a binomial random variable.

#### Solution
When $Y\sim Binomial(n,p)$, we can write $Y=X_1+\cdots+X_n$, where $X_i\stackrel{i.i.d.}{\sim}Bernoulli(p)$. Therefore, we have
$$Var(Y) = \sum_{i=1}^n Var(X_i) = n p(1-p).$$

### Readings in p 140: Example C.

### Definition
If $X$ and $Y$ are jointly distributed random variables whose covariance exists and whose variances exist and are nonzero, then the correlation of $X$ and $Y$, denoted by $\rho$, is
$$\rho =\frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}}.$$

### Exercise C4.3: 46
Let $U$ and $V$ be independent random variables, each with mean $\mu$ and variance $\sigma^2$. Let $Z=\alpha U+ V\sqrt{1-\alpha^2}$. Find $E(Z)$ and $\rho_{UZ}$.

#### Solution
By linearity of the expectation,
$$E(Z)=\alpha \mu + \mu\sqrt{1-\alpha^2}.$$
To find $\rho_{UZ}$, we first calculate
\begin{align*}
E(UZ)&=E\left(U(\alpha U + V\sqrt{1-\alpha^2})\right)\\
& = \alpha E(U^2) + \sqrt{1-\alpha^2} E(UV)\\
& = \alpha( \sigma^2 +\mu^2)+\sqrt{1-\alpha^2}\mu^2\\
& =\mu^2 (\alpha+\sqrt{1-\alpha^2} )+\alpha\sigma^2.
\end{align*}
Hence,
\begin{align*}
Cov(U,Z) &= E(UZ)-E(U)E(Z)\\
&= \mu^2(\alpha+\sqrt{1-\alpha^2})+\alpha\sigma^2 -\mu(\alpha \mu + \mu\sqrt{1-\alpha^2})\\
&=\alpha\sigma^2.
\end{align*}
In addition, we have
\begin{align*}
Var(Z) & =Var(\alpha U+ V\sqrt{1-\alpha^2})\\
&=\alpha^2 \sigma^2+(1-\alpha^2)\sigma^2 \\
&=\sigma^2.
\end{align*}
Hence, we have
\begin{align*}
\rho_{UZ} &=\frac{Cov(U,Z)}{\sqrt{Var(U)Var(Z)}}=\frac{\alpha\sigma^2}{\sqrt{\sigma^2\sigma^2}}=\alpha.
\end{align*}
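A quick simulation check of Exercise C4.3: 46 (not part of the text; the parameter values, normal samples, and `numpy` usage are my own choices, since only the mean and variance of $U$ and $V$ matter).

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, alpha = 5.0, 2.0, 0.6

u = rng.normal(mu, sigma, size=1_000_000)   # any distribution with mean mu, sd sigma would do
v = rng.normal(mu, sigma, size=1_000_000)
z = alpha * u + np.sqrt(1 - alpha**2) * v

print(z.mean())                  # ~ alpha*mu + mu*sqrt(1 - alpha^2)
print(np.corrcoef(u, z)[0, 1])   # ~ alpha = 0.6
```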
### Theorem B in p 143.
The correlation satisfies $-1\leq \rho \leq 1$. Furthermore, $\rho = \pm 1$ if and only if $P(Y=a+bX)=1$ for some constants $a$ and $b$.

#### Proof. (The proof is tricky!)
Because
\begin{eqnarray*}
0&\leq& Var(\frac{X}{\sigma_X}+\frac{Y}{\sigma_Y})\\
& = &(\frac{1}{\sigma_X})^2Var({X})+(\frac{1}{\sigma_Y})^2Var(Y)+2\frac{1}{\sigma_X}\frac{1}{\sigma_Y}Cov(X,Y)\\
& =& 1+1+2\rho = 2(1+\rho),
\end{eqnarray*}
we have $0\leq 2(1+\rho)$, which implies $-1\leq \rho$. If $\rho = -1$, then $Var(\frac{X}{\sigma_X}+\frac{Y}{\sigma_Y}) = 0$, so $P(\frac{X}{\sigma_X}+\frac{Y}{\sigma_Y}=c)=1$ for some constant $c$, by the Corollary in p 134.

Similarly, we have
\begin{eqnarray*}
0&\leq& Var(\frac{X}{\sigma_X}-\frac{Y}{\sigma_Y})\\
& = & Var(\frac{X}{\sigma_X})+Var(\frac{Y}{\sigma_Y})-2\frac{1}{\sigma_X}\frac{1}{\sigma_Y}Cov(X,Y)\\
& =& 1+1-2\rho = 2(1-\rho).
\end{eqnarray*}
Hence $0\leq 2(1-\rho)$, which implies $\rho \leq 1$. Combining these two results, we conclude $-1 \leq \rho \leq 1$. If $\rho = 1$, then $Var(\frac{X}{\sigma_X}-\frac{Y}{\sigma_Y}) = 0$, so $P(\frac{X}{\sigma_X}-\frac{Y}{\sigma_Y}=c) = 1$ for some constant $c$, by the Corollary in p 134.

### Readings: Example D (p 142), Example E (p 143), Example F (p 145)

### Interpretations of correlation (from [Wiki](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient#/media/File:Correlation_examples2.svg))

#### Correlation only measures linear relationships
![](https://i.imgur.com/TxbjHUY.png)
![](https://i.imgur.com/0BsxiOv.png)

## 4.4 Conditional Expectation and Prediction

### Conditional expectation
In the discrete case, the conditional expectation of $Y$ given $X=x$ is
$$E(Y|X=x) = \sum_y y\, p_{Y|X}(y|x).$$
In the continuous case, the conditional expectation of $Y$ given $X=x$ is
$$E(Y|X=x) = \int_y y f_{Y|X}(y|x)dy.$$

### Example.
Suppose $X$ and $Y$ have the joint pmf from C3.5, shown below. Find $E[X|Y=1]$.

#### Recall
Recall that

|$y\backslash x$ | $x=0$ |$x=1$ | $p_Y(y)$|
|---|---|---|---|
|0 | 1/8 | 0 | 1/8|
|1 | 2/8 | 1/8 | 3/8 |
|2 | 1/8 | 2/8 | 3/8 |
|3 | 0 | 1/8 | 1/8 |
|$p_X(x)$ | 4/8 | 4/8 |1|

Hence, we have
$$p_{X|Y=1}(x) = \left\{\begin{array}{ll}2/3,& \mbox{for }x=0,\\ 1/3, & \mbox{for }x=1.\end{array}\right.$$
Thus,
$$E[X|Y=1] = 0\times 2/3 + 1\times 1/3 = 1/3.$$

### Theorem A. The law of total expectation.
$$E(Y)= E[E(Y|X)].$$

#### Proof.
\begin{eqnarray*}
E[E[Y|X]] &=& E_X[E_Y[Y|X]] \\
&=& \sum_{x} E[Y|X=x] p_X(x)\\
&=& \sum_{x} \left[\sum_y y p_{Y|X}(y|x)\right] p_X(x)\\
&=& \sum_{x} \sum_y y p_{Y|X}(y|x) p_X(x)\\
&=& \sum_{x} \sum_y y p_{X,Y}(x,y)= E[Y].
\end{eqnarray*}

### Example.
![](https://i.imgur.com/lp6AORt.png)

### Example: Random sums
Let $T = \sum_{i=1}^N X_i$, where $N$ is a random variable with finite expectation and the $X_i$ are random variables that are independent of $N$ and have the common mean $E(X)$. Find the expectation of $T$.

#### Sol.
Note that $E[T|N=n] = E[\sum_{i=1}^n X_i |N=n] = n E(X)$, using the independence of the $X_i$ and $N$. Hence, we have $E[T|N] = NE(X)$. In addition, we have
\begin{eqnarray*}
E[T] &=& E[E[T|N]] = E_N[E_T[T|N]] \\
&=& E_N[ NE[X] ]\\
&=& E(N)E(X).
\end{eqnarray*}
![](https://i.imgur.com/t4QaKJx.png)

### :apple: Readings: Theorem B.
$Var(Y)= Var[E(Y|X)]+E[Var(Y|X)]$. (derivation skipped)

### Prediction 1
To minimize $MSE = E[(Y-c)^2]=Var(Y)+(E(Y)-c)^2$ over $c$, the minimizer is $c=E(Y)$.

#### Solution
Since $Var(Y)$ does not depend on $c$ and $(E(Y)-c)^2\geq 0$, we should choose $c = E(Y)$.

### Prediction 2
The function $h(X)$ that minimizes $MSE = E\{[Y-h(X)]^2\}$ is $h(X)=E(Y|X)$.

#### Solution
We want to minimize
$$E[Y-h(X)]^2 = E(E\left\{[Y-h(X)]^2|X\right\}).$$
The outer expectation is with respect to $X$. For every $x$, to minimize $E\left\{[Y-h(X)]^2|X=x\right\}$, we choose $h(x) = E[Y|X=x]$ (this is Prediction 1 applied to the conditional distribution of $Y$ given $X=x$). We thus conclude that the minimizing function is $h(X)=E[Y|X]$.
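The random-sums identity $E(T)=E(N)E(X)$ from the example above is easy to illustrate by simulation. A minimal sketch, assuming `numpy`, with $N\sim Poisson(4)$ and $X_i$ exponential with mean 2 as arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(3)
lam, mean_x = 4.0, 2.0                  # E(N) = 4, E(X) = 2, so E(T) = E(N)E(X) = 8

n_values = rng.poisson(lam, size=50_000)
# For each replication, draw N and then sum N i.i.d. exponentials that are independent of N.
t_values = np.array([rng.exponential(mean_x, size=n).sum() for n in n_values])

print(t_values.mean())                  # close to E(N) * E(X) = 8
```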
### Example A (Reading: Example B in p 148).
From Example B in p 148, if $X$ and $Y$ follow a bivariate normal distribution, then
$$Y|X=x\sim N\left(\mu_Y+\rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X),\ \sigma^2_Y(1-\rho^2)\right).$$
As a result, we have
$$E(Y|X) =\mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(X-\mu_X).$$
![](https://i.imgur.com/7LGjPGX.png)

## 4.5 Moment generating function (mgf) 動差母函數

### Why do we need the mgf?
1. It makes it easier to find the distribution of sums and linear functions of random variables, such as $X_1+X_2+\cdots+X_n$, $X+Y$, and $aX$.
2. It helps to prove the central limit theorem.

### Definition of moments
- The $r$th moment is $E[X^r]$.
- The $r$th central moment is $E[(X-E(X))^r]$. The variance is the second central moment, and the third central moment measures skewness.
- The moment generating function (mgf) of a random variable $X$ is $M(t) = E(e^{tX})$, provided the expectation is defined.

### Example.
Let $Z\sim N(0,1)$. Find its mgf.
\begin{eqnarray*}
M_Z(t) = E[e^{Zt} ] &=& \int_{-\infty}^\infty e^{zt} \frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}dz\\
&=& \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}}e^{-\frac{z^2 -2zt +t^2-t^2}{2}} dz\\
&=& \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}}e^{-\frac{(z-t)^2}{2}+\frac{t^2}{2}} dz\\
&=& e^{\frac{t^2}{2}}\int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}}e^{-\frac{(z-t)^2}{2}} dz\\
&=& e^{\frac{t^2}{2}},
\end{eqnarray*}
because the last integrand is the $N(t,1)$ density, which integrates to 1.
![](https://i.imgur.com/BjomLpF.png)

### Property A.
If the moment-generating function exists for $t$ in an open interval containing zero, it uniquely determines the probability distribution.
:::info
This means that if two random variables have the same mgf, they are identically distributed.
:::

### Property B.
If the moment-generating function exists in an open interval containing zero, then $M^{(r)}(0) = E(X^r)$.

#### Proof
Note that
$$M'(t) = \frac{d}{dt}\int e^{tx}f(x)dx = \int xe^{tx}f(x)dx.$$
Therefore, $M'(0)=\int x\times 1 \times f(x)dx = E(X)$. Similarly,
$$M''(t) = \frac{d}{dt}\int x e^{tx}f(x)dx = \int x^2e^{tx}f(x)dx.$$
Therefore, $M''(0)=\int x^2\times 1\times f(x)dx = E(X^2)$. In general, we have
$$M^{(r)}(t) =\int x^r e^{tx}f(x)dx,$$
and hence $M^{(r)}(0) =E[X^r]$.

### Property C.
If $X$ has the mgf $M_X(t)$ and $Y= a+bX$, then $Y$ has the mgf $M_Y(t) = e^{at}M_X(bt)$.

#### Proof
\begin{eqnarray*}
M_Y(t)&=& E[e^{tY}] \\
&=& E[e^{t(a+bX)}]\\
&=& E[e^{ta}e^{tbX}]\\
&=&e^{ta}E[e^{btX}]\\
&=& e^{ta} M_X(bt).
\end{eqnarray*}

### Example
Let $Z\sim N(0,1)$. If $X=a+bZ$, find the mgf of $X$.

#### Solution
\begin{eqnarray*}
M_X(t) &=& M_{a+bZ}(t)\\
&=& e^{at}M_Z(bt)\\
&=& e^{at}e^{(bt)^2/2}\\
&=& e^{at+b^2t^2/2}.
\end{eqnarray*}
Hence, we conclude that if $X\sim N(a, b^2)$, then
$$M_X(t) = e^{at+b^2t^2/2}.$$

### Property D.
If $X$ and $Y$ are independent random variables with mgfs $M_X$ and $M_Y$, and $Z=X+Y$, then $M_Z(t)=M_X(t)M_Y(t)$ on the common interval where both mgfs exist.

#### Proof.
$$M_Z(t) = E[e^{Zt}] = E[e^{(X+Y)t}] = E[e^{Xt}e^{Yt}] = E[e^{Xt}]E[e^{Yt}] = M_X(t)M_Y(t).$$

### Example
Find the distribution of $X+Y$ if $X \sim N(\mu_1,\sigma_1^2)$, $Y\sim N(\mu_2,\sigma_2^2)$, and $X$ and $Y$ are independent, by calculating its mgf.

#### Solution
\begin{eqnarray*}
M_{X+Y}(t)&=& M_X(t)M_Y(t)\\
&=& e^{\mu_1 t+\sigma_1^2 t^2/2} e^{\mu_2 t + \sigma_2^2 t^2/2}\\
&=& e^{(\mu_1+\mu_2)t + (\sigma_1^2+\sigma_2^2)t^2/2}.
\end{eqnarray*}
Hence, we recognize that $X+Y \sim N(\mu_1+\mu_2,\ \sigma_1^2+\sigma_2^2)$.

### :construction: Exercise C4.5: 89
Let $X_1$,

### :apple: Readings: Examples A, C, D, E, F from p 156.

:o: Stop here! 2021/12/7.
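As an optional numerical check of Property D and the last example (this sketch is my own addition, not from the textbook; the parameter values, the evaluation point $t$, and the use of `numpy` are arbitrary choices), the code below compares the empirical mgf of $X+Y$ at one point with the closed form $e^{(\mu_1+\mu_2)t+(\sigma_1^2+\sigma_2^2)t^2/2}$, and also checks the mean and variance of the sum.

```python
import numpy as np

rng = np.random.default_rng(4)
mu1, s1, mu2, s2 = 1.0, 2.0, -3.0, 1.0
x = rng.normal(mu1, s1, size=1_000_000)
y = rng.normal(mu2, s2, size=1_000_000)

t = 0.3
# Empirical mgf of X + Y versus the product of the two normal mgfs (Property D).
emp = np.mean(np.exp(t * (x + y)))
theory = np.exp((mu1 + mu2) * t + (s1**2 + s2**2) * t**2 / 2)
print(emp, theory)

# The sum should behave like N(mu1 + mu2, s1^2 + s2^2).
print((x + y).mean(), (x + y).var())   # close to -2 and 5
```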