---
title: MS4
tags: teach:MS
---
# Chapter 4
[Back home](https://hackmd.io/myN1AJZMRxWuw6xw0VnTeQ)
### Key ideas in this chapter
- Definition of expectation.
- Calculation of expectation using its definition.
- Expectation is a linear operator. This property allows simpler calculations.
- Definitions of variance and covariance.
- Markov, Chebyshev inequalities.
- Moment generating functions.
## 4.1 The expected value of a random variable
### Definition
If $X$ is a discrete random variable with frequency function $p(x)$, the expected value (or simply expectation) of $X$, denoted by $E(X)$, is
$$E(X) = \sum_i x_i p(x_i),$$
provided that $\sum_i |x_i|p(x_i)<\infty$. If the sum diverges, the expectation is undefined.
### Example
Throw a fair die. If $x$ points appear, you receive $x$ dollars. What is the expected number of dollars you will receive?
#### Sol.
Let $X$ denote the number of dollars received. The frequency function of $X$ is
$P(X=x)=1/6$, $x=1,2,3,4,5,6.$
$$E[X] = 1\times \frac{1}{6}+\cdots+ 6\times \frac{1}{6}=21/6 = 3.5.$$
Your expected number of dollars received is 3.5.
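As a quick illustration (not from the text), here is a minimal Python sketch that approximates this expectation by simulation; the seed and sample size are arbitrary choices, and the sample mean should come out near 3.5.
```python
import numpy as np

rng = np.random.default_rng(0)                 # fixed seed for reproducibility
rolls = rng.integers(1, 7, size=1_000_000)     # fair die: values 1, ..., 6

# By the law of large numbers, the sample mean approximates E(X) = 3.5.
print(rolls.mean())                            # prints something close to 3.5
```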
### Example
Find the expected value of $X\sim Bernoulli(p)$.
#### Sol.
Because $P(X=0)=1-p$, $P(X=1)=p$, we have
$$E(X) = 1\times p + 0\times(1-p) = p.$$
:dog: Homework in p 116: Examples A, B, C.
### Definition
If $X$ is a continuous random variable with density $f(x)$, then
$$E(X) = \int_{-\infty}^\infty xf(x)dx$$
provided that $\int |x|f(x)dx<\infty$. If the integral diverges, the expectation is undefined.
:dog: Homework in p 118: Examples E, F, G.
### Markov Inequality
If $X$ is a random variable with $P(X\geq 0)=1$ and for which $E(X)$ exists, then, for any $t>0$, $$P(X\geq t)\leq \frac{E(X)}{t}.$$
#### Proof.
Assume for simplicity that $X$ is continuous with density $f(x)$ (the discrete case is analogous). For $t>0$, note that
\begin{eqnarray*}
E(X) = \int_{-\infty}^\infty xf(x)dx & = & \int_0^t
xf(x)dx + \int_{t}^\infty xf(x)dx \\
& \geq & \int_0^t 0f(x)dx + \int_{t}^\infty t f(x)dx\\
& = & 0 + t\int_{t}^\infty f(x)dx\\
&=& tP(X\geq t).
\end{eqnarray*}
Hence, $P(X \geq t )\leq \frac{E(X)}{t}$.
### Example
Let $X\sim Exp(1)$. Calculate $P(X>3)$ and bound it using Markov's inequality.
#### Sol.
$P(X>3) = e^{-3\times 1} = e^{-3} \approx 0.0498.$
With Markov's inequality, since $E(X)=1$,
$$P(X>3) \leq \frac{E(X)}{3} = \frac{1}{3} \approx 0.33.$$
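A small simulation sketch (parameters and seed are arbitrary) comparing the exact probability, a Monte Carlo estimate, and the Markov bound for $X\sim Exp(1)$:
```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=1_000_000)   # X ~ Exp(1), so E(X) = 1

t = 3.0
exact = np.exp(-t)               # P(X > 3) = e^{-3}, about 0.0498
estimate = (x > t).mean()        # Monte Carlo estimate of P(X > 3)
markov_bound = x.mean() / t      # Markov: P(X >= t) <= E(X)/t, about 0.33
print(exact, estimate, markov_bound)
```
The bound is valid but, as this example shows, it can be quite loose.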
### Expectations of functions of random variables
Suppose $Y=g(X)$.
- If $X$ is discrete with frequency function $p(x)$, then
$$E(Y) = \sum_x g(x)p(x)$$
provided that $\sum |g(x)|p(x) < \infty$.
- If $X$ is continuous with density function $f(x)$, then
$$E(Y) = \int_{-\infty}^{\infty} g(x)f(x)dx$$
provided that $\int |g(x)|f(x)dx < \infty$.
Suppose that $X_1,\ldots,X_n$ are jointly distributed random variables and $Y= g(X_1,\ldots,X_n)$.
- If the $X_i$ are discrete with frequency function $p(x_1,\ldots,x_n)$, then
$$E(Y) = \sum_{x_1,\ldots,x_n} g(x_1,\ldots,x_n)p(x_1,\ldots,x_n)$$
provided that $\sum |g(x_1,\ldots,x_n)|p(x_1,\ldots,x_n) < \infty$.
- If $X_i$ are continuous with joint density function $f(x_1,\ldots,x_n)$, then
$$E(Y) = \int_{-\infty}^{\infty}\cdots \int_{-\infty}^{\infty} g(x_1,\ldots,x_n)f(x_1,\ldots,x_n)dx_1\cdots dx_n$$
provided that $\int |g(x_1,\ldots,x_n)|f(x_1,\ldots,x_n)dx_1\cdots dx_n < \infty$.
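As a small numerical illustration of the single-variable formula above (a sketch with the arbitrary choices $g(x)=x^2$ and $X\sim Unif(0,1)$, so $E[g(X)]=\int_0^1 x^2\,dx=1/3$):
```python
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(13)
g = lambda x: x**2            # an arbitrary function g
f = lambda x: 1.0             # Unif(0, 1) density on [0, 1]

integral, _ = quad(lambda x: g(x) * f(x), 0, 1)     # E[g(X)] from the formula
mc = g(rng.uniform(0, 1, size=1_000_000)).mean()    # Monte Carlo estimate
print(integral, mc)                                 # both close to 1/3
```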
### Corollary on page 124.
If $X$ and $Y$ are independent random variables and $g$ and $h$ are fixed functions, then $$E[g(X)h(Y)] = E[g(X)]E[h(Y)],$$
provided that the expectations on the right-hand side exist.
Proof. For simplicity, assume that $X$ and $Y$ are discrete with joint frequency function $p_{X,Y}(x,y)$ and marginal frequency functions $p_X(x)$ and $p_Y(y)$ (the continuous case is analogous). Because $X$ and $Y$ are independent, $p_{X,Y}(x,y)=p_X(x)p_Y(y)$.
Therefore, we have
\begin{eqnarray*}
E(g(X)h(Y)) &=& \sum_y \sum_x g(x)h(y) p_{X,Y}(x,y)\\
&=& \sum_y \sum_x g(x)h(y) p_{X}(x)p_Y(y)\\
&=& \sum_y \left(\sum_x g(x)h(y) p_{X}(x)p_Y(y)\right)\\
&=& \sum_y h(y)p_Y(y) \left(\sum_x g(x) p_{X}(x)\right)\\
&=& \sum_y h(y)p_Y(y) E(g(X))\\
&=& E(h(Y)) E(g(X)).
\end{eqnarray*}
### Theorem on page 125.
If $X_1,\ldots,X_n$ are jointly distributed random variables with expectation $E(X_i)$ and $Y$ is a linear function of the $X_i$, $Y = a+\sum_{i=1}^n b_iX_i$, then
$$E(Y) = a + \sum_{i=1}^n b_i E(X_i).$$
Proof. Assume the $X_i$ are continuous with joint density $f(x_1,\ldots,x_n)$ (the discrete case is analogous). Then, we have
\begin{eqnarray*}
E(Y) &=& \int\cdots\int (a+\sum b_i x_i)f(x_1,\ldots,x_n)dx_1\cdots dx_n\\
&=& a\int\cdots\int f(x_1,\ldots,x_n)dx_1\cdots dx_n\\
&&+ \sum b_i \int\cdots\int x_if(x_1,\ldots,x_n)dx_1\cdots dx_n\\
&=& a+\sum b_i E(X_i).
\end{eqnarray*}
### Linear Operator
An operator $L$ is said to be linear if, for every pair of functions $f$ and $g$ and every scalar $a$, we have
- $L(f+g) = L(f)+L(g)$,
- $L(a f)= aL(f)$.
The expectation is a linear operator:
- $E[X+Y]= E[X]+E[Y]$.
- $E[cX]=cE[X]$ for a scalar $c$.
- $E[a+X]=a+E[X]$ for a scalar $a$.
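A brief numerical sketch of these three properties (the distributions, constants, and seed below are arbitrary choices):
```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(loc=2.0, scale=1.0, size=1_000_000)   # E(X) = 2
Y = rng.uniform(0.0, 1.0, size=1_000_000)            # E(Y) = 0.5
a, c = 5.0, 3.0

# E[X + Y] = E[X] + E[Y], E[cX] = cE[X], E[a + X] = a + E[X]
print((X + Y).mean(), X.mean() + Y.mean())   # both about 2.5
print((c * X).mean(), c * X.mean())          # both about 6.0
print((a + X).mean(), a + X.mean())          # both about 7.0
```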
Exam 1 covers all materials above this line.
:sunflower::sunflower::sunflower::sunflower::sunflower::sunflower::sunflower::sunflower::sunflower::sunflower::sunflower::sunflower::sunflower::sunflower:
Exam 2 covers all materials below.
## 4.2 Variance and Standard Deviation
### Variance
- If $X$ is a random variable with expected value $E(X)$, the variance of $X$ is
$$Var(X)= E\left[(X-E(X))^2\right],$$
provided that the expectation exists.
- The standard deviation of $X$ is the square root of the variance,
$$SD(X)=\sqrt{Var(X)}.$$
### Example
Find the variance of the Bernoulli distribution.
#### Sol.
If $X\sim Bernoulli(p)$, we have shown that $E(X)=p$. Thus, the variance of $X$ is
\begin{eqnarray*}
Var(X)& = & E(X-p)^2\\
& = & (1-p)^2 \times p + (0-p)^2\times (1-p)\\
& = & p(1-p)((1-p) + p) \\
&= & p(1-p).
\end{eqnarray*}
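As a sanity check (with an arbitrary value of $p$), the sample variance of simulated Bernoulli draws should be close to $p(1-p)$:
```python
import numpy as np

rng = np.random.default_rng(3)
p = 0.3                                        # arbitrary choice of p
x = rng.binomial(n=1, p=p, size=1_000_000)     # Bernoulli(p) draws

print(x.var(), p * (1 - p))                    # both about 0.21
```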
### :bear: Readings in p 132: Example B.
### :+1: Variance is not a linear operator
If $Var(X)$ exists and $Y = a+bX$, then $Var(Y) = b^2 Var(X)$.
#### Proof.
\begin{eqnarray*}
Var(Y) = E(Y-\mu_Y)^2 &=& E\left( (a+bX)-(a+b\mu_X)\right)^2\\
&=& E\left(b(X-\mu_X)\right)^2\\
&=& E(b^2 (X-\mu_X)^2)\\
&=& b^2 E(X-\mu_X)^2\\
&=& b^2 Var(X).
\end{eqnarray*}Therefore, variance is not a linear operator.
### :+1: Theorem in p 132.
The variance of $X$, if it exists, may also be calculated as follows:
$$Var(X) = E(X^2)-E(X)^2.$$
#### Proof
Let $\mu$ denote $E(X)$. This follows directly from
\begin{eqnarray*}
Var(X)& = &E(X-\mu)^2 \\
&= &E(X^2 - 2X \mu + \mu^2)\\
& =& E(X^2) - 2\mu E(X) + \mu^2\\
& =& E(X^2) - 2\mu^2 + \mu^2 \\
&=& E(X^2) -\mu^2.
\end{eqnarray*}
### :+1: Example
Find the variance of the uniform distribution, U(0,1).
#### Solution
Because $$E(X^2) = \int_0^1 x^2 \times 1 dx = \frac{1}{3},$$ and $$E(X)=\int_{0}^1 x\times 1 dx = \frac{1}{2},$$
we have
$$Var(X) = E(X^2)-E(X)^2 = \frac{1}{3}-\left(\frac{1}{2}\right)^2=\frac{1}{12}.$$
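A quick numerical check of the $1/12$ result (a sketch; the seed is arbitrary):
```python
import numpy as np

rng = np.random.default_rng(4)
u = rng.uniform(0.0, 1.0, size=1_000_000)      # U ~ Unif(0, 1)

# Both expressions estimate Var(U), which should be close to 1/12 (about 0.0833).
print(u.var(), (u**2).mean() - u.mean()**2, 1 / 12)
```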
### :+1: Chebyshev's Inequality
Let $X$ be a random variable with mean $\mu$ and variance $\sigma^2$. Then, for any $t>0$,
$$P(|X-\mu|\geq t)\leq \frac{\sigma^2}{t^2}.$$
#### Proof
Let $Y=(X-\mu)^2$, then $P(Y\geq 0)=1$, and we can use Markov Inequality. In addition, it is easy to see that $$E(Y) =E[(X-\mu)^2] = \sigma^2. $$
By Markov inequality, we have
$$P(|X-\mu|\geq t)=P(|X-\mu|^2 \geq t^2) = P(Y\geq t^2)\leq \frac{E(Y)}{t^2}=\frac{\sigma^2}{t^2}. $$
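The inequality can be illustrated numerically; below is a minimal sketch using a standard normal distribution (an arbitrary choice) and a few values of $t$:
```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma = 0.0, 1.0
x = rng.normal(mu, sigma, size=1_000_000)

for t in (1.0, 2.0, 3.0):
    prob = (np.abs(x - mu) >= t).mean()   # estimate of P(|X - mu| >= t)
    bound = sigma**2 / t**2               # Chebyshev bound
    print(t, prob, bound)                 # prob <= bound in each case
```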
### Corollary A (p 134)
If $Var(X)=0$, then $P(X=\mu)=1$.
#### Proof
Given any $\varepsilon > 0$, by Chebyshev's inequality,
$$P(|X-\mu|\geq \varepsilon)\leq \frac{\sigma^2}{\varepsilon^2}=0.$$
Since $P(|X-\mu|\geq \varepsilon)=0$ for every $\varepsilon>0$, we conclude $P(X=\mu)=1$.
### Theorem A (p 136)
For a fixed constant $x_0$, define the mean squared error $MSE = E[(X-x_0)^2]$. Then, $$MSE =\beta^2+\sigma^2,$$
where $\beta = E(X-x_0)$ is the bias between $X$ and $x_0$, and $\sigma^2$ is the variance of $X$.
#### Proof
Let $\mu=E(X)$, then $\beta =E(X-x_0)=\mu-x_0$. We have
\begin{eqnarray*}
MSE=E(X-x_0)^2 &=& E\left((X-\mu)+(\mu-x_0)\right)^2\\
&=& E((X-\mu)^2 +2(X-\mu)(\mu-x_0)+(\mu-x_0)^2)\\
&=& E(X-\mu)^2 + 2(\mu-x_0)E(X-\mu)+(\mu-x_0)^2\\
&=& \sigma^2 + 0 + (\mu-x_0)^2\\
&=& \sigma^2 + \beta^2.
\end{eqnarray*}
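A numerical illustration of this bias-variance decomposition (the distribution and the constant `x0` below are arbitrary choices):
```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(loc=2.0, scale=1.5, size=1_000_000)   # mu = 2, sigma^2 = 2.25
x0 = 0.5                                             # an arbitrary constant

mse = ((x - x0) ** 2).mean()
beta = (x - x0).mean()          # bias, approximately mu - x0 = 1.5
var = x.var()                   # approximately sigma^2 = 2.25

print(mse, beta**2 + var)       # both about 4.5
```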
### :apple: Exercise C4.2: 48
## 4.3 Covariance and Correlation
### Definition in p 138.
If $X$ and $Y$ are jointly distributed random variables with expectations $\mu_X$ and $\mu_Y$, respectively, the covariance of $X$ and $Y$ is
$$Cov(X,Y) = E[(X-\mu_X)(Y-\mu_Y)],$$
provided that the expectation exists.
### A useful formula for calculating the covariance
Show that
$$Cov(X,Y) = E(XY)-\mu_X \mu_Y.$$
#### Proof
\begin{eqnarray*}
Cov(X,Y)&=& E((X-\mu_X)(Y-\mu_Y))\\
&=& E(XY - \mu_X Y - \mu_Y X +\mu_X\mu_Y)\\
&=& E(XY) -\mu_X EY -\mu_Y EX + \mu_X \mu_Y \\
&=& E(XY)- \mu_X\mu_Y.
\end{eqnarray*}
### Example
The joint density of $X$ and $Y$ is $f(x,y)=2x+2y-4xy$, where $0\leq x\leq 1$ and $0\leq y \leq 1$. Find the covariance and correlation of $X$ and $Y$.
#### Solution
To find the covariance, note $Cov(X,Y) = E(XY)-E(X)E(Y)$. Hence, we first calculate $E(XY)$.
\begin{eqnarray*}
E(XY) &=& \int_0^1 \int_0^1 xy(2x+2y-4xy)dxdy\\
&=& \int_0^1 \int_0^1 (2x^2y +2xy^2 -4x^2y^2)dxdy\\
&=& \int_{0}^1 \left[ \frac{2}{3}x^3y + x^2y^2 -\frac{4}{3}x^3y^2\right]_{x=0}^{x=1} dy\\
&=& \int_{0}^1 \frac{2}{3}y +y^2 - \frac{4}{3}y^2 dy \\
&=& \int_{0}^1 \frac{2}{3}y - \frac{1}{3}y^2 dy \\
&=& \left[\frac{1}{3}y^2 - \frac{1}{9}y^3\right]^{y=1}_{y=0}= \frac{2}{9}.
\end{eqnarray*}
Now, to calculate $E(X)$, we find the marginal density of $X$:
\begin{eqnarray*}
f_X(x) &=& \int_{0}^1 (2x+2y-4xy) dy \\
&=& \left[2xy + y^2 - 2xy^2\right]_{y=0}^{y=1}\\
&=& 2x +1 -2x = 1,\quad \mbox{for}\quad 0<x<1.
\end{eqnarray*}Hence, $X\sim Unif(0,1)$, and $E(X)=\frac{1}{2}$ and $Var(X) = \frac{1}{12}$.
Similarly, $Y$'s marginal pdf is:
\begin{eqnarray*}
f_Y(y) &=& \int_{0}^1 (2x+2y-4xy) dx \\
&=& \left[x^2 + 2xy - 2x^2y\right]_{x=0}^{x=1}\\
&=& 1 +2y -2y = 1,\quad \mbox{for}\quad 0<y<1.
\end{eqnarray*}Hence, $Y\sim Unif(0,1)$, and $E(Y)=\frac{1}{2}$ and $Var(Y) = \frac{1}{12}$.
As a result, we have
$$Cov(X,Y) = E(XY) - E(X)E(Y) = \frac{2}{9}-\frac{1}{2}\times\frac{1}{2} = -\frac{1}{36}.$$
The correlation is
$$Corr(X,Y)=\frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}}=\frac{-\frac{1}{36}}{\frac{1}{12}} = -\frac{1}{3}.$$
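The calculation can be double-checked by numerical integration; here is a sketch using `scipy.integrate.dblquad` (the helper names are ours, not from the text):
```python
from scipy.integrate import dblquad

# Joint density on the unit square (0 <= x, y <= 1).
f = lambda x, y: 2*x + 2*y - 4*x*y

# dblquad expects the integrand as func(inner_var, outer_var); with constant
# limits 0 and 1 for both variables, we keep the (y, x) order of its signature.
E_XY, _ = dblquad(lambda y, x: x * y * f(x, y), 0, 1, 0, 1)
E_X,  _ = dblquad(lambda y, x: x * f(x, y), 0, 1, 0, 1)
E_Y,  _ = dblquad(lambda y, x: y * f(x, y), 0, 1, 0, 1)
E_X2, _ = dblquad(lambda y, x: x**2 * f(x, y), 0, 1, 0, 1)
E_Y2, _ = dblquad(lambda y, x: y**2 * f(x, y), 0, 1, 0, 1)

cov = E_XY - E_X * E_Y                                   # about -1/36
corr = cov / ((E_X2 - E_X**2) * (E_Y2 - E_Y**2))**0.5    # about -1/3
print(cov, corr)
```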
### A generalization of the covariance formula
Let $W,X,Y,Z$ be random variables and $a,b,c,d$ be scalars. Then, we have
\begin{eqnarray*}
&&Cov(aW+bX, cY+dZ)\\
&=&E[(aW+bX)(cY+dZ)]-E(aW+bX)E(cY+dZ)\\
&=&E(acWY+adWZ+bcXY+bdXZ)-(aE(W)+bE(X))(cE(Y)+dE(Z))\\
&=&ac(E(WY)-E(W)E(Y))+bc(E(XY)-E(X)E(Y))+ad(E(WZ)-E(W)E(Z)) + bd(E(XZ)-E(X)E(Z))\\
&=& ac Cov(W,Y)+bcCov(X,Y)+adCov(W,Z)+bdCov(X,Z).
\end{eqnarray*}
### Exercise 4.3.44
If $X$ and $Y$ are independent random variables with equal variances, find $Cov(X+Y, X-Y)$.
#### Solution
$$Cov(X+Y, X-Y) = Cov(X,X)-Cov(X,Y)+Cov(Y,X)-Cov(Y,Y)=Var(X)-Var(Y) =0. $$
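A quick simulation sketch (arbitrary independent distributions with equal variances):
```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(0.0, 2.0, size=1_000_000)     # Var(X) = 4
Y = rng.normal(5.0, 2.0, size=1_000_000)     # Var(Y) = 4, independent of X

# The off-diagonal entry of the sample covariance matrix estimates
# Cov(X + Y, X - Y), which should be close to 0.
print(np.cov(X + Y, X - Y)[0, 1])
```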
### Theorem A (p 140)
Suppose $U = a+\sum_{i=1}^n b_i X_i$ and $V = c+\sum_{j=1}^m d_j Y_j$. Then
$$Cov(U,V) = \sum_{i=1}^n \sum_{j=1}^m b_id_j Cov(X_i,Y_j).$$
#### Proof
By the linearity of the expectation, it is known that
\begin{eqnarray*}
E(U) & = & a+ \sum_{i=1}^n b_i \mu_{X_i},\\
E(V) & = & c+\sum_{j=1}^m d_j\mu_{Y_j}.
\end{eqnarray*}
\begin{eqnarray*}
Cov(U,V) & = & E\left[(U-E(U) )(V-E(V))\right]\\
&=& E\left[ \left(\sum_{i=1}^n b_i(X_i-\mu_{X_i}) \right)\left(\sum_{j=1}^m d_j(Y_j-\mu_{Y_j})\right)\right]\\
& = & E\left[ \sum_{i=1}^n\sum_{j=1}^m b_id_j(X_i-\mu_{X_i})(Y_j-\mu_{Y_j})\right]\\
&=&
\sum_{i=1}^n\sum_{j=1}^m b_id_j E\left[(X_i-\mu_{X_i})(Y_j-\mu_{Y_j})\right]\\
&=& \sum_{i=1}^n\sum_{j=1}^m b_id_j Cov(X_i,Y_j).
\end{eqnarray*}
### Corollary A.
Show that
$$Var(a+\sum_{i=1}^n b_iX_i) = \sum_{i=1}^n \sum_{j=1}^n b_i b_j Cov(X_i, X_j).$$
#### Proof.
\begin{eqnarray*}
Var(a+\sum_{i=1}^n b_iX_i) & = &Cov(a+\sum_{i=1}^n b_iX_i, a+\sum_{i=1}^n b_iX_i)\\
&=& \sum_{i=1}^n\sum_{j=1}^n b_i b_j Cov(X_i,X_j)\\
&=& \sum_{i=1}^n b_i^2 Var(X_i)+2\sum_{i<j }b_i b_j Cov(X_i,X_j).
\end{eqnarray*}
### A simplification of the variance formula
$Var(\sum_{i=1}^n X_i) = \sum_{i=1}^{n}Var(X_i)$, if the $X_i$ are independent.
#### Solution
Note that if $X$ and $Y$ are independent, $$Cov(X,Y)=E(XY)-E(X)E(Y) = E(X)E(Y)-E(X)E(Y)=0.$$
Therefore, if $X_i$'s are independent,
\begin{eqnarray*}
Var(\sum_{i=1}^n X_i) &=& \sum_{i=1}^n Var(X_i)+2\sum_{i<j }Cov(X_i,X_j)\\
&=& \sum_{i=1}^n Var(X_i).
\end{eqnarray*}
### Example
Find the variance of a binomial random variable.
#### Solution
When $Y\sim Binomial(n,p)$, we can write $Y=X_1+\cdots+X_n$, where $X_i\stackrel{i.i.d.}{\sim}Bernoulli(p)$. Therefore, we have
$$Var(Y) = \sum_{i=1}^n Var(X_i) = n p(1-p).$$
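A simulation sketch (with arbitrary values of $n$ and $p$) comparing the sample variance with $np(1-p)$:
```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 10, 0.4                                  # arbitrary choices
y = rng.binomial(n=n, p=p, size=1_000_000)      # Binomial(n, p) draws

print(y.var(), n * p * (1 - p))                 # both about 2.4
```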
### Readings in p 140: Example C.
### Definition
If $X$ and $Y$ are jointly distributed random variables and the variances and covariances of both $X$ and $Y$ exist and the variances are nonzero, then the correlation of $X$ and $Y$, denoted by $\rho$, is
$$\rho =\frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}}.$$
### Exercise C4.3: 46
Let $U$ and $V$ be independent random variables, each with mean $\mu$ and variance $\sigma^2$. Let $Z=\alpha U+ V\sqrt{1-\alpha^2}$. Find $E(Z)$ and $\rho_{UZ}$.
#### Solution
By linearity of the expectation,
$$E(Z)=\alpha \mu + \mu\sqrt{1-\alpha^2}. $$
To find $\rho_{UZ}$, we first calculate
\begin{align*}
E(UZ)&=E\left(U(\alpha U + V\sqrt{1-\alpha^2})\right)\\
& = \alpha E(U^2) + \sqrt{1-\alpha^2} E(UV)\\
& = \alpha( \sigma^2 +\mu^2)+\sqrt{1-\alpha^2}\mu^2\\
& =\mu^2 (\alpha+\sqrt{1-\alpha^2} )+\alpha\sigma^2.
\end{align*}
Hence,
\begin{align*}
Cov(U,Z) &= E(UZ)-E(U)E(Z)\\
&= \mu^2(\alpha+\sqrt{1-\alpha^2})+\alpha\sigma^2 -\mu(\alpha \mu + \mu\sqrt{1-\alpha^2})\\
&=\alpha\sigma^2.
\end{align*}
In addition, we have
\begin{align*}
Var(Z) & =Var(\alpha U+ V\sqrt{1-\alpha^2})\\
&=\alpha^2 \sigma^2+(1-\alpha^2)\sigma^2 \\
&=\sigma^2.
\end{align*}
Hence, we have
\begin{align*}
\rho_{UZ} &=\frac{Cov(U,Z)}{\sqrt{Var(U)Var(Z)}}=\frac{\alpha\sigma^2}{\sqrt{\sigma^2\sigma^2}}=\alpha.
\end{align*}
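A simulation sketch of this exercise (the values of $\mu$, $\sigma$, and $\alpha$ are arbitrary); the sample correlation of $U$ and $Z$ should be close to $\alpha$:
```python
import numpy as np

rng = np.random.default_rng(9)
mu, sigma, alpha = 1.0, 2.0, 0.6               # arbitrary choices
U = rng.normal(mu, sigma, size=1_000_000)
V = rng.normal(mu, sigma, size=1_000_000)      # independent of U
Z = alpha * U + np.sqrt(1 - alpha**2) * V

print(Z.mean(), alpha * mu + mu * np.sqrt(1 - alpha**2))   # E(Z)
print(np.corrcoef(U, Z)[0, 1], alpha)                      # rho_{UZ} close to alpha
```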
### Theorem B in p 143.
We have the range for $\rho$: $-1\leq \rho \leq 1$. Furthermore, $\rho = \pm 1$ if and only if $P(Y=a+bX)=1$ for some constants $a$ and $b$.
#### Proof. (The proof is tricky!) Because
\begin{eqnarray*}0&\leq& Var(\frac{X}{\sigma_X}+\frac{Y}{\sigma_Y})\\
& = &(\frac{1}{\sigma_X})^2Var({X})+(\frac{1}{\sigma_Y})^2Var(Y)+2\frac{1}{\sigma_X}\frac{1}{\sigma_Y}Cov(X,Y)\\
& =& 1+1+2\rho = 2(1+\rho).\end{eqnarray*}
Hence, $0\leq 2(1+\rho)$, which implies $-1\leq \rho$.
If $\rho = -1$, then $Var(\frac{X}{\sigma_X}+\frac{Y}{\sigma_Y}) = 0$, so $P(\frac{X}{\sigma_X}+\frac{Y}{\sigma_Y}=c)=1$ for some constant $c$ by the Corollary in p 134.
Similarly, we have
\begin{eqnarray*}0&\leq& Var(\frac{X}{\sigma_X}-\frac{Y}{\sigma_Y})\\
& = & Var(\frac{X}{\sigma_X})+Var(\frac{Y}{\sigma_Y})-2\frac{1}{\sigma_X}\frac{1}{\sigma_Y}Cov(X,Y)\\
& =& 1+1-2\rho = 2(1-\rho).\end{eqnarray*}
Hence, $0\leq 2(1-\rho)$, which implies $\rho \leq 1$.
Combining these two results, we conclude $-1 \leq \rho \leq 1$.
If $\rho = 1$, then $Var(\frac{X}{\sigma_X}-\frac{Y}{\sigma_Y}) = 0$, so $P(\frac{X}{\sigma_X}-\frac{Y}{\sigma_Y}=c) = 1$ for some constant $c$ by the Corollary in p 134.
### Readings
Example D (p 142), Example E (p 143), Example F (p 145)
### Interpretations of correlation from [Wiki](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient#/media/File:Correlation_examples2.svg)
#### Correlation only measures linear relationship


## 4.4 Conditional Expectation and Prediction
### Conditional expectation
In the discrete case, the conditional expectation of $Y$ given $X=x$ is
$$E(Y|X=x) = \sum y p_{Y|X}(y|x).$$
In the continuous case, the conditional expectation of $Y$ given $X=x$ is
$$E(Y|X=x) = \int_y y f_{Y|X}(y|x)dy.$$
### Example.
Suppose $X$ and $Y$ have the joint pmf (from C3.5) shown below. Find $E[X|Y=1]$.
#### Recall
Recall that
| $y \backslash x$ | $x=0$ | $x=1$ | $p_Y(y)$|
|---|---|---|---|
|0 | 1/8 | 0 | 1/8|
|1 | 2/8 | 1/8 | 3/8 |
|2 | 1/8 | 2/8 | 3/8 |
|3 | 0 | 1/8 | 1/8 |
|$p_X(x)$ | 4/8 | 4/8 |1|
Hence, we have
$$p_{X|Y=1}(x) = \left\{\begin{array}{ll}2/3,& \mbox{for }x=0,\\
1/3, & \mbox{for }x=1.\end{array}\right.$$
Thus,
$$E[X|Y=1] = 0\times 2/3 + 1\times 1/3 = 1/3.$$
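The same calculation can be written as a tiny script (the pmf below is hard-coded from the table above):
```python
# Joint pmf from the table: keys are (x, y), values are P(X = x, Y = y).
pmf = {(0, 0): 1/8, (1, 0): 0,
       (0, 1): 2/8, (1, 1): 1/8,
       (0, 2): 1/8, (1, 2): 2/8,
       (0, 3): 0,   (1, 3): 1/8}

y0 = 1
p_y = sum(p for (x, y), p in pmf.items() if y == y0)                # P(Y = 1) = 3/8
cond_exp = sum(x * p for (x, y), p in pmf.items() if y == y0) / p_y

print(cond_exp)   # 1/3
```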
### Theorem A. The law of total expectation.
$$E(Y)= E[E(Y|X)].$$
#### Proof.
\begin{eqnarray*}
E[E[Y|X]] &=& E_X[E_Y[Y|X]] \\
&=& \sum_{x} E[Y|X=x] p_X(x)\\
&=& \sum_{x} \left[\sum_y y p_{Y|X}(y|x)\right] p_X(x)\\
&=& \sum_{x} \sum_y y p_{Y|X}(y|x) p_X(x)\\
&=& \sum_{x} \sum_y y p_{X,Y}(x,y)= E[Y].
\end{eqnarray*}
### Example.

### Example: Random sums
Let $T = \sum_{i=1}^N X_i$, where $N$ is a random variable with finite expectation and the $X_i$ are random variables that are independent of $N$ and have the common mean $E(X)$. Find the expectation of $T$.
#### Sol.
Note that $E[T|N=n] = E[\sum_{i=1}^n X_i \mid N=n] = n E(X)$, because the $X_i$ are independent of $N$. Hence, we have $E[T|N] = NE(X)$. In addition, we have
\begin{eqnarray*}
E[T] &=& E[E[T|N]] = E_N[E_T[T|N]] \\
&=& E_N[ NE[X] ]\\
&=& E(N)E(X).
\end{eqnarray*}

### :apple: Readings: Theorem B.
$Var(Y)= Var[E(Y|X)]+E[Var(Y|X)]$. (derivation skipped)
### Prediction 1
To minimize $MSE = E[(Y-c)^2]=Var(Y)+(E(Y)-c)^2$, the minimizer for $c$ is $E(Y)$.
#### Solution
Since $Var(Y)$ does not depend on $c$ and $(E(Y)-c)^2 \geq 0$, the MSE is minimized by choosing $c = E(Y)$.
### Prediction 2
The minimizing function $h(X)$ to minimize $MSE = E\{[Y-h(X)]^2\}$ is $h(X)=E(Y|X)$.
#### Solution
We want to minimize
$$E[Y-h(X)]^2 = E(E\left\{[Y-h(X)]^2|X\right\}).$$The outer expectation is with respect to $X$. For every $x$, to minimize $E\left\{[Y-h(X)]^2|X=x\right\}$, we choose $h(x) = E[Y|X=x]$. We thus have that the minimizing function is $h(X)=E[Y|X]$.
### Example A
(Reading: Example B in p 148.) If $X$ and $Y$ follow a bivariate normal distribution, then
$$Y|X=x\sim N\left(\mu_Y+\rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X),\sigma^2_Y(1-\rho^2)\right). $$
As a result, we have
$$E(Y|X) =\mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(X-\mu_X).$$

## 4.5 Moment generating function (mgf) 動差母函數
### Why do we need the mgf?
1. It makes it easier to find the distribution of sums and linear functions of random variables, such as $X_1+X_2+\cdots+X_n$, $X+Y$, and $aX$.
2. It helps to prove the central limit theorem.
### Definition of moments
- The $r$th moment is $E[X^r]$.
- The $r$th central moment is $E[(X-E(X))^r]$. The variance is the second central moment; the skewness is the standardized third central moment.
- The moment generating function (mgf) of a random variable $X$ is $M(t) = E(e^{tX})$ if the expectation is defined.
### Example.
Let $Z\sim N(0,1)$. Find its mgf.
\begin{eqnarray*}
M_Z(t) = E[e^{Zt} ] &=& \int_{-\infty}^\infty e^{zt} \frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}dz\\
&=& \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}}e^{-\frac{z^2 -2zt +t^2-t^2}{2}} dz\\
&=& \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}}e^{-\frac{(z-t)^2}{2}+\frac{t^2}{2}} dz\\
&=& e^{\frac{t^2}{2}}\int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}}e^{-\frac{(z-t)^2}{2}} dz\\
&=& e^{\frac{t^2}{2}},
\end{eqnarray*}
because the last integrand is the $N(t,1)$ density, which integrates to $1$.

### Property A.
If the moment-generating function exists for $t$ in an open interval containing zero, it uniquely determines the probability distribution.
:::info
This means that if two random variables have the same mgf, they are identically distributed.
:::
### Property B.
If the moment-generating function exists in an open interval containing zero, then $M^{(r)}(0) = E(X^r)$.
#### Solution
Note that
$$M'(t) = \frac{d}{dt}\int e^{tx}f(x)dx = \int xe^{tx}f(x)dx.$$Therefore, $M'(0)=\int x\times 1 \times f(x)dx = E(X).$
Similarly,
$$M''(t) = \frac{d}{dt}\int x e^{tx}f(x)dx = \int x^2e^{tx}f(x)dx.$$Therefore, we have $M''(0)=\int x^2\times 1\times f(x)dx = E(X^2).$In general, we have
$$M^{(r)}(t) =\int x^r e^{tx}f(x)dx.$$
Hence, $M^{(r)}(0) =E[X^r]$.
### Property C.
If $X$ has the mgf $M_X(t)$ and $Y= a+bX$, then $Y$ has the mgf $M_Y(t) = e^{at}M_X(bt)$.
#### Solution
\begin{eqnarray*}
M_Y(t)&=& E[e^{tY}] \\
&=& E[e^{t(a+bX)}]\\
&=& E[e^{ta}e^{tbX}]\\
&=&e^{ta}E[e^{btX}]\\
&=& e^{ta} M_X(bt).
\end{eqnarray*}
### Example
Let $Z\sim N(0,1)$. If $X=a+bZ$, find the mgf of $X$.
#### Solution
\begin{eqnarray*}
M_X(t) &=& M_{a+bZ}(t)\\
&=& e^{at}M_Z(bt)\\
&=& e^{at}e^{(bt)^2/2}\\
&=& e^{at+b^2t^2/2}.
\end{eqnarray*}
Hence, we conclude that if $X\sim N(a, b^2)$, then
$$M_X(t) = e^{at+b^2t^2/2}.$$
### Property D.
If $X$ and $Y$ are independent random variables with mgf's $M_X$ and $M_Y$ and $Z=X+Y$, then $M_Z(t)=M_X(t)M_Y(t)$ on the common interval where both mgf's exist.
#### Proof.
$$M_Z(t) = E[e^{Zt}] = E[e^{(X+Y)t}] = E[e^{Xt}e^{Yt}] = E[e^{Xt}]E[e^{Yt}] = M_X(t)M_Y(t).$$
### Example
Find the distribution of $(X+Y)$ if $X \sim N(\mu_1,\sigma_1^2)$ and $Y\sim N(\mu_2,\sigma_2^2)$ and $X$ and $Y$ are independent by calculating its mgf.
#### Solution
\begin{eqnarray*}
M_{X+Y}(t)&=& M_X(t)M_Y(t)\\
&=& e^{\mu_1 t+\sigma_1^2 t^2/2} e^{\mu_2 t + \sigma_2^2 t^2/2}\\
&=& e^{(\mu_1+\mu_2)t + (\sigma_1^2+\sigma_2^2)t^2/2}.
\end{eqnarray*}
Hence, we recognize that $X+Y \sim N((\mu_1+\mu_2), (\sigma_1^2+\sigma_2^2))$.
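A simulation sketch (with arbitrary parameter values) confirming the mean and variance of $X+Y$:
```python
import numpy as np

rng = np.random.default_rng(12)
mu1, s1, mu2, s2 = 1.0, 2.0, -0.5, 1.5     # arbitrary choices
X = rng.normal(mu1, s1, size=1_000_000)
Y = rng.normal(mu2, s2, size=1_000_000)

S = X + Y
print(S.mean(), mu1 + mu2)        # both about 0.5
print(S.var(), s1**2 + s2**2)     # both about 6.25
```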
### :construction: Exercise C4.5: 89
Let $X_1$,
### :apple: Readings:
Examples A, C, D, E, F from p 156.
:o: Stop here! 2021/12/7.