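# Supplement: Conditional Distributions

I wrote this while sick and not thinking clearly, and I forgot that much of this material already appears in my other notes. It is short anyway, so treat it as a summary of the key points. The conditional distribution material itself, though, I don't think I have written up before; I reviewed all the overlapping background mainly in order to get to it. If you already know the rest, skip straight to that part.

Two notes overlap with this one and can be cross-referenced:
- [A.2.2 Joint Distribution and Density Functions](https://hackmd.io/@pipibear/H1JnK9HEC)
- [Supplement: Joint distribution functions](https://hackmd.io/@pipibear/rkrvIg2I0)

# Background: joint / marginal distribution

## discrete case

### joint pmf

The joint pmf was already defined in the note "[A.2.2 Joint Distribution and Density Functions](https://hackmd.io/@pipibear/H1JnK9HEC)", but let's recall it with a picture and an example:

![image](https://hackmd.io/_uploads/HyYM69mvC.png)

> Each point in the left plot is a pair $(x,y) \in S_X \times S_Y$.
>
> Each arrow in the right plot corresponds to one $(x,y)$, and its length is the probability that the outcome is that point.
>
> $A$ is an arbitrary event. To compute the probability that $A$ occurs, add up the lengths of the arrows inside $A$.

An example of a joint pmf: suppose we have a coin and a die, and we define the random variable $X$ as the result of the coin toss and $Y$ as the result of the die roll. The joint pmf of $X,Y$ works out as follows:

![image](https://hackmd.io/_uploads/rknX96VDA.png)

### marginal pmf

First, for two random variables $X,Y$ defined on a ++discrete++ space, we define the pmf of the single random variable $X$ (also called the marginal pmf of $X$).

> This was also defined in the note "[A.2.2 Joint Distribution and Density Functions](https://hackmd.io/@pipibear/H1JnK9HEC)", but I find the definition from Hogg's textbook used here clearer.
>> If you are not yet comfortable with joint pmfs, that note is a good reference too.

:::info
Suppose $X$ and $Y$ have joint pmf $f(x,y)$ with space $S$. Then the pmf of $X$ alone, also called the marginal pmf of $X$, is defined as:
\begin{equation}
f_X(x) = \sum_y f(x,y) = P(X=x) \qquad x \in S_X
\end{equation}
:::

> space $S \subseteq S_X \times S_Y$
>> $S_X$ and $S_Y$ are the supports of $X$ and $Y$ respectively.
>>
> $\rightarrow$ In other words, $S$ consists of pairs $(x,y)$ with $x \in S_X, \ y \in S_Y$.

The marginal pmf of $Y$ is defined in the same way.

## continuous case

Having defined things for discrete random variables, we can of course do the same for continuous ones.

### joint pdf

The continuous definition is slightly more involved and carries a few extra conditions, but it closely parallels the discrete one. Suppose we want to define the joint pdf $f(x,y)$ of two continuous random variables $X,Y$. Computing a probability (or the cdf) again means finding the volume under the pdf, so we need $f(x,y)$ to be integrable. In addition, $f(x,y)$ must satisfy the following properties:

:::info
1. $f(x,y) \ge 0$, where $f(x,y) = 0$ whenever $(x,y)$ is not in the space (support) of $X,Y$.
2.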
   $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}f(x,y)\,dx\,dy =1$
3. $P[(X,Y) \in A] = \iint_A f(x,y)\,dx\,dy$, where $\{(X,Y) \in A\}$ is an event defined in the $XY$-plane.
:::

The first property is illustrated below:

![image](https://hackmd.io/_uploads/Sk8fGCVvR.png)

> Since the supports of $X$ and $Y$ both run from $-1$ to $1$, the surface rises above the plane only within that range along both the $x$-axis and the $y$-axis (the part outlined in yellow highlighter), which is where $f(x,y)>0$. Everywhere else it is flat, i.e. $f(x,y)=0$.

The third property means the following:

![image](https://hackmd.io/_uploads/HyrtsaEDC.png)

> Left: $A$ is an event defined in the $XY$-plane.
>
> Right: $P[(X,Y) \in A]$ is the volume over the region that $A$ marks out in the $XY$-plane, bounded above by $z=f(x,y)$.

### marginal pdf

The definition parallels the discrete one, with $\sum$ replaced by $\int$:

:::info
marginal pdf of $X$:
\begin{equation}
f_X(x) = \int_{-\infty}^{\infty}f(x,y)\,dy \qquad x \in S_X
\end{equation}
:::

As for why it is defined this way, i.e. why finding $f_X(x)$ means integrating over every possible $y$, I once came across an explanation (I no longer remember where) that puts it clearly:

> integrating the joint pdf $f(x,y)$ over all possible values of the other random variable $Y$ "sums out" the influence of $Y$, leaving the pdf that describes $X$ alone.

# Conditional distribution

## conditional pmf / pdf

### discrete case (conditional pmf)

Suppose $X,Y$ have a joint discrete distribution with pmf $f(x,y)$ on space $S$, and that their marginal pmfs are $f_X(x), f_Y(y)$ with spaces $S_X,S_Y$. Consider two events:
- event $A = \{X=x\}$
- event $B = \{Y=y\}$

Both events occurring together is written $A \ \cap \ B = \{X=x, Y=y\}$. The probability that both occur is the joint probability:
\begin{equation}
P(A \cap B) = P(X=x, Y=y) = f(x,y)
\end{equation}
and the probability of $B$ alone (i.e. of $Y=y$ on its own) is the marginal pmf of $Y$:
\begin{equation}
P(B) = P(Y=y) = f_Y(y) > 0
\end{equation}
> We know $f_Y(y) > 0$ because we assumed $y \in S_Y$.

Therefore, by the basic definition of conditional probability together with the results above, we can derive:
\begin{equation}
P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{f(x,y)}{f_Y(y)}
\end{equation}

Now let's state the definition formally:

:::info
The conditional probability mass function (conditional pmf) of $X$, given that $Y=y$, is defined as:
\begin{equation}
g(x|y) = \frac{f(x,y)}{f_Y(y)} \qquad \text{provided that } f_Y(y) > 0
\end{equation}
:::

### continuous case (conditional pdf)

For continuous random variables, the definition of the conditional pdf takes the same form as in the
discrete case, so this time we write it out for $Y$: if $X,Y$ have a distribution of the continuous type, with joint pdf $f(x,y)$ and marginal pdfs $f_X(x)$ and $f_Y(y)$,

:::info
then the conditional pdf of $Y$, given that $X=x$, is defined as:
\begin{equation}
h(y|x) = \frac{f(x,y)}{f_X(x)} \qquad \text{provided that } f_X(x) > 0
\end{equation}
:::

## conditional mean

### discrete case

If $X,Y$ are jointly discrete random variables, then the conditional expectation of $X$, given that $Y=y$, is defined for all $y$ such that $p_Y(y)> 0$ as:

:::info
\begin{equation}
\begin{split}
E[X|Y=y] &= \sum_x xP(X=x|Y=y) \\
&= \sum_x xp_{X|Y}(x|y)
\end{split}
\end{equation}
:::

> The notation here may be a bit confusing: I originally worked only from Hogg's textbook and later decided on a whim to add material from Sheldon Ross, so symbols are not always consistent between sections. On reflection Ross's $p_{X|Y}(x|y)$ seemed clearer, so I left it as is.
>
> Here $p_{X|Y}(x|y)$ is the $g(x|y)$ from above, i.e. the conditional pmf of $X$, so $p_{X|Y}(x|y) = P(X=x|Y=y)$, which is why the two expressions after the equals signs denote the same thing. Likewise, $p_Y(y)$ is the marginal pmf written $f_Y(y)$ above.

### continuous case

> This uses the definition from Hogg's textbook, so it is not stated as rigorously.

:::info
\begin{equation}
E(Y|x) = \int_{-\infty}^{\infty} yh(y|x)\,dy
\end{equation}
:::

> This is the expected value of $Y$ given $X=x$.
>
> The meaning of the definition is also clear:
>
> we sum over every tiny slice of $y$ ($\,dy$) between $-\infty$ and $\infty$, multiplying its value ($y$) by the probability (density) that $Y$ equals it given $X=x$ ($h(y|x)$).
>> This carries the same meaning as the original definition of expected value.

#### Example

![image](https://hackmd.io/_uploads/S1jy8MSwR.png)

### Properties

:::warning
Conditional expectation satisfies all the properties that ordinary expectation has.
:::

For example:
\begin{equation}
E[g(X)| Y = y] =
\begin{cases}
\sum_x g(x)p_{X|Y}(x|y) \quad &\text{in discrete case} \\
\int_{-\infty}^{\infty} g(x)f_{X|Y}(x|y)\,dx \quad &\text{in continuous case} \\
\end{cases}
\end{equation}
> Think of how, when computing a variance in the discrete case, we used $E[X^2] = \sum_x x^2p(x)$.

Similarly, we saw that expectation is linear for a single random variable, and conditional expectation is linear as well:
\begin{equation}
E[\sum_{i=1}^n X_i | Y=y] = \sum_{i=1}^n E[X_i|Y=y]
\end{equation}

The reason these facts hold is:

:::warning
We can think of the conditional expectation given $Y=y$ as "an ordinary expectation, taken on a ++reduced sample space++ that contains only the outcomes with $Y=y$."
:::

### Using conditional expectation to compute expectation

Let $E[X|Y]$ denote the function of the random variable
$Y$ whose value at $Y=y$ is $E[X|Y=y]$. Being a function of $Y$, $E[X|Y]$ is itself a random variable, so we can take its expectation, which leads to an important result. The details are in the figure below:

![image](https://hackmd.io/_uploads/Hkt9RMrwC.png)

The key conclusion is:

:::success
\begin{equation}
E[X] = E[E[X|Y]]
\end{equation}
:::

Written out as formulas, this tells us how to compute $E[X]$ through conditional expectation in the discrete and continuous cases:

:::success
\begin{equation}
E[X] =
\begin{cases}
\sum_y E[X|Y = y]P(Y=y) \quad &\text{discrete case} \\
\int_{-\infty}^{\infty} E[X|Y = y]f_Y(y) \,dy &\text{continuous case} \\
\end{cases}
\end{equation}
:::

Proof of the discrete case:

![image](https://hackmd.io/_uploads/SkINQQrvC.png)

### Computing probabilities by conditioning

The two formulas for $E[X]$ above can be extended to compute ordinary probabilities through conditional probabilities. Let $E$ be an arbitrary event and define the random variable $X$ by:
\begin{equation}
X =
\begin{cases}
1 \qquad &\text{if $E$ occurs} \\
0 &\text{if $E$ does not occur}
\end{cases}
\end{equation}
Then the expected value of $X$ is the probability that $E$ occurs, since $E[X] = 1 \cdot P(E) + 0 \cdot P(E^c)$:
\begin{equation}
E[X] = P(E)
\end{equation}
Moreover, if we pick any other random variable $Y$ and condition the expression above on $Y=y$, the equality still holds:
\begin{equation}
E[X|Y=y] = P(E|Y=y) \qquad \text{for any random variable } Y
\end{equation}
Combining this with the earlier results gives:
- discrete:
  ![image](https://hackmd.io/_uploads/rJ3xu7rvR.png)
- continuous:
  ![image](https://hackmd.io/_uploads/Hy0XuXrwC.png)

In the discrete case we can also consider a special case: suppose $Y$ is a discrete random variable taking one of the values $y_1,...,y_n$. Treating each possibility $Y=y_i$ as an event $F_i$, we obtain the result below:

![image](https://hackmd.io/_uploads/Bylm9mHDC.png)

## conditional variance

:::info
conditional variance:
\begin{equation}
\begin{split}
Var(Y|x) &\equiv E\{[Y - E(Y|x)]^2|x\}\\
&= E[Y^2|x] - [E(Y|x)]^2
\end{split}
\end{equation}
:::

The proof is as follows:

![image](https://hackmd.io/_uploads/SJRebJrvC.png)

Notice that conditional variance is defined much like ordinary variance, except that every $E[\,]$ becomes conditional. Sheldon Ross's textbook, which conditions $X$ on $Y$ and writes $Var(X|Y) \equiv E[(X - E[X|Y])^2|Y]$, puts it clearly:

:::warning
$Var(X|Y)$ is exactly analogous to the usual definition of variance, but now ++all expectations are conditional on the fact that $Y$ is known.++
:::

### Using conditional variance to compute variance

Just as $E[X]$ above can be computed from the
conditional mean, $Var(X)$ is related to the conditional variance, and that relationship lets us compute the ordinary variance. The formula and proof are in the figure below:

![image](https://hackmd.io/_uploads/r1_tJVBv0.png)

# cdf

## joint cdf

### Basic definition

This is similar to the definition of the ordinary cdf, extended to two random variables considered together (the bivariate case). First recall the cdf of a single random variable:
\begin{equation}
F(x) = P(X \le x)
\end{equation}
Given two random variables $X,Y$, their joint cdf ==$F_{X,Y}(x,y)$== is defined as:

:::info
\begin{equation}
F_{X,Y}(x,y) = P(X \le x \ \cap \ Y \le y )
\end{equation}
:::

From this definition we can write out the discrete and continuous forms, again much as in the single-variable case.

### discrete

If $X,Y$ are discrete, their joint cdf is the sum of the probabilities of all possible points $(x',y')$ with $x' \le x$ and $y' \le y$:

:::info
\begin{equation}
F_{X,Y}(x,y) = \sum_{y' \le y}\sum_{x' \le x}p_{X,Y}(x',y')
\end{equation}
where $p_{X,Y}(x',y')$ is the joint pmf.
:::

### continuous

If $X,Y$ are continuous, the same holds with the sums replaced by integrals:

:::info
\begin{equation}
F_{X,Y}(x,y) = \int_{-\infty}^y\int_{-\infty}^x f_{X,Y}(x',y') \,dx' \,dy'
\end{equation}
where $f_{X,Y}(x',y')$ is the joint pdf.
:::

This definition also tells us that, given the joint cdf, we can go back to the joint pdf (at points where the mixed partial derivative exists):

:::info
\begin{equation}
f_{X,Y}(x,y) = \frac{\partial^2}{\partial y \partial x} F_{X,Y}(x,y)
\end{equation}
:::

## marginal cdf

:::info
\begin{equation}
F_{X}(x) = F_{X,Y}(x,\infty)
\end{equation}
:::

> Since $F_{X}(x) = P(X \le x)$, we place no restriction on $Y$, i.e. we set the constraints to $X \le x, Y \le \infty$. Here $F_{X,Y}(x,\infty)$ is shorthand for $\lim_{y \to \infty}F_{X,Y}(x,y)$.

$F_{Y}(y)$ is defined in the same way.

# independence

First, the basic definition of independence:

:::info
$X,Y$ are independent if $\quad \forall A \subseteq \mathbb{R}, \ B \subseteq \mathbb{R}$
\begin{equation}
P(X \in A, Y \in B) = P(X \in A)P(Y \in B)
\end{equation}
:::

Let's work out what this definition becomes in the continuous case. When $X,Y$ are continuous we have the theorem:

> The discrete case is analogous but simpler, so we only prove the continuous one.

:::success
$X,Y$ are independent $\quad \Leftrightarrow \quad f_{X,Y}(x,y) = f_X(x)f_Y(y)$
:::

Recall:

![image](https://hackmd.io/_uploads/HkHA8CVP0.png)

Proof, first from left to right:

![image](https://hackmd.io/_uploads/H15-v0NPR.png)

> The last blue arrow, i.e. why independence of $X,Y$ lets us split the expression that way, is explained in detail here:
> ![image](https://hackmd.io/_uploads/SyNYPA4P0.png)

Then from right to left:

![image](https://hackmd.io/_uploads/rJITP0Ew0.png)

## Properties

:::success
If the random
variables $X_1,X_2,...,X_N$ are independent, then their joint pdf / pmf factors into the product of the individual pdfs / pmfs:
\begin{equation}
f_{X_1,...,X_N}(x_1,...,x_N) = \prod_{i=1}^N f_{X_i}(x_i)
\end{equation}
:::

# References

- Hogg, Tanis, Zimmerman, *Probability and Statistical Inference*, 9th ed. (2015), pp. 127, 140, 146-149, 151-152
- Sheldon Ross, *A First Course in Probability*, 9th ed., pp. 336-339, 348, 351-352
- [Purdue lecture ppt(ECE 302: Lecture 5.1 Joint PDF and CDF)](https://engineering.purdue.edu/ChanGroup/ECE302/files/Slide_5_01.pdf)
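
# Appendix: numerical check

As a rough sanity check of the identities in the conditional mean, conditional variance and independence sections, here is a minimal Python sketch. It builds a small joint pmf, computes the marginal and conditional pmfs from their definitions, and then checks $E[X] = E[E[X|Y]]$ and the standard conditional variance formula $Var(X) = E[Var(X|Y)] + Var(E[X|Y])$. The joint pmf table and every function name below are made up for illustration and do not come from either textbook. The note shows its variance relation only as a figure, so treating it as this standard formula is an assumption.

```python
from fractions import Fraction as Fr

# Hypothetical joint pmf f(x, y), keyed by (x, y). Illustration only.
# The support is deliberately not a full rectangle S_X x S_Y.
joint = {
    (0, 0): Fr(1, 8), (1, 0): Fr(3, 8),
    (0, 1): Fr(1, 4), (2, 1): Fr(1, 4),
}

def marginal(joint, axis):
    """Marginal pmf: sum f(x, y) over the other variable (axis 0 -> X, 1 -> Y)."""
    out = {}
    for point, p in joint.items():
        out[point[axis]] = out.get(point[axis], 0) + p
    return out

def cond_pmf_x(joint, y):
    """Conditional pmf g(x|y) = f(x, y) / f_Y(y); requires f_Y(y) > 0."""
    f_y_at_y = marginal(joint, 1)[y]
    return {x: p / f_y_at_y for (x, yy), p in joint.items() if yy == y}

def mean(pmf):
    return sum(v * p for v, p in pmf.items())

def var(pmf):
    m = mean(pmf)
    return sum((v - m) ** 2 * p for v, p in pmf.items())

f_x, f_y = marginal(joint, 0), marginal(joint, 1)
cond_mean = {y: mean(cond_pmf_x(joint, y)) for y in f_y}  # E[X | Y = y]
cond_var = {y: var(cond_pmf_x(joint, y)) for y in f_y}    # Var(X | Y = y)

# E[E[X|Y]] = sum_y E[X | Y = y] P(Y = y)
total_mean = sum(cond_mean[y] * f_y[y] for y in f_y)
# E[Var(X|Y)] and Var(E[X|Y]), both taken over the distribution of Y
mean_of_cond_var = sum(cond_var[y] * f_y[y] for y in f_y)
var_of_cond_mean = sum((cond_mean[y] - total_mean) ** 2 * f_y[y] for y in f_y)

# Independence would require f(x, y) = f_X(x) f_Y(y) at every (x, y).
independent = all(
    joint.get((x, y), 0) == f_x[x] * f_y[y] for x in f_x for y in f_y
)

print(total_mean == mean(f_x))                            # True
print(mean_of_cond_var + var_of_cond_mean == var(f_x))    # True
print(independent)                                        # False
```

Exact fractions are used instead of floats so that the equalities can be tested with `==` and no rounding tolerance. For this table $X$ and $Y$ are not independent, since for example $f(2,0) = 0$ while $f_X(2)f_Y(0) > 0$, yet both conditioning identities still hold.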