# Supplement: random functions associated with normal distributions

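## Background

In statistics we usually assume that the population behind our sample is normally distributed, $N(\mu,\sigma^2)$. We then want to estimate the actual values of $\mu$ and $\sigma$, or to state hypotheses about these parameters and test them. Along the way we often use:

- sample mean $\bar{x}$
- sample variance $s^2$

> - For an introduction to these two terms, see "Supplement: sample" in Appendix A.2.
> - The textbook writes these with capital letters, $\bar{X}$ and $S^2$. To keep these notes consistent I write every sample-related symbol in lowercase ($\bar{x}$, $s^2$, $n$, etc.), except where I use a textbook screenshot, in which case the proof keeps the textbook's notation to avoid confusion.

For these two commonly used statistics, we want to know their distributions and some related functions.

> Note:
>
> The proofs below use some mgf techniques. If you are not familiar with them, see the earlier notes "[Supplement: moment generating function (mgf)](https://hackmd.io/@pipibear/HyWZSx2NR)" and "[Supplement: mgf technique](https://hackmd.io/@pipibear/Bk1JuX2VA)".

## Theorems and examples

### Thm 5.5-1

![image](https://hackmd.io/_uploads/rkjm88nNR.png)

> In other words, we have $n$ mutually independent normal random variables, and each $X_i$ has its own mean $\mu_i$ and variance $\sigma^2_i$.
>> Each $X_i$ is a random variable with its own distribution, so the theorem does not require the $X_i$ to share one mean and variance. In the special case of a random sample from a single population, every $\mu_i = \mu$ and $\sigma^2_i = \sigma^2$.
>>
>> $\rightarrow$ What can differ from the population mean is the realized sample mean: for example, if the sample happens to contain only very large values, the sample mean will be larger than the population mean.
>
> If we take a linear combination $Y = \sum_{i=1}^n c_iX_i$ of these independent random variables, its distribution is still normal, with mean $\sum_{i=1}^n c_i\mu_i$ and variance $\sum_{i=1}^n c_i^2\sigma_i^2$, as stated in the theorem.

The proof is as follows:

![IMG_C8DE701B764C-1](https://hackmd.io/_uploads/SJP0wcT4A.jpg)

### Cor 5.5-1

![image](https://hackmd.io/_uploads/Skg0Kcp40.png)

The proof is as follows:

![image](https://hackmd.io/_uploads/ryyzc5640.png)

> - For another proof, see the note "[Supplement: sample](https://hackmd.io/@pipibear/Hyn0QQ_H0)" in $A.2$.

### Thm 5.5-2

![image](https://hackmd.io/_uploads/ByG90LYUR.png)

I already proved this theorem in an earlier note of this chapter, "A.3.6 Chi-Square Distribution", but there I skipped the reason why $\chi_N^2 -\chi_1^2 = \chi_{N-1}^2$ holds. The textbook covers it here, so I prove that part now:

> For the earlier part of the proof, see the last section of "[A.3.6 Chi-Square Distribution](https://hackmd.io/@pipibear/HJx5_jz80)" if you are interested.

![image](https://hackmd.io/_uploads/HyVu1S5LA.png)

Substitute the explicit mgfs for $M_W(t)$ and $E[e^{tZ^2}]$ on the two sides of the equation:

![image](https://hackmd.io/_uploads/Hk19JBcIA.png)

Combining this result with some facts about the chi-square distribution, we reach the following conclusion:

> For the chi-square distribution, see the two neighboring notes in this chapter:
> - [A.3.6 Chi-Square Distribution](https://hackmd.io/@pipibear/SJYdTbdIC)
> - [Supplement: Chi-Square Distribution](https://hackmd.io/@pipibear/SJYdTbdIC)

:::warning
If we are sampling from a normal distribution, then:
\begin{equation}
\begin{split}
U &= \sum_{i=1}^n\frac{(X_i-\mu)^2}{\sigma^2} \quad \text{is} \ \chi^2_n\\
W &= \sum_{i=1}^n\frac{(X_i-\bar{X})^2}{\sigma^2} \quad \text{is} \ \chi^2_{n-1}\\
\end{split}
\end{equation}
$\rightarrow$ That is, when the population mean $\mu$ is replaced by the sample mean $\bar{X}$, ++one degree of freedom is lost++.
:::

> For why replacing $\mu$ with $\bar{X}$ costs one degree of freedom, see the note "[Supplement: Bessel's correction](https://hackmd.io/@pipibear/HyIO-uWUA)" in $A.2$.

![image](https://hackmd.io/_uploads/Byfh6_qIR.png)

The (extremely tedious) proof is as follows:

![image](https://hackmd.io/_uploads/BJxdCdc8C.png)

The next step is to integrate $g(z,u)$, which is what the textbook does. When I worked through it I found the notation too heavy, so I used the fact that $Z$ has a standard normal distribution and wrote the cdf of $Z$ as $\Phi()$ and its pdf as $\phi()$. Our goal is the pdf of the $t$ distribution, so as usual we first find the cdf and then differentiate it to get the pdf:

![image](https://hackmd.io/_uploads/S1gRyY5IA.png)

Once we have the cdf, we differentiate it. When differentiating an integral we can move the derivative inside the integral, where it becomes a partial derivative of $\Phi()$ with respect to $t$. The factor $f_U(u)$ is not a function of $t$, so it is unaffected. Carrying the computation through:

![image](https://hackmd.io/_uploads/HkCWxFqIR.png)

> The green part uses the gamma function property described in blue on the right. I did not prove it; I just used it directly. After tidying up all the messy terms we obtain the pdf of the $t$ distribution.

$\rightarrow$ The textbook also says that probabilities for a random variable with a $t$ distribution should be computed with a calculator or a computer program, so I think the proof above only needs to be read for reference. Some notation:

:::info
If a random variable ++$T$ has a $t$ distribution with $r$ degrees of freedom++, we say:
\begin{equation}
T \ \text{is} \ t(r)
\end{equation}
and we write $t_\alpha(r)$ for the point whose ++right-tail probability is $\alpha$++, that is, $P[T \ge t_\alpha(r)] = \alpha$.
:::

> - The <font color = "snake">left / right tail</font> is the most extreme (leftmost or rightmost) part of a probability distribution.
>
> In the example in the figure below, $\alpha = 0.05$.

![image](https://hackmd.io/_uploads/BJKHMK98A.png)

> The $t$ distribution is continuous, so its pdf is the blue curve in the figure above. By the definition of a pdf, the total area under the curve is $1$, so $\alpha = 0.05$ means that the rightmost region has area $0.05$ out of $1$ (that is, $5\%$ of the whole distribution).

# References

- Hogg, Tanis, Zimmerman, *Probability and Statistical Inference*, 9th ed. (2015), pp. 192-197
  > Section 5.5 Random Functions Associated with Normal Distributions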
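# Supplement: a quick simulation check

The results above (Cor 5.5-1 and the statement that $U$ is $\chi^2_n$ while $W$ is $\chi^2_{n-1}$) can be sanity-checked numerically. The following is only a minimal sketch using the Python standard library, not something from the textbook. The function name `simulate` and the values `mu=5.0`, `sigma=2.0`, `n=10`, `trials=20000` are my own illustrative assumptions. It relies on the facts that a $\chi^2_k$ variable has mean $k$ and variance $2k$, and that $\bar{X}$ has variance $\sigma^2/n$.

```python
import random
import statistics


def simulate(mu=5.0, sigma=2.0, n=10, trials=20000, seed=0):
    """Draw `trials` samples of size n from N(mu, sigma^2).

    For each sample, record the sample mean, U (deviations from the
    population mean mu) and W (deviations from the sample mean).
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    means, us, ws = [], [], []
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        means.append(xbar)
        # U = sum (X_i - mu)^2 / sigma^2, expected to behave like chi-square(n)
        us.append(sum((x - mu) ** 2 for x in xs) / sigma ** 2)
        # W = sum (X_i - xbar)^2 / sigma^2, expected to behave like chi-square(n-1)
        ws.append(sum((x - xbar) ** 2 for x in xs) / sigma ** 2)
    return means, us, ws


means, us, ws = simulate()
print("variance of sample mean:", statistics.pvariance(means))  # theory: sigma^2/n = 0.4
print("mean of U:", statistics.fmean(us))        # theory: n = 10
print("mean of W:", statistics.fmean(ws))        # theory: n - 1 = 9
print("variance of W:", statistics.pvariance(ws))  # theory: 2(n - 1) = 18
```

The printed values should land close to the theoretical ones in the comments, with the mean of $W$ about one less than the mean of $U$, which is the lost degree of freedom. Matching the first two moments does not prove the distributions are chi-square; it is only a consistency check.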