# [Statistics] 02 Random Variables and Probability Distributions

[TOC]

---

### Definition

#### <span style="color:green">$\omega$ (sample point)</span>
Every outcome of a statistical experiment corresponds to a sample point.

#### <span style="color:blue">Random variable</span>
A real-valued function whose **domain is the sample space** and whose range is a subset of the real numbers; that is, a function or rule that assigns a number to each outcome of a random experiment.

#### <span style="color:red">Probability distribution</span>
A table, formula, or graph that describes the values of a random variable and their corresponding probabilities by assigning a probability to each possible value.
For a discrete random variable it must satisfy ==$\sum\limits_{\text{all } x}P(X = x) = 1$==.

<br>

---

## Part 1. Discrete Random Variable (Discrete R.V.)
If the range of a random variable $X$ is a **discrete set**, for example $X \in \{1, 2, 3, \cdots, n\}$, then $X$ is a discrete random variable $\implies$ its possible values are either finite in number or countably infinite.

### 1.1 Probability Mass Function (PMF)
Gives the probability of each possible value of the random variable; <span style="color:blue">the probabilities of all possible values must sum to one</span>.

#### Formula
==<span style="color:red">$f(x) = P(X = x)$</span>==

#### Properties
- $0 \le f(x) \le 1 \text{ for all } x$
- $\sum\limits_{\text{all }x} f(x) = 1$

<br>

### 1.2 Cumulative Distribution Function (CDF)
A right-continuous function that can be used to find the probability that $X$ falls in any given interval.

#### Formula
==<span style="color:red">$F(x) = P(X \le x) = \sum\limits_{y \le x}f(y)$</span>==

#### Properties
- If $a < b$, then $F(a) \le F(b)$
- $\lim\limits_{t \to - \infty}F_X(t) = 0$
- $\lim\limits_{t \to \infty}F_X(t) = 1$

<br>

---

## Part 2. Continuous Random Variable (Continuous R.V.)
A continuous random variable has uncountably many possible values, and its range is an **interval**.

### 2.1 Probability Density Function (PDF)
A function describing how likely the random variable is to take a value near a given point. <span style="color:blue">The total area under the density must equal one</span>.

#### Formula
==<span style="color:red">$P(a \le X \le b) = \int_a^b f(x) dx = F_X(b) - F_X(a)$</span>==

$\because$ <span style="color:red">density $f(a)$</span> is a measure of how likely it is that $X$ is near $a$ : $P($<span style="color:blue">$X \in (a, a + dx)$</span>$) =$ <span style="color:blue">$f(a)$</span>$dx$

#### Properties
For a continuous random variable $X$, the **density function $f(x)$ is not a probability**, but **the area under $f(x)$ represents the probability**.
- $0 \le f(x) \text{ for all } x$.
It is possible to have $f(x) > 1$ as long as $\int^\infty_{-\infty}f(x)dx = 1$
- $\int^\infty_{-\infty}f(x)dx = 1$
- <span style="color:blue">$F(x) = P(X \le x) = \int^x_{-\infty}f(t)dt$</span> is the <span style="color:green">cumulative distribution function (CDF)</span> of $X$

<br>

### 2.2 Cumulative Distribution Function (CDF)
$f(x)$ is not a probability, but $F(x)$ is a probability.

#### PDF $\iff$ CDF
==<span style="color:red">$F'(x) = f(x)$</span>==, since (for $f$ continuous at $x$):

$F'(x) = \lim\limits_{\Delta \to 0} \frac{F(x + \Delta) - F(x)}{\Delta} = \lim\limits_{\Delta \to 0}\frac{\int^{x+\Delta}_x f(y)dy}{\Delta}= \lim\limits_{\Delta \to 0}\frac{f(x) \cdot \Delta}{\Delta} = \lim\limits_{\Delta \to 0}f(x) = f(x)$

$\therefore$ For a continuous R.V., ==<span style="color:red">the PDF is the derivative of the CDF; the CDF is the integral of the PDF</span>==

<br>

#### Find PDF (Density function of a continuous X)
1. Find $F(x)$ $\to$ this is the CDF
2. Differentiate $F(x)$ $\to \frac{d}{dx}F(x) \to$ this gives the PDF $f(x)$

#### Properties
<span style="color:blue">$P(a \le X \le b) = \int^b_a f(x)dx$</span>

$\because P(X = a) = P(a \le X \le a) = \int^a_a f(y)dy = 0$

$\therefore P(X \le a) = P((X < a) \cup (X = a)) = P(X < a) + P(X = a) = P(X < a)$

==<span style="color:red">$P(a \le X \le b) = P(a < X \le b) = P(a < X < b) = P(a \le X < b) = F(b) - F(a)$</span>==

<br><br>
<img style="width:550px" src='https://images.slideplayer.com/35/10289899/slides/slide_3.jpg'><br>
<br>

---

## Part 3. Joint Probability Distributions
The probability distribution of two or more random variables considered together.

<br>

### 3.1 Discrete X and Y
#### <span style="color:blue">**Joint** **P**robability **M**ass **F**unction of $(X, Y)$</span>
- ==$f(x,y) = P(X = x, Y = y)$==
- $\sum_x\sum_yf(x,y) = 1$

<br>

#### <span style="color:red">Marginal Distribution Function of $X$</span>
- ==$f_X(x) = P(X = x) = \sum\limits_{y}f(x,y) \text{ for all possible values } x$==
- $\sum_x f_X(x) = 1$

<br>

#### <span style="color:green">Conditional Distribution Function of $Y$ given $X = x$</span>
- ==$f_{Y \mid X}(y \mid x) = P(Y = y \mid X = x) = \dfrac{P(Y = y, X = x)}{P(X = x)} = \dfrac{f(x, y)}{f_X(x)}$== ==$\text{ for all possible values } y$==
- For any $x$, $\sum_y f_{Y \mid X}(y \mid x) = 1$
- In the expression above, $Y$ is a variable while **$X$ is fixed at the value $x$ and is no longer a variable**

<br>

#### Other Properties
- Range
    - $0 \le f(x, y) \le 1 \text{ for all possible } (x,y)$
    - $0 \le f_X(x) \le 1 \text{ for all possible } x$
    - $0 \le f_Y(y) \le 1 \text{ for all possible } y$
    - $0 \le f_{Y \mid X}(y \mid x) \le 1 \text{ for all possible } y$ <span style="color:blue">$\text{ (for any x) }$</span>
    - $0 \le f_{X \mid Y}(x \mid y) \le 1 \text{ for all possible } x$ <span style="color:blue">$\text{ (for any y) }$</span>
- Summation
    - $\sum\limits_{x}\sum\limits_{y}f(x,y) = 1$
    - $\sum\limits_{x}f_X(x) = 1$
    - $\sum\limits_{y}f_Y(y) = 1$
    - $\sum\limits_{y}f_{Y \mid X}(y \mid x) = 1$ <span style="color:blue">$\text{ (for any x) }$</span>
    - $\sum\limits_{x}f_{X \mid Y}(x \mid y) = 1$ <span style="color:blue">$\text{ (for any y) }$</span>
- Region
    - For a region $A$, $P((X,Y) \in A) = \mathop{\sum\sum}\limits_{(x,y)\ \in \ A}f(x,y)$

<br>

### 3.2 Continuous X and Y
#### <span style="color:blue">**Joint** **P**robability **D**ensity **F**unction of $(X, Y)$</span>
$f(x, y)$ is a density but not a probability.
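
To make the distinction between a density and a probability concrete, here is a minimal sketch. It assumes a hypothetical joint density, $f(x, y) = 4$ on the square $[0, 0.5] \times [0, 0.5]$ and zero elsewhere, which does not come from these notes. The density value exceeds 1, yet a midpoint Riemann sum suggests that the total volume under it is 1 and that the probability of a region is the volume over that region. The function names and the grid size `n` are illustrative choices.

```python
def joint_density(x, y):
    """Hypothetical joint density: uniform on the square [0, 0.5] x [0, 0.5]."""
    return 4.0 if 0.0 <= x <= 0.5 and 0.0 <= y <= 0.5 else 0.0


def region_probability(f, x_lo, x_hi, y_lo, y_hi, n=200):
    """Approximate P(x_lo <= X <= x_hi, y_lo <= Y <= y_hi) with a midpoint Riemann sum."""
    dx = (x_hi - x_lo) / n
    dy = (y_hi - y_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * dx  # midpoint of the i-th cell along x
        for j in range(n):
            y = y_lo + (j + 0.5) * dy  # midpoint of the j-th cell along y
            total += f(x, y) * dx * dy
    return total


# The density equals 4 at this point, so it cannot itself be a probability.
density_at_point = joint_density(0.1, 0.1)

# Volume under the density over its whole support: should be close to 1.
total_probability = region_probability(joint_density, 0.0, 0.5, 0.0, 0.5)

# Probability of the sub-square [0, 0.25] x [0, 0.25]: should be close to 0.25.
quarter_probability = region_probability(joint_density, 0.0, 0.25, 0.0, 0.25)

print(density_at_point, round(total_probability, 6), round(quarter_probability, 6))
```

A plain Riemann sum is used here only to keep the sketch dependency-free; a numerical integration library would be the more usual choice for densities that are not piecewise constant.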
<br>

#### <span style="color:red">Marginal Density Function of $X$</span>
Describes the behavior of $X$ alone.
- ==$f_X(x) = \int^{\infty}_{-\infty}f(x,y)dy$==
- Marginal density function of $X$ : integrate over $y$
- Marginal density function of $Y$ : integrate over $x$

<br>

#### <span style="color:green">Conditional Density Function of $Y$ given $X = x$</span>
- $f_{Y \mid X}(y \mid x) = \dfrac{f(x, y)}{f_X(x)}$
- $Y$ is a variable, while **$X$ is fixed at a value near $x$ and is no longer a variable**

<br>

#### Other Properties
- Range
    - $0 \le f(x, y) \text{ for all possible } (x,y)$
    - $0 \le f_X(x) \text{ for all possible } x$
    - $0 \le f_Y(y) \text{ for all possible } y$
    - $0 \le f_{Y \mid X}(y \mid x) \text{ for all possible } y$ <span style="color:blue">$\text{ (for any x) }$</span>
    - $0 \le f_{X \mid Y}(x \mid y) \text{ for all possible } x$ <span style="color:blue">$\text{ (for any y) }$</span>
- Integration
    - $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}f(x,y)dxdy = 1$
    - $\int_{-\infty}^{\infty}f_X(x)dx = 1$
    - $\int_{-\infty}^{\infty}f_Y(y)dy = 1$
    - $\int_{-\infty}^{\infty}f_{Y \mid X}(y \mid x)dy = 1$ <span style="color:blue">$\text{ (for any x) }$</span>
    - $\int_{-\infty}^{\infty}f_{X \mid Y}(x \mid y)dx = 1$ <span style="color:blue">$\text{ (for any y) }$</span>
- Region
    - For a region $A$, $P((X,Y) \in A) = \iint\limits_{(x,y)\ \in \ A}f(x,y)dxdy$

<br>

---

## Part 4. Independent Random Variables
==$X$ and $Y$ are statistically <span style="color:red">independent</span> $\iff$ <span style="color:red">$f(x,y) = f_X(x) \cdot f_Y(y)$</span>== for all possible $(x,y)$

Equivalently, the conditional distributions can be checked:
- $f_{Y\mid X}(y\mid x)$ does not depend on $x \to$ check whether the computed result contains $x$. Contains $x$ : dependent; does not : independent
- $f_{X\mid Y}(x\mid y)$ does not depend on $y \to$ check whether the computed result contains $y$. Contains $y$ : dependent; does not : independent
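
The factorization criterion above can be checked mechanically for a discrete joint PMF. The sketch below is only an illustration: the two joint tables are hypothetical examples, not data from these notes, and the helper names `marginals`, `conditional_y_given_x`, and `is_independent` are made up for this purpose. Exact fractions are used so that the equality test is not affected by floating-point rounding.

```python
from fractions import Fraction


def marginals(joint):
    """Return (f_X, f_Y) by summing the joint PMF over the other variable."""
    f_x, f_y = {}, {}
    for (x, y), p in joint.items():
        f_x[x] = f_x.get(x, 0) + p
        f_y[y] = f_y.get(y, 0) + p
    return f_x, f_y


def conditional_y_given_x(joint, x):
    """Return f_{Y|X}(y | x) = f(x, y) / f_X(x) as a dict keyed by y."""
    f_x, _ = marginals(joint)
    return {y: p / f_x[x] for (xx, y), p in joint.items() if xx == x}


def is_independent(joint):
    """Check f(x, y) == f_X(x) * f_Y(y) for every pair of possible values."""
    f_x, f_y = marginals(joint)
    return all(
        joint.get((x, y), 0) == f_x[x] * f_y[y]
        for x in f_x
        for y in f_y
    )


# Hypothetical joint PMF that factorizes: X and Y are independent.
independent_joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8),
}

# Hypothetical joint PMF that does not factorize: X and Y are dependent.
dependent_joint = {
    (0, 0): Fraction(1, 2), (0, 1): Fraction(0),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
}

print(is_independent(independent_joint))
print(is_independent(dependent_joint))

# For the dependent table, the conditional PMF of Y changes with x.
print(conditional_y_given_x(dependent_joint, 0))
print(conditional_y_given_x(dependent_joint, 1))
```

In the second table the conditional PMF of $Y$ differs between $x = 0$ and $x = 1$, which is the "result still contains $x$" situation described above.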