# [Statistics] 02 Random Variables and Probability Distributions
[TOC]
---
### Definition
#### <span style="color:green">$\omega$ (sample point)</span>
Every outcome of a statistical experiment corresponds to one sample point.
#### <span style="color:blue">Random variable (隨機變數)</span>
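A real-valued function whose **domain is the sample space** and whose range is a set of real numbers; that is, a function or rule that assigns a number to every outcome of a random experiment.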
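
A minimal sketch of this definition, assuming a hypothetical experiment of two fair coin tosses (the experiment and the names `sample_space` and `X` are illustrative only). The function `X` maps each sample point to a number, and counting how often each number occurs gives the probability of every value:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of two fair coin tosses; every sample point is equally likely.
sample_space = list(product("HT", repeat=2))

def X(omega):
    """Random variable: maps a sample point (a tuple of tosses) to its number of heads."""
    return omega.count("H")

# Probability of each value = (number of sample points mapped to it) / (size of sample space).
counts = Counter(X(omega) for omega in sample_space)
distribution = {x: Fraction(n, len(sample_space)) for x, n in sorted(counts.items())}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
print("total =", sum(distribution.values()))
```

Exact fractions are used so that the total comes out as exactly 1 with no floating-point rounding.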
#### <span style="color:red">Probability distribution (機率分佈)</span>
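A table, formula, or graph that describes the values of a random variable together with their corresponding probabilities, by assigning a probability to each possible value. For a discrete random variable it must satisfy ==$\sum\limits_{\text{all }x}P(X = x) = 1$==.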
<br>
---
## Part 1. Discrete Random Variable (Discrete R.V. 離散隨機變數)
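If the range of a random variable $X$ is a **discrete set**, for example $X \in \{1, 2, 3, \cdots, n\}$, then $X$ is a discrete random variable $\implies$ its possible values are either finite in number or countably infinite.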
### 1.1 Probability Mass Function (PMF)
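Gives the probability of each possible value of the random variable. <span style="color:blue">The probabilities of all possible values must sum to one</span>.
#### Formula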
==<span style="color:red">$f(x) = P(X = x)$</span>==
#### Properties
- $0 \le f(x) \le 1 \text{ for all } x$
- $\sum\limits_{\text{all }x} f(x) = 1$
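
The two properties above can be checked mechanically. The sketch below assumes a PMF stored as a Python dict from value to probability; `is_valid_pmf`, `fair_die`, and `not_a_pmf` are hypothetical names used only for illustration:

```python
from fractions import Fraction

def is_valid_pmf(pmf):
    """Return True if every probability lies in [0, 1] and they sum to exactly 1."""
    values = list(pmf.values())
    return all(0 <= p <= 1 for p in values) and sum(values) == 1

# A fair six-sided die: f(x) = 1/6 for x = 1, ..., 6.
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}

# Not a PMF: the probabilities only sum to 5/6.
not_a_pmf = {0: Fraction(1, 2), 1: Fraction(1, 3)}

print(is_valid_pmf(fair_die))
print(is_valid_pmf(not_a_pmf))
```

With floating-point probabilities the exact comparison `== 1` would need to be replaced by a tolerance check.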
<br>
### 1.2 Cumulative Distribution Function (CDF)
A right-continuous function that can be used to find the probability that $X$ falls in any given interval.
#### Formula
==<span style="color:red">$F(x) = P(X \le x) = \sum\limits_{y \le x}f(y)$</span>==
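
The formula can be sketched directly as a sum over the PMF. The fair-die PMF and the name `cdf` below are assumptions made for the example:

```python
from fractions import Fraction

# Illustrative PMF: a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def cdf(x):
    """F(x) = P(X <= x): add up f(y) over all possible values y <= x."""
    return sum((p for y, p in pmf.items() if y <= x), Fraction(0))

print(cdf(0))            # below the smallest value
print(cdf(3))            # P(X <= 3)
print(cdf(3.5))          # same as cdf(3): F is flat between possible values
print(cdf(6))            # at the largest value
print(cdf(5) - cdf(2))   # P(2 < X <= 5)
```

Because $F$ only jumps at the possible values of $X$, it is a step function, and interval probabilities come from differences such as $F(5) - F(2)$.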
#### Properties
- If $a < b$, then $F(a) \le F(b)$
- $\lim\limits_{t \to - \infty}F_X(t) = 0$
- $\lim\limits_{t \to \infty}F_X(t) = 1$
<br>
---
## Part 2. Continuous Random Variable (Continuous R.V. 連續隨機變數)
A continuous random variable has uncountably many possible values; its range is an **interval**.
### 2.1 Probability Density Function (PDF)
A function that describes how likely the random variable is to take a value near a given point. <span style="color:blue">The total area under the density must be one</span>.
#### Formula
==<span style="color:red">$P(a \le X \le b) = \int_a^b f(x) dx = F_X(b) - F_X(a)$</span>==<br>
$\because$ <span style="color:red">density $f(a)$</span> measures how likely $X$ is to be near $a$ : $P($<span style="color:blue">$X \in (a, a + dx)$</span>$) \approx$ <span style="color:blue">$f(a)$</span>$dx$
#### Properties
For a continuous random variable $X$, **density function $f(x)$ is not a probability** but **the area under $f(x)$ represents the probability**.
- $0 \le f(x) \text{ for all } x$. It is possible to have $f(x) > 1 \text{ as long as } \int^\infty_{-\infty}f(x)dx = 1$
- $\int^\infty_{-\infty}f(x)dx = 1$
- <span style="color:blue">$F(x) = P(X \le x) = \int^x_{-\infty}f(t)dt$</span> is <span style="color:green">cumulative distribution function (cdf)</span> of $X$
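
To see that a density value can exceed 1 while the total area is still 1, here is a small numerical sketch. It assumes a Uniform(0, 0.5) variable as the example and uses a simple midpoint rule; `integrate` and `uniform_pdf` are hypothetical helpers, not library functions:

```python
def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def uniform_pdf(x, lo=0.0, hi=0.5):
    """Density of Uniform(lo, hi): constant 1 / (hi - lo) on the interval, 0 elsewhere."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

height = uniform_pdf(0.25)                 # density value, larger than 1
total = integrate(uniform_pdf, 0.0, 0.5)   # area under the whole density
p = integrate(uniform_pdf, 0.1, 0.3)       # P(0.1 <= X <= 0.3) as an area

print(height, total, p)
```

The density height is 2, yet the area over the whole range is approximately 1 and the area over $[0.1, 0.3]$ is approximately 0.4, which is the probability of that interval.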
<br>
### 2.2 Cumulative Distribution Function (CDF)
$f(x)$ is not a probability but $F(x)$ is a probability.
#### PDF $\iff$ CDF
==<span style="color:red">$F'(x) = f(x)$</span>== , Since :
$F'(x) = \lim\limits_{\Delta \to 0} \frac{F(x + \Delta) - F(x)}{\Delta} = \lim\limits_{\Delta \to 0}\frac{\int^{x+\Delta}_x f(y)dy}{\Delta}= \lim\limits_{\Delta \to 0}\frac{f(x) \cdot \Delta}{\Delta} = f(x)$
(the third step uses $\int^{x+\Delta}_x f(y)dy \approx f(x) \cdot \Delta$ for small $\Delta$, which holds where $f$ is continuous at $x$)
$\therefore$ For Continuous R.V., ==<span style="color:red">the PDF is the derivative of the CDF; the CDF is the integral of the PDF</span>==
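
The relation between the two functions can be checked numerically. This sketch assumes an Exponential distribution with rate 1 as the example, and approximates the derivative with a central difference; `F`, `f`, and `numerical_derivative` are names chosen here for illustration:

```python
import math

def F(x, rate=1.0):
    """CDF of an Exponential(rate) variable: 1 - exp(-rate * x) for x >= 0."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def f(x, rate=1.0):
    """PDF of the same variable: rate * exp(-rate * x) for x >= 0."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def numerical_derivative(g, x, h=1e-6):
    """Central-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

for x in (0.5, 1.0, 2.0):
    print(x, numerical_derivative(F, x), f(x))
```

At each point the numerical slope of $F$ agrees with $f(x)$ up to a small approximation error.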
<br>
#### Find PDF (Density function of a continuous X)
1. Find $F(x)$ $\to$ that is, the CDF
2. Differentiate $F(x)$ $\to \frac{d}{dx}F(x) \to$ obtain the PDF $f(x)$
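
As a small worked example of the two steps, assume a hypothetical CDF $F(x) = x^2$ on $0 \le x \le 1$ (with $F(x) = 0$ for $x < 0$ and $F(x) = 1$ for $x > 1$); this choice is only for illustration:

$$
f(x) = \frac{d}{dx}F(x) = \frac{d}{dx}x^2 = 2x, \quad 0 \le x \le 1
$$

$$
\int_0^1 2x\,dx = 1, \qquad P(0.5 \le X \le 1) = F(1) - F(0.5) = 1 - 0.25 = 0.75
$$

The first integral confirms that the resulting $f(x)$ has total area one, and the second shows an interval probability read directly from $F$.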
#### Properties
<span style="color:blue">$P(a \le X \le b) = \int^b_a f(x)dx$</span>
$\because P(X = a) = P(a \le X \le a) = \int^a_a f(y)dy = 0$
$\therefore P(X \le a) = P((X < a) \cup (X = a)) = P(X < a) + P(X = a) = P(X < a)$
==<span style="color:red">$P(a \le X \le b) = P(a < X \le b) = P(a < X < b) = P(a \le X < b) = F(b) - F(a)$</span>==
<br><br>
<img style="width:550px" src='https://images.slideplayer.com/35/10289899/slides/slide_3.jpg'><br>
<br>
---
## Part 3. Joint Probability Distributions (聯合機率分佈)
The probability distribution of a random vector made up of two or more random variables.
<br>
### 3.1 Discrete X and Y
#### <span style="color:blue">**Joint** **P**robability **M**ass **F**unction of $(X, Y)$</span>
- ==$f(x,y) = P(X = x, Y = y)$==
- $\sum_x\sum_yf(x,y) = 1$
<br>
#### <span style="color:red">Marginal Distribution Function of $X$</span>
- ==$f_X(x) = P(X = x) = \sum\limits_{y}f(x,y) \text{ for all possible values } x$==
- $\sum_x f_X(x) = 1$
<br>
#### <span style="color:green">Conditional Distribution Function of $Y$ given $X = x$</span>
- ==$f_{Y \mid X}(y \mid x) = P(Y = y \mid X = x) = \dfrac{P(Y = y, X = x)}{P(X = x)} = \dfrac{f(x, y)}{f_X(x)}$==
==$\text{ for all possible values } y$==
- For any $x$, $\sum_y f_{Y \mid X}(y \mid x) = 1$
- In the expression above, $Y$ is a variable while **$X$ is fixed at the value $x$ and is no longer a variable**
<br>
#### Other Properties
- Range
    - $0 \le f(x, y) \le 1 \text{ for all possible } (x,y)$
    - $0 \le f_X(x) \le 1 \text{ for all possible } x$
    - $0 \le f_Y(y) \le 1 \text{ for all possible } y$
    - $0 \le f_{Y \mid X}(y \mid x) \le 1 \text{ for all possible } y$ <span style="color:blue">$\text{ (for any x) }$</span>
    - $0 \le f_{X \mid Y}(x \mid y) \le 1 \text{ for all possible } x$ <span style="color:blue">$\text{ (for any y) }$</span>
- Summation
    - $\sum\limits_{x}\sum\limits_{y}f(x,y) = 1$
    - $\sum\limits_{x}f_X(x) = 1$
    - $\sum\limits_{y}f_Y(y) = 1$
    - $\sum\limits_{y}f_{Y \mid X}(y \mid x) = 1$ <span style="color:blue">$\text{ (for any x) }$</span>
    - $\sum\limits_{x}f_{X \mid Y}(x \mid y) = 1$ <span style="color:blue">$\text{ (for any y) }$</span>
- Region
    - For a region $A$, $P((X,Y) \in A) = \mathop{\sum\sum}\limits_{(x,y)\ \in \ A}f(x,y)$
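
The discrete definitions in this section can be sketched on a small table. The joint PMF below is made up for illustration, and `marginal` and `conditional_y_given_x` are hypothetical helper names:

```python
from fractions import Fraction

# Hypothetical joint PMF f(x, y) with x in {0, 1} and y in {0, 1, 2}.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(2, 8), (0, 2): Fraction(1, 8),
    (1, 0): Fraction(2, 8), (1, 1): Fraction(1, 8), (1, 2): Fraction(1, 8),
}

def marginal(joint, axis):
    """Marginal PMF: sum the joint PMF over the other variable (axis 0 -> X, axis 1 -> Y)."""
    out = {}
    for point, p in joint.items():
        out[point[axis]] = out.get(point[axis], Fraction(0)) + p
    return out

def conditional_y_given_x(joint, x):
    """f_{Y|X}(y | x) = f(x, y) / f_X(x) for the fixed value x."""
    f_x = marginal(joint, 0)[x]
    return {y: p / f_x for (xx, y), p in joint.items() if xx == x}

f_X = marginal(joint, 0)
f_Y = marginal(joint, 1)
cond = conditional_y_given_x(joint, 0)

# Region probability: P(X + Y >= 2) sums f(x, y) over the points in the region.
region = sum(p for (x, y), p in joint.items() if x + y >= 2)

print(f_X, f_Y, cond, region)
```

Each marginal and each conditional PMF sums to one, matching the summation properties listed above.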
<br>
### 3.2 Continuous X and Y
#### <span style="color:blue">**Joint** **P**robability **D**ensity **F**unction of $(X, Y)$</span>
$f(x, y)$ is a density but not a probability.
<br>
#### <span style="color:red">Marginal Density Function of $X$</span>
Describes the behavior of $X$ alone.
- ==$f_X(x) = \int^{\infty}_{-\infty}f(x,y)dy$==
- Marginal density function of $X$ : integrate $f(x,y)$ over $y$
- Marginal density function of $Y$ : integrate $f(x,y)$ over $x$
<br>
#### <span style="color:green">Conditional Density Function of $Y$ given $X = x$</span>
- $f_{Y \mid X}(y \mid x) = \dfrac{f(x, y)}{f_X(x)}$
- $Y$ is a variable, while **$X$ is the value near $x$ and is no longer a variable**
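
A numerical sketch of the joint, marginal, and conditional densities, assuming a hypothetical joint density $f(x, y) = x + y$ on the unit square (zero elsewhere). The helper `integrate` is a plain midpoint rule written for this example, not a library call:

```python
def integrate(g, a, b, n=400):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def joint_pdf(x, y):
    """Hypothetical joint density: f(x, y) = x + y on the unit square, 0 elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def marginal_x(x):
    """f_X(x): integrate the joint density over y (it is zero outside [0, 1])."""
    return integrate(lambda y: joint_pdf(x, y), 0.0, 1.0)

def conditional_y_given_x(y, x):
    """f_{Y|X}(y | x) = f(x, y) / f_X(x), defined where f_X(x) > 0."""
    return joint_pdf(x, y) / marginal_x(x)

total = integrate(marginal_x, 0.0, 1.0)   # double integral of f(x, y)
fx = marginal_x(0.3)                      # analytically 0.3 + 1/2
cond_area = integrate(lambda y: conditional_y_given_x(y, 0.3), 0.0, 1.0)

print(total, fx, cond_area)
```

For this density the marginal works out analytically to $f_X(x) = x + \frac{1}{2}$, so the printed value at $x = 0.3$ should be close to 0.8, and both areas should be close to 1.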
<br>
#### Other Properties
- Range
    - $0 \le f(x, y) \text{ for all possible } (x,y)$
    - $0 \le f_X(x) \text{ for all possible } x$
    - $0 \le f_Y(y) \text{ for all possible } y$
    - $0 \le f_{Y \mid X}(y \mid x) \text{ for all possible } y$ <span style="color:blue">$\text{ (for any x) }$</span>
    - $0 \le f_{X \mid Y}(x \mid y) \text{ for all possible } x$ <span style="color:blue">$\text{ (for any y) }$</span>
- Integration
    - $\iint\limits_{-\infty}^{\infty}f(x,y)dxdy = 1$
    - $\int_{-\infty}^{\infty}f_X(x)dx = 1$
    - $\int_{-\infty}^{\infty}f_Y(y)dy = 1$
    - $\int_{-\infty}^{\infty}f_{Y \mid X}(y \mid x)dy = 1$ <span style="color:blue">$\text{ (for any x) }$</span>
    - $\int_{-\infty}^{\infty}f_{X \mid Y}(x \mid y)dx = 1$ <span style="color:blue">$\text{ (for any y) }$</span>
- Region
    - For a region $A$, $P((X,Y) \in A) = \iint\limits_{(x,y)\ \in \ A}f(x,y)dxdy$
<br>
---
## Part 4. Independent Random Variables
==$X$ and $Y$ are statistically <span style="color:red">independent</span> $\iff$ <span style="color:red">$f(x,y) = f_X(x) \cdot f_Y(y)$</span>== for all possible $(x,y)$
$\because$ under independence $f_{Y\mid X}(y\mid x) = f_Y(y)$, which does not depend on $x \to$ check whether $x$ appears in the computed result: if it does, dependent; if not, independent
$\because$ under independence $f_{X\mid Y}(x\mid y) = f_X(x)$, which does not depend on $y \to$ check whether $y$ appears in the computed result: if it does, dependent; if not, independent
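
For discrete variables the factorization condition can be tested pair by pair. The sketch below assumes joint PMFs stored as dicts keyed by $(x, y)$; both example tables and the names `marginals` and `is_independent` are hypothetical:

```python
from fractions import Fraction
from itertools import product

def marginals(joint):
    """Return (f_X, f_Y) computed by summing the joint PMF over the other variable."""
    f_x, f_y = {}, {}
    for (x, y), p in joint.items():
        f_x[x] = f_x.get(x, Fraction(0)) + p
        f_y[y] = f_y.get(y, Fraction(0)) + p
    return f_x, f_y

def is_independent(joint):
    """Check f(x, y) == f_X(x) * f_Y(y) for every pair of possible values."""
    f_x, f_y = marginals(joint)
    return all(
        joint.get((x, y), Fraction(0)) == f_x[x] * f_y[y]
        for x, y in product(f_x, f_y)
    )

# Two fair coins tossed separately: the joint PMF factorizes.
independent = {(x, y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}

# Y always equals X: the joint PMF does not factorize.
dependent = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

print(is_independent(independent))
print(is_independent(dependent))
```

Pairs missing from the table are treated as having probability zero, which is why the second example fails the check at $(0, 1)$ and $(1, 0)$.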