# Probability - 1 (Set Theory ~ Entropy)
### Set Theory
#### Operation
* Mutually exclusive: $S_i\cap S_j=\emptyset,\ \forall i,j$ with $i\not =j$
* $\displaystyle\bigcup_{n=1}^{\infty}S_n=\{x:x\in S_n$ for some $n\in\mathbb{N}\}$
* $\displaystyle\bigcap_{n=1}^{\infty}S_n=\{x:x\in S_n$ for every $n\in\mathbb{N}\}$
* element appearing in infinitely many sets
* idea:
* pick an element $x$ that appears in infinitely many of the $S_n$
* $x\in\displaystyle\bigcup_{n=1}^{\infty}S_n$, and also $x\in\displaystyle\bigcup_{n=2}^{\infty}S_n$, $x\in\displaystyle\bigcup_{n=3}^{\infty}S_n$, ...
* $x\in\displaystyle\bigcap_{k=1}^{\infty}\bigcup_{n=k}^{\infty}S_n$
#### Countable union/intersection
* $\displaystyle\bigcap_{k=1}^{\infty}\bigcup_{n=k}^{\infty}S_n=\{x:x$ appears in infinitely many $S_n\}$
* $\displaystyle\bigcup_{k=1}^{\infty}\bigcap_{n=k}^{\infty}S_n=\{x:x$ appears in all except finitely many $S_n\}$
> **pf** (of the second identity; the first is analogous) Let $A:=\{x:x$ appears in all except finitely many $S_n\}$
> 1. $\bigcup_{k=1}^{\infty}\bigcap_{n=k}^{\infty}S_n\subseteq A$
> Pick some $x\in\bigcup_{k=1}^{\infty}\bigcap_{n=k}^{\infty}S_n$. Then $\exists k\in\mathbb{N}$ s.t.
> $x\in\bigcap_{n=k}^{\infty}S_n\Rightarrow x\in S_k,S_{k+1},\dots\Rightarrow x\in A$
> 2. $A\subseteq\bigcup_{k=1}^{\infty}\bigcap_{n=k}^{\infty}S_n$
> Pick some $y\in A$. Then $\exists m\in\mathbb{N}$ s.t. $y\in\bigcap_{n=m}^{\infty}S_n\Rightarrow y\in\bigcup_{k=1}^{\infty}\bigcap_{n=k}^{\infty}S_n$, so $A\subseteq\bigcup_{k=1}^{\infty}\bigcap_{n=k}^{\infty}S_n$
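To see the two identities in action, here is a small numerical sketch (not part of the notes; it truncates the infinite family at $K$ sets, which is exact for this periodic choice of $S_n$):
```python
# S_n = {0, 2} for even n and {1, 2} for odd n, so 2 appears in every S_n,
# while 0 and 1 appear infinitely often but not in all-except-finitely-many.
K = 1000  # truncation level standing in for n -> infinity
S = [({0, 2} if n % 2 == 0 else {1, 2}) for n in range(1, K + 1)]

# "infinitely many": intersection over k of the tail unions
limsup = set.intersection(*[set.union(*S[k:]) for k in range(K - 1)])
# "all except finitely many": union over k of the tail intersections
liminf = set.union(*[set.intersection(*S[k:]) for k in range(K - 1)])

print(limsup)  # {0, 1, 2}
print(liminf)  # {2}
```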
### Axioms
1. $P(A)\ge 0$ for any event $A$ (non-negative)
2. $P(\Omega)=1$
3. If $A_1, A_2,\dots$ is an infinite sequence of mutually exclusive events, then $P(\displaystyle\bigcup_{i=1}^{\infty}A_i)=\sum_{i=1}^{\infty}P(A_i)$ (countable additivity)
#### Other properties
* If $A_1,\dots,A_n$ are disjoint events, then $P(\displaystyle\bigcup_{i=1}^{n}A_i)=\sum_{i=1}^{n}P(A_i)$ (finite additivity)
> **pf**
> Choose $A_i=\emptyset$ for $i>n$
> $P(\displaystyle\bigcup_{i=1}^{\infty}A_i)=\sum_{i=1}^{\infty}P(A_i)$ ... by Axiom #3
> $=\displaystyle\sum_{i=1}^{n}P(A_i)+\sum_{i=n+1}^{\infty}P(A_i)=\displaystyle\sum_{i=1}^{n}P(A_i)$ ... since $P(\emptyset)=0$
* **Thm** (Union Bound) For any events $A_1,A_2,\dots,A_N$, we have $P(\displaystyle\bigcup_{n=1}^{N}A_n)\le\sum_{n=1}^{N}P(A_n)$
> **pf** (induction on $N$)
> For $N=2$: $P(A_1\cup A_2)=P(A_1\cup(A_2-A_1))=P(A_1)+P(A_2-A_1)\le P(A_1)+P(A_2)$, since $A_2-A_1\subseteq A_2$
> Assume the statement holds for all $N\le k$
> Show that $N=k+1$ holds:
> $P(\displaystyle\bigcup_{n=1}^{k+1}A_n)=P((\bigcup_{n=1}^{k}A_n)\cup A_{k+1})\le P(\bigcup_{n=1}^{k}A_n)+P(A_{k+1})$ ... by the base case
> $\displaystyle\le\sum_{n=1}^{k}P(A_n)+P(A_{k+1})=\sum_{n=1}^{k+1}P(A_n)$ ... by the induction hypothesis
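A quick Monte Carlo sanity check of the union bound (an illustrative sketch; the events $A_n$ = "the $n$-th of $N$ dice shows a six" are an arbitrary choice):
```python
import random

random.seed(0)
N, trials = 5, 100_000

union_hits = 0
for _ in range(trials):
    dice = [random.randint(1, 6) for _ in range(N)]
    if any(d == 6 for d in dice):   # the union of the A_n occurred
        union_hits += 1

print(union_hits / trials, "<=", N / 6)  # ~0.598 <= 0.833..., bound holds
```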
### Continuity
* **Def** (Increasing) A sequence of events is increasing if $E_1\subseteq E_2\subseteq\cdots\subseteq E_n\subseteq E_{n+1}\subseteq\cdots$
* **Thm** (Continuity) For any increasing sequence of events $E_1,E_2,\cdots$, we have $\displaystyle\lim_{n\to\infty}P(E_n)=P(\lim_{n\to\infty}E_n)$, where $\displaystyle\lim_{n\to\infty}E_n:=\bigcup_{n=1}^{\infty}E_n$
> **pf**
> Let $G_1=E_1,\ G_2=E_2-E_1,\ G_3=E_3-E_2,\dots,\ G_n=E_n-E_{n-1}$. The $G_n$ are disjoint, with $\displaystyle\bigcup_{n=1}^{k}G_n=E_k$ and $\displaystyle\bigcup_{n=1}^{\infty}G_n=\lim_{n\to\infty}E_n$
> $\displaystyle P(\lim_{n\to\infty}E_n)=P(\bigcup_{n=1}^{\infty}G_n)=\sum_{n=1}^{\infty}P(G_n)=\lim_{k\to\infty}\sum_{n=1}^{k}P(G_n)=\lim_{k\to\infty}P(\bigcup_{n=1}^{k}G_n)$
> $=\displaystyle\lim_{k\to\infty}P(E_k)$
* **Thm** For any decreasing sequence of events $E_1,E_2,\cdots$, we have $\displaystyle\lim_{n\to\infty}P(E_n)=P(\lim_{n\to\infty}E_n)$, where $\displaystyle\lim_{n\to\infty}E_n:=\bigcap_{n=1}^{\infty}E_n$
> **pf**
> Method 1: Same as above
> Method 2:
> Consider $E_1^{c},E_2^{c},\dots$, then $E_1^{c}\subseteq E_2^{c}\subseteq\cdots\subseteq E_n^{c}\subseteq E_{n+1}^{c}\subseteq\cdots$
> $\displaystyle\lim_{n\to\infty}P(E_n^c)=P(\lim_{n\to\infty}E_n^c)\Rightarrow\lim_{n\to\infty}(1-P(E_n))=1-P(\lim_{n\to\infty}E_n)$ ... by De Morgan, $\lim E_n^c=(\lim E_n)^c$
> $\displaystyle\Rightarrow\lim_{n\to\infty}P(E_n)=P(\lim_{n\to\infty}E_n)$
* **Recap** The tail unions $\displaystyle\bigcup_{n=k}^{\infty}S_n$ form a decreasing sequence in $k$, and the tail intersections $\displaystyle\bigcap_{n=k}^{\infty}S_n$ form an increasing sequence in $k$, so the continuity theorems apply to $\displaystyle\bigcap_{k=1}^{\infty}\bigcup_{n=k}^{\infty}S_n$ and $\displaystyle\bigcup_{k=1}^{\infty}\bigcap_{n=k}^{\infty}S_n$
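To make continuity concrete, a minimal numeric illustration (a sketch, assuming $X\sim\text{Geometric}(p)$): the events $E_n=\{X\le n\}$ are increasing with $\displaystyle\lim_{n\to\infty}E_n=\{X<\infty\}$, so $P(E_n)=1-(1-p)^n$ should converge to $P(\lim_{n\to\infty}E_n)=1$.
```python
# P(E_n) for the increasing events E_n = {X <= n}, X ~ Geometric(p)
p = 0.3
for n in (1, 5, 10, 50, 100):
    print(n, 1 - (1 - p) ** n)   # increases toward P(lim E_n) = 1
```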
### Conditional Probability
* **Def** $P(A|B):=\dfrac{P(A\cap B)}{P(B)}$, defined whenever $P(B)>0$
* **Thm** (Reduction of Sample Space) Let $\Omega$ be the sample space and $B$ be an event w/ $P(B)>0$. Then $P(\cdot|B)$ is itself a probability measure, i.e.,
1. $P(A|B)\ge 0$ for any event $A$
2. $P(\Omega|B)=1$
3. If $A_1, A_2,\dots$ is an infinite sequence of mutually exclusive events, then $P(\displaystyle\bigcup_{i=1}^{\infty}A_i|B)=\sum_{i=1}^{\infty}P(A_i|B)$
#### Three useful tools
* **Thm** (Multiplication Rule) $\displaystyle P(\bigcap_{i=1}^{n}A_i)=P(A_1)P(A_2|A_1)\cdots P(A_n|A_1\cap A_2\cap\cdots\cap A_{n-1})$
* **Thm** (Total Probability Thm) Let $A_1,A_2,\dots,A_n$ be mutually exclusive events that form a partition of $\Omega$, and $P(A_i)>0\ \forall i$. Then, for any event $B$, we have
$P(B)=P(A_1\cap B)+\cdots+P(A_n\cap B)=P(A_1)P(B|A_1)+\cdots+P(A_n)P(B|A_n)$
* **Thm** (Bayes' Rule) Let $A_1,A_2,\dots,A_n$ be mutually exclusive events that form a partition of $\Omega$, and $P(A_i)>0,\ \forall i$. Then for any event B we have
$P(A_i|B)=\dfrac{P(A_i)P(B|A_i)}{P(B)}=\dfrac{P(A_i)P(B|A_i)}{P(A_1)P(B|A_1)+\cdots+P(A_n)P(B|A_n)}$
* $P(A_i|B)$: posterior
* $P(A_i)$: prior
* $P(B|A_i)$: likelihood
* $P(B)$: evidence
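The formula translates directly into code; a minimal helper (not from the notes) that takes the priors $P(A_i)$ and likelihoods $P(B|A_i)$ of a partition and returns the posteriors:
```python
def bayes(priors, likelihoods):
    """Posteriors P(A_i|B) from priors P(A_i) and likelihoods P(B|A_i)."""
    # evidence P(B), via the Total Probability Thm
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

# two equally likely hypotheses with different likelihoods of B
print(bayes([0.5, 0.5], [0.9, 0.1]))  # [0.9, 0.1]
```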
### Independence
* **Def** (Indep. of several events) Events $A_1,A_2,\dots,A_n$ are said to be independent if $\displaystyle P(\bigcap_{i\in S}A_i)=\prod_{i\in S}P(A_i)$ for every $S\subseteq\{1,2,\dots,n\}$
#### Conditional Independence
* **Def** $A$ and $B$ are said to be conditionally independent given $C$ if $P(A\cap B|C)=P(A|C)P(B|C)$. Moreover, if $P(B\cap C)>0$, $P(A|B\cap C)=P(A|C)$
* **Def** $A_1, A_2,\dots ,A_n$ are said to be conditionally independent given $C$ if $P(\displaystyle\bigcap_{i\in S}A_i|C)=\prod_{i\in S}P(A_i|C)$ for every $S\subseteq \{1,2,\dots,n\}$
* **ex** Communication Over a Noisy Channel
* Setup: each bit is sent independently, equal to 0 w/ prob. $p$ and 1 w/ prob. $1-p$; the channel flips a sent 0 w/ prob. $q_0$ and a sent 1 w/ prob. $q_1$, independently across bits
* Q1. P(01 rcv | 01 sent)
* sol
Since the two bits pass through the channel independently (the received bits are conditionally independent given what was sent),
P(01 rcv | 01 sent) = P(1-st bit rcv as 0 | 01 sent)P(2-nd bit rcv as 1 | 01 sent) = $(1-q_0)(1-q_1)$
* Q2. P(01 rcv and 01 sent)
* sol
= P(01 sent)P(01 rcv | 01 sent) = $p(1-p)(1-q_0)(1-q_1)$
* Q3. P(01 sent | 01 rcv)
* sol (Bayes' Rule)
$A_0$: 00 is sent
$A_1$: 01 is sent
$A_2$: 10 is sent
$A_3$: 11 is sent
$B$: 01 is rcv
$P(A_1|B)=\dfrac{P(A_1)P(B|A_1)}{P(A_0)P(B|A_0)+P(A_1)P(B|A_1)+P(A_2)P(B|A_2)+P(A_3)P(B|A_3)}$
$=\dfrac{p(1-p)(1-q_0)(1-q_1)}{p^2(1-q_0)q_0+p(1-p)(1-q_0)(1-q_1)+(1-p)pq_0q_1+(1-p)^2q_1(1-q_1)}$
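A numeric check of Q3 (a sketch; the values of $p$, $q_0$, $q_1$ below are made-up assumptions), comparing the closed form against a Monte Carlo estimate:
```python
import random

# Model, as in the example: each bit is independently 0 w.p. p and 1 w.p.
# 1-p; the channel flips a sent 0 w.p. q0 and a sent 1 w.p. q1.
p, q0, q1 = 0.6, 0.1, 0.2
random.seed(1)

def send_bit():
    return 0 if random.random() < p else 1

def channel(bit):
    flip = random.random() < (q0 if bit == 0 else q1)
    return bit ^ flip            # XOR with the flip indicator

# Closed form from Bayes' rule above
num = p * (1 - p) * (1 - q0) * (1 - q1)
den = (p**2 * (1 - q0) * q0 + num
       + (1 - p) * p * q0 * q1 + (1 - p)**2 * q1 * (1 - q1))
print("closed form:", num / den)

# Monte Carlo estimate of P(01 sent | 01 rcv)
rcv01 = sent01 = 0
for _ in range(1_000_000):
    sent = (send_bit(), send_bit())
    rcv = (channel(sent[0]), channel(sent[1]))
    if rcv == (0, 1):
        rcv01 += 1
        sent01 += (sent == (0, 1))
print("simulation:", sent01 / rcv01)
```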
### Random Variable
* Random Variable (r.v.): a function that maps each outcome to a real number
#### Cumulative Distribution Function (CDF)
* **Def** For any random variable $X$, the CDF is defined as: $F_X(t)=P(X\le t),\ \forall t\in\mathbb{R}$
| Event | Probability |
| ----- | ----------- |
| $X\le a$ | $F_X(a)$ |
| $X>a$ | $1-F_X(a)$ |
| $X<a$ | $F_X(a^{-})=F_X(a)-P(X=a)$ |
| $X\ge a$ | $1-F_X(a^{-})$ |
| $X=a$ | $F_X(a)-F_X(a^{-})$ |
| $a<X\le b$ | $F_X(b)-F_X(a)$ |
| $a<X<b$ | $F_X(b^{-})-F_X(a)$ |
| $a\le X\le b$ | $F_X(b)-F_X(a^{-})$ |
| $a\le X<b$ | $F_X(b^{-})-F_X(a^{-})$ |
* Prop.
* $F_X(t)$ is non-decreasing
* $\displaystyle\lim_{t\to\infty}F_X(t)=1$, $\displaystyle\lim_{t\to-\infty}F_X(t)=0$
* $F_X(t)$ is right-continuous ($F_X(t^{+})=F_X(t)$)
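The table can be read off in code; a sketch assuming scipy is available and, for concreteness, $X\sim\text{Binomial}(10,\ 0.5)$ (the values of $a$, $b$ are arbitrary). The key point is that for a discrete r.v., the left limit $F_X(a^{-})$ is $F_X(a)-P(X=a)$:
```python
from scipy.stats import binom  # any discrete distribution works the same way

n, p, a, b = 10, 0.5, 3, 7

def F(t):        # F_X(t) = P(X <= t)
    return binom.cdf(t, n, p)

def F_left(t):   # F_X(t^-) = P(X < t) = F_X(t) - P(X = t)
    return F(t) - binom.pmf(t, n, p)

print("P(X <= a)      =", F(a))
print("P(X  = a)      =", F(a) - F_left(a))   # jump of the CDF at a
print("P(a < X <= b)  =", F(b) - F(a))
print("P(a <= X <= b) =", F(b) - F_left(a))
```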
#### Probability Mass Function (PMF)
* Specifies the distribution of a discrete random variable
* For any discrete random variable $X$ with possible values $\{x_1,x_2,\dots\}$, the PMF is a function that satisfies:
* $p(x_i)=P(X=x_i)$
* $p(x)=0$ if $x\not\in\{x_1,x_2,\dots\}$
* $\displaystyle\sum_{i=1}^{\infty}p(x_i)=1$
* **ex** St. Petersburg Paradox
* Setup: pay an entry fee of 10000, then toss a fair coin until the first head appears; if the first head is on the $k$-th toss, win $2^k$
* PMF
$p(x)=\begin{cases}
(\frac{1}{2})^{k},\ x=-10000+2^{k},\ k\in\mathbb{N} \\
0,\ \text{otherwise} \\
\end{cases}$
* CDF
$F_X(t)=\begin{cases}
0,\ t<-10000+2^1 \\
\frac{1}{2},\ -10000+2^1\le t<-10000+2^2 \\
\frac{3}{4},\ -10000+2^2\le t<-10000+2^3 \\
\cdots
\end{cases}$
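Simulating the game makes the paradox visible (a sketch, using the setup above): the sample mean refuses to settle because $E[X]=\sum_k(\frac{1}{2})^k(-10000+2^k)$ diverges.
```python
import random

random.seed(0)

def play():
    k = 1
    while random.random() < 0.5:  # tails: toss again
        k += 1
    return -10000 + 2 ** k        # first head on toss k pays 2^k

trials = 100_000
print(sum(play() for _ in range(trials)) / trials)  # unstable across seeds
```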
#### Special Discrete Random Variables
##### Bernoulli Random Var.
* a single trial (no repetition) with 2 possible outcomes: success ($X=1$) w/ prob. $p$, failure ($X=0$) w/ prob. $1-p$
* PMF: $P(X=k)=\begin{cases}
p,\ k=1 \\
1-p,\ k=0 \\
0,\ \text{otherwise}
\end{cases}$
* CDF: $F_X(t)=\begin{cases}
0,\ t<0 \\
1-p,\ 0\le t<1 \\
1,\ t\ge 1
\end{cases}$
##### Binomial R.V.
* perform $n$ independent repetitions of the same Bernoulli experiment; $X$ counts the number of successes among the $n$ repetitions
* PMF: $P(X=k)=\begin{cases}
C^n_kp^k(1-p)^{n-k},\ k=0,1,2,\dots,n \\
0,\ \text{otherwise}
\end{cases}$
* denote: $X\sim \text{Binomial}(n,\ p)$
* **ex**
* Setup: $k$ successes are observed in $N$ independent trials; find the $p$ that maximizes the likelihood $P(X=k)$
* $X\sim\text{Binomial}(N,\ p)$
$P(X=k)=\binom{N}{k}p^k(1-p)^{N-k}=:f(p)$
$\ln f(p)=\ln\binom{N}{k}+k\ln p+(N-k)\ln(1-p)$
$\dfrac{d}{dp}\ln f(p)=\dfrac{k}{p}-\dfrac{N-k}{1-p}=0\Leftrightarrow k(1-p)=p(N-k)\Leftrightarrow p=\dfrac{k}{N}$
> Actually, differentiating $f(p)$ directly also works [name=Sean20405]
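A numeric check of the result (a sketch; $N$ and $k$ are arbitrary choices): evaluate the likelihood $f(p)$ on a grid and confirm the maximizer sits at $k/N$.
```python
import numpy as np
from math import comb

N, k = 20, 7
ps = np.linspace(0.001, 0.999, 9999)          # grid over (0, 1)
f = comb(N, k) * ps**k * (1 - ps)**(N - k)    # likelihood f(p)
print(ps[f.argmax()], k / N)                  # both ~0.35
```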
##### Poisson R.V.
* The average rate is known and constant. How many occurrences fall within an observation window?
* run a Bernoulli trial at every instant of a time interval, each with a very small success probability
* $\lambda$: the average rate of successes, so $\lambda T$ is the expected number of successes over an interval of length $T$
* PMF: $P(X=n)=\dfrac{e^{-\lambda T}(\lambda T)^n}{n!},\ n=0,1,2,\dots$
* $\lambda T$
* determines the peak of the PMF
* is the expected value of $X$
* ex.
* Setup: cars pass a checkpoint according to a Poisson process. The probability that at least one car passes in 30 minutes is 0.95; find the probability that at least one car passes in 10 minutes
* sol
* Define $X:=\{$# of cars passing through in 30 min.$\}$
Define $Y:=\{$# of cars passing through in 10 min.$\}$
$X\sim\text{Poisson}(\lambda T),\ Y\sim\text{Poisson}(\lambda T')$
$P(X\ge 1)=1-P(X=0)=1-e^{-\lambda T}=0.95\Rightarrow e^{-\lambda T}=0.05$
$P(Y\ge 1)=1-P(Y=0)=1-e^{-\lambda T'}=1-e^{-\lambda \frac{T}{3}}=1-\sqrt[3]{0.05}$
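The last two lines evaluate numerically as follows (a direct computation of the solution above):
```python
import math

lam_T = -math.log(0.05)          # from e^{-lambda T} = 0.05
p_y = 1 - math.exp(-lam_T / 3)   # P(Y >= 1) = 1 - 0.05^(1/3)
print(lam_T, p_y)                # ~2.996, ~0.632
```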
* ex. (Sum of Indep. Poisson R.V.)
* Setup: $N_1\sim\text{Poisson}(\lambda_1T)$ and $N_2\sim\text{Poisson}(\lambda_2T)$ are independent; let $Z=N_1+N_2$
$\begin{align}
P(Z=k)&=\displaystyle\sum_{m=0}^{k}P(N_1=m)P(N_2=k-m) \\
&=\displaystyle\sum_{m=0}^{k}\dfrac{e^{-\lambda_1T}(\lambda_1T)^m}{m!}\cdot\dfrac{e^{-\lambda_2T}(\lambda_2T)^{k-m}}{(k-m)!} \\
&=e^{-\lambda_1T}e^{-\lambda_2T}\displaystyle\sum_{m=0}^{k}\dfrac{(\lambda_1T)^m(\lambda_2T)^{k-m}}{m!(k-m)!} \\
&=\dfrac{e^{-\lambda_1T}e^{-\lambda_2T}}{k!}\displaystyle\sum_{m=0}^{k}(\lambda_1T)^m(\lambda_2T)^{k-m}\dfrac{k!}{m!(k-m)!} \\
&=\dfrac{e^{-\lambda_1T}e^{-\lambda_2T}}{k!}\displaystyle\sum_{m=0}^{k}(\lambda_1T)^m(\lambda_2T)^{k-m}\dbinom{k}{m} \\
&=\dfrac{e^{-\lambda_1T}e^{-\lambda_2T}}{k!}(\lambda_1T+\lambda_2T)^k
\end{align}$
i.e., $Z\sim\text{Poisson}((\lambda_1+\lambda_2)T)$
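A numeric check of the derivation (a sketch; the rates and $k$ are arbitrary): convolving the two Poisson PMFs matches the $\text{Poisson}((\lambda_1+\lambda_2)T)$ PMF.
```python
from math import exp, factorial

def pois(n, mu):                  # Poisson PMF with mean mu
    return exp(-mu) * mu**n / factorial(n)

mu1, mu2, k = 2.0, 3.5, 4         # mu_i = lambda_i * T
conv = sum(pois(m, mu1) * pois(k - m, mu2) for m in range(k + 1))
print(conv, pois(k, mu1 + mu2))   # both ~0.1558
```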
##### Geometric R.V.
* repeat independent Bernoulli($p$) trials; $X$ is the index of the trial on which the first success occurs
* PMF: $P(X=k)=(1-p)^{k-1}p,\ k=1,2,\dots$
* Memoryless Property
* $P(X=n+m|X>m)=\dfrac{P(X=n+m)}{P(X>m)}=\dfrac{(1-p)^{n+m-1}\cdot p}{\sum_{k=m+1}^{\infty}(1-p)^{k-1}\cdot p}$
$=\dfrac{(1-p)^{n+m-1}\cdot p}{\frac{p(1-p)^m}{1-(1-p)}}=(1-p)^{n-1}\cdot p=P(X=n)$
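A numeric check of the memoryless property (a sketch; $p$, $n$, $m$ are arbitrary):
```python
p, n, m = 0.3, 4, 6

def geom_pmf(k):                  # P(X = k) = (1-p)^(k-1) p
    return (1 - p) ** (k - 1) * p

tail = (1 - p) ** m               # P(X > m) = (1-p)^m
print(geom_pmf(n + m) / tail)     # P(X = n+m | X > m)
print(geom_pmf(n))                # P(X = n); both ~0.1029
```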
##### Discrete Uniform R.V.
* PMF: $P(X=k)=\begin{cases}
\dfrac{1}{b-a+1},\ k=a,a+1,\dots,b \\
0,\ \text{otherwise}
\end{cases}$
### Entropy
* **Def** (Shannon Entropy) Given a random variable $X$ with possible values $\{x_1,x_2,\dots\}$ and PMF $p_{_X}(x)$, the Shannon entropy of $X$ is defined as $H(X):=-\displaystyle\sum_{i}p_{_X}(x_i)\log p_{_X}(x_i)$
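A direct transcription of the definition (a sketch; here the log is taken base 2, so the entropy is in bits):
```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a PMF given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 -- a fair coin carries one bit
print(entropy([0.9, 0.1]))  # ~0.469 -- a biased coin is more predictable
print(entropy([1.0]))       # 0.0 -- no uncertainty
```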
