# EE 126 MT2 Notes
* MT1 Notes: https://hackmd.io/@matthewtang/126mt1
* MT2 Notes: https://hackmd.io/@matthewtang/126mt2
# Lecture 9: 2/18/20
## Weak Law of Large numbers
* $M_n = \frac{1}{n} \sum\limits_{i=1}^n X_i$
* $X_i$ has mean $\mu$, var $\sigma^2$
* $E[M_n] = \frac{1}{n}E[X_i]n = \mu$
* $var(M_n) = \frac{1}{n^2} \sum\limits_{i=1}^n var(X_i) = \frac{\sigma^2}{n}$
* As $n \to \infty \colon E[M_n] = \mu, var(M_n) \to 0$
* For iid $X_1, X_2, \ldots X_n$ with mean $\mu$, $\epsilon > 0$, $M_n = \frac{X_1 + \ldots X_n}{n}$
* $P(|M_n - \mu| \geq \epsilon) = P(|\frac{X_1 + \ldots X_n}{n} - \mu| \geq \epsilon) \to 0$ as $n \to \infty$
* Chebyshev bound provides $\frac{\sigma^2}{n \epsilon^2}$, which $\to 0$ as $n \to \infty$
* $P(|M_n - \mu| \geq \epsilon) \leq \delta$, $n > n_0 (\epsilon, \delta)$
* Confidence: $\delta$, accuracy: $\epsilon$
* $M_n$ converges in probability to $\mu$
* For $n > n_0$, P($M_n$ outside $\mu \pm \epsilon$) $\leq \delta$ (see the simulation sketch below)
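A minimal simulation sketch (not from lecture) of the WLLN statement above: estimate $P(|M_n - \mu| \geq \epsilon)$ for $X_i \sim U[-1,1]$ at a few values of $n$ and compare with the Chebyshev bound $\frac{\sigma^2}{n\epsilon^2}$; the parameters are arbitrary illustrative choices.
```python
import numpy as np

# WLLN sketch: X_i ~ U[-1,1], so mu = 0 and sigma^2 = 1/3.
rng = np.random.default_rng(0)
mu, var, eps, trials = 0.0, 1 / 3, 0.1, 2000

for n in [10, 100, 1000, 5000]:
    # Sample M_n many times and estimate the deviation probability.
    M_n = rng.uniform(-1, 1, size=(trials, n)).mean(axis=1)
    p_dev = np.mean(np.abs(M_n - mu) >= eps)   # empirical P(|M_n - mu| >= eps)
    cheb = var / (n * eps**2)                  # Chebyshev upper bound
    print(f"n={n:5d}  P(|M_n - mu| >= {eps}) ~ {p_dev:.4f}   Chebyshev bound {cheb:.4f}")
```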
### WLLN Example
* Ex 1: $X_1, X_2 \ldots X_n \sim U[-1,1]$, $Y_n = \frac{X_n}{n}$
* $F_{Y_n}(y) = F_X(ny) \implies f_{Y_n}(y) = nf_X(ny)$
* $P(|Y_n| \geq \epsilon) = 0$ once $n > \frac{1}{\epsilon}$, so $Y_n \to 0$ in probability
* Ex 2: $Y_n = \min(|X_1|, |X_2|, \ldots |X_n|)$
* $P(|Y_n - 0| > \epsilon) = P(|X_1| > \epsilon, \ldots |X_n| > \epsilon) = (1 - \epsilon)^n \to 0$ as $n \to \infty$
### Packet arrivals
* $Y_n = 1$ if arrival at time n
* $I_k = \{2^k, 2^k+1, \ldots 2^{k+1}-1\}$
* 1 arrival in each interval $I_k$, equally likely anywhere in interval
* $P(Y_n = 1) = \frac{1}{2^k}$
* $\lim\limits_{n \to \infty} P(Y_n = 1) = \lim\limits_{k \to \infty} \frac{1}{2^k} = 0$
* $Y_n$ converges to 0 in probability
* But for any finite $n$, there are (with probability 1) infinitely many arrivals after time $n$, so $Y_n$ does not converge to 0 almost surely
## Convolution CLT
* $S_n = X_1 + X_2 + \ldots X_n$
* Convolving 2 squares creates triangle
* Convolving square and triangle creates curve
* CLT says that as the number of summands grows, the distribution of the (properly normalized) sum becomes Gaussian
* Problem: the mean and variance of $S_n$ blow up ($E[S_n] = n\mu$, $var(S_n) = n \sigma^2$)
* So normalize:
* Define $\hat{S}_n = \frac{S_n - n \mu}{\sqrt{n} \sigma}$
* 0 mean, 1 variance
* CLT says $\lim\limits_{n \to \infty} P(\hat{S_n} \leq x) = \Phi(x), \forall x$
* $\Phi(x) = \frac{1}{\sqrt{2\pi}}\int\limits_{-\infty}^x e^{-t^2/2} dt$
* $\hat{S_n} \to \mathcal{N}(0,1)$
* Informally, for large $n$: $S_n \approx \mathcal{N}(n \mu, n \sigma^2)$, $M_n \approx \mathcal{N}(\mu, \sigma^2/n)$ (simulation sketch below)
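A quick sketch of the normalization above (summands chosen as $U[0,1]$ just for illustration): standardize $S_n$ and compare its empirical CDF with $\Phi$ at a few points.
```python
import numpy as np
from math import erf, sqrt

# CLT sketch: S_n = X_1 + ... + X_n with X_i ~ U[0,1], so mu = 1/2, sigma^2 = 1/12.
rng = np.random.default_rng(1)
n, trials = 30, 100_000
mu, sigma = 0.5, sqrt(1 / 12)

S_n = rng.uniform(0, 1, size=(trials, n)).sum(axis=1)
S_hat = (S_n - n * mu) / (sqrt(n) * sigma)        # zero mean, unit variance

Phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))      # standard normal CDF
for x in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(f"x={x:+.1f}   P(S_hat <= x) ~ {np.mean(S_hat <= x):.4f}   Phi(x) = {Phi(x):.4f}")
```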
# Lecture 10: 2/25/20
## Strong Law of Large Numbers
* WLLN: $\lim\limits_{n \to \infty} P(|M_n - \mu| \geq \epsilon) = 0$
* SLLN: $P( \lim\limits_{n \to \infty} M_n = \mu) = 1$ (convergence almost surely)
### Proof of CLT
* $X_1, \ldots X_n$ iid RV mean $\mu, \sigma^2$
* $S_n = \sum\limits_{i=1}^nX_i \rightarrow E[S_n] = n\mu, var(S_n) = n\sigma^2$
* $M_n = \frac{1}{n} \sum\limits_{i=1}^nX_i \rightarrow E[M_n] = \mu, var(M_n) = \frac{\sigma^2}{n}$
* $\hat{S_n} = \dfrac{\sum\limits_{i=1}^nX_i - n\mu}{\sigma\sqrt{n}} \rightarrow E[\hat{S_n}] = 0, var(\hat{S_n}) = 1$
* If $Y \sim \mathcal{N}(0,1)$, then $M_Y(s) = e^{s^2/2}$, i.e. $\log M_Y(s) = s^2/2$
* WLOG take $\mu = 0, \sigma = 1$ and let $Z_n = \hat{S_n} = \frac{S_n}{\sqrt{n}}$
* $M_{Z_n}(s) = E[e^{sZ_n}] = E[e^{(s/\sqrt{n})X_1}] \cdots E[e^{(s/\sqrt{n})X_n}] = [M_X(\frac{s}{\sqrt{n}})]^n$
* $\log M_{Z_n}(s) = n \log M_X(\frac{s}{\sqrt{n}})$
* Change of variables $y = \frac{1}{\sqrt{n}}$: $\lim\limits_{n \to \infty} \log M_{Z_n}(s) = \lim\limits_{y \to 0} \frac{\log M_X(sy)}{y^2}$
* Apply L'Hopital twice (using $M_X(0) = 1, M_X'(0) = \mu = 0, M_X''(0) = E[X^2] = 1$): the limit is $\frac{s^2 M_X''(0)}{2} = \frac{s^2}{2}$
* So $M_{Z_n}(s) \to e^{s^2/2}$, the MGF of $\mathcal{N}(0,1)$
### Polling Example
* Ask n randomly iid sampled voters $X_i = 1$ if yes
* Want to have 95% confidence that $|M_n - p| < \epsilon$, $\epsilon$ is accuracy of poll
* Let $p$ = true probability of a yes vote, $M_n$ = empirical estimate
* Chebyshev: $P(|M_n - p| \geq a) \leq \frac{var(M_n)}{a^2}$
* $var(M_n) = \frac{p(1-p)}{n} \leq \frac{1}{4n}$
* $P(|M_n - p| \geq 0.1) \leq \frac{1/(4n)}{0.1^2} = \frac{25}{n}$
* Want $\frac{25}{n} \leq 0.05 \rightarrow n \geq 500$
* CLT:
* Want $P(|M_n - p| \geq 0.1) \leq 0.05$
* Standardize: $P\left(\frac{|M_n - p|}{1/(2\sqrt{n})} \geq \frac{0.1}{1/(2\sqrt{n})}\right) \leq 0.05$, since $\sigma(M_n) \leq \frac{1}{2\sqrt{n}}$
* The standardized term is $\approx \mathcal{N}(0,1)$, so want $2(1 - \Phi(0.2\sqrt{n})) \leq 0.05$
* $\implies 0.2\sqrt{n} \geq 1.96 \approx 2 \implies n \geq 100$ (see the check below)
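A small numeric check of the two sample-size calculations, plus a simulated coverage at $n = 100$; the true $p = 0.4$ is an arbitrary assumption for the simulation.
```python
import numpy as np
from math import erf, sqrt

eps, delta = 0.1, 0.05
# Chebyshev: 1/(4 n eps^2) <= delta  ->  n >= 500.
print("Chebyshev n:", int(np.ceil(1 / (4 * eps**2 * delta))))

# CLT: 2*(1 - Phi(0.2 sqrt(n))) <= delta  ->  0.2 sqrt(n) >= ~1.96  ->  n ~ 100.
Phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))
print("CLT n:", next(n for n in range(1, 1000) if 2 * (1 - Phi(2 * eps * sqrt(n))) <= delta))

# Simulated deviation probability at n = 100, assuming a true p of 0.4.
rng = np.random.default_rng(2)
p, n, trials = 0.4, 100, 50_000
M_n = rng.binomial(n, p, size=trials) / n
print("simulated P(|M_n - p| >= 0.1):", np.mean(np.abs(M_n - p) >= eps))
```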
# Lecture 11: 2/27/20
## Entropy
* $H(X) = E[\log\frac{1}{P(X)}]$
* Design channel encoder and decoder to combat noise
* 2 problems: Source coding (how to best compress source)
* Channel coding
* Source coding:
* $N$ iid RVs $X_1, \ldots X_N$, each having entropy $H(X)$, can be compressed into no more than $N(H(X) + \epsilon)$ bits, $\forall \epsilon > 0$, as $N \to \infty$
* Conversely, compression to fewer than $NH(X)$ bits is impossible without loss of information
* $H(X) = -\sum\limits_{x \in \mathcal{X}} p(x) \log p(x) = E[-\log p(X)]$
* $H(X)$ measures the uncertainty/surprise/info content in $X$
* Ex. $X \sim Bern(p)$
* $h(p) = p \log \frac{1}{p} + (1-p)(\log \frac{1}{1-p})$
* Binary entropy function
* $\log \frac{1}{P(x)}$ is the self-information of X=x
* Joint entropy: $H(X,Y) = E[\log \frac{1}{P(X,Y)}]$
* $H(X,Y) = \sum\limits_x \sum\limits_y P(x,y) \log\frac{1}{P(x,y)}$
* If $X, Y$ indep: $H(X,Y) = \sum\limits_x \sum\limits_y P(x)P(y) \left(\log\frac{1}{P(x)} + \log\frac{1}{P(y)}\right)$
* $\implies H(X,Y) = H(X) + H(Y)$ (numeric check below)
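A tiny check (sketch) of the definitions above: compute $H(X)$, $H(Y)$, and $H(X,Y)$ for two arbitrary independent PMFs and confirm $H(X,Y) = H(X) + H(Y)$.
```python
import numpy as np

def H(p):
    """Entropy in bits of a probability vector p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                          # convention: 0 log 0 = 0
    return -np.sum(p * np.log2(p))

pX = [0.5, 0.25, 0.25]                    # example PMFs (arbitrary)
pY = [0.9, 0.1]
pXY = np.outer(pX, pY)                    # joint PMF when X and Y are independent
print(H(pX), H(pY), H(pXY.ravel()))       # H(X) + H(Y) = H(X,Y)
```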
## AEP (Asymptotic Equipartition Property)
* Flip a coin with bias p n times indep. What is a typical sequence?
* A typical sequence has np Heads and $n(1-p)$ tails
* $p^{np} (1-p)^{n(1-p)}$
* Let $x=2^{\log_2x}$
* $=2^{np \log p} 2^{n(1-p)\log(1-p)}$
* $= 2^{n(p \log p + (1-p) \log(1-p))}$
* $= 2^{-nh(p)}$
* How many typical sequences are there?
* $\binom{n}{np} \approx 2^{n h(p)}$
* Stirling approximation: $n! \approx (\frac{n}{e})^n$ (ignoring lower-order factors)
* If $p=0.11$, $2^{nh(p)} \approx 2^{n/2}$
* Thm (AEP): If $X_1 \ldots X_n$ are iid $\sim p(x)$, then $- \frac{1}{n} \log p(X_1, X_2 \ldots X_n) \rightarrow H(X)$ in probability
* Functions of RVs are RVs, so the $\log p(X_i)$ are iid
* By the WLLN, $- \frac{1}{n} \sum\limits_{i=1}^n \log p(X_i) \to - E[\log p(X)] = H(X)$ (sketch below)
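A sketch of the AEP/typicality claim: draw Bernoulli($p$) strings and watch $-\frac{1}{n}\log_2 p(X_1,\ldots,X_n)$ concentrate around $h(p)$ (here $p = 0.11$ to match the example above).
```python
import numpy as np

def h(p):
    """Binary entropy in bits."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

rng = np.random.default_rng(3)
p = 0.11
print("h(p) =", h(p))                                   # ~0.5 bits

for n in [100, 1000, 10_000]:
    x = rng.random((200, n)) < p                        # 200 Bernoulli(p) strings of length n
    k = x.sum(axis=1)                                   # number of ones in each string
    # -(1/n) log2 p(x^n) = -(1/n) [k log2 p + (n-k) log2(1-p)]
    norm_logprob = -(k * np.log2(p) + (n - k) * np.log2(1 - p)) / n
    print(f"n={n:6d}   mean {norm_logprob.mean():.4f}   std {norm_logprob.std():.4f}")
```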
## Channel Capacity
* BSC(p): Binary Symmetric Channel
* With probability $p$ the bit is flipped; with probability $1-p$ it arrives correctly
* BEC(p): Binary Erasure Channel
* With probability $p$ the bit is erased; with probability $1-p$ it arrives correctly
* The receiver sees an erasure symbol $(e)$ in the erased positions, so it knows which bits were lost
* Goal: Find capacity of $BEC(p)$ channel
* Channel Capacity:
* $\boxed{m} \xrightarrow{f_n} \boxed{T_x} \xrightarrow{x^{(n)}} \boxed{\text{channel}} \xrightarrow{y^{(n)}} \boxed{R_x} \xrightarrow{g_n} \boxed{\hat{m}}$
* If $m = \hat{m}$, reliable
* Capacity: Max rate of reliable communication
* Rate $R = \frac{L_n}{n}$ bits/ch. use, where $L_n$ = number of message bits carried by $n$ channel uses
* Want $P_{err}^{(n)} = \max\limits_m P(\hat{m} \neq m) \to 0$
* $R$ is achievable for the channel if, for all sufficiently large $n$, there exist enc and dec functions $f_n, g_n$ at rate $R$ such that $P_{err}^{(n)} \to 0$ as $n \to \infty$
* Largest achievable R is called the capacity of a channel
* $C_{BEC} = 1-p$ bits/ channel use
* Converse intuition: assume you knew before transmission which bits would be erased
* By SLLN, P(fraction of erased bits $\approx p$) $\to 1$, so only about $n(1-p)$ useful bits get through
* $\implies C_{BEC} \leq 1-p$
# Lecture 12: 3/3/20
* Proof that $C_{BEC} = 1-p$:
* Converse: If we know which bits will be erased, the max rate of reliable transmission is $(1-p)$ bits/ch use
* Achievabilty: Show that $R = 1-p-\epsilon, \forall \epsilon > 0$ is achievable
* Leverage SLLN: as $n \to \infty$, P(about $np$ symbols are erased) $\to 1$
* Codebook: make a $(2^{L_n} \times n)$ lookup table with iid Bernoulli$(\frac{1}{2})$ entries
* Codebook $C_i$: codeword of $i^{th}$ message
* Assume WLOG the last $np$ bits are erased
* $C_P'$: truncated codebook: $n(1-p)$ columns survive, $np$ bits are lost
* Decoding rule: If $c_j'$ is only entry in $C_p'$ matching $Y^{(n)}$ exactly, declare $\hat{m}_j$
* Otherwise: declare failure
* $P(\text{error}) = P(c_1'$ is not unique$) = P\left( \bigcup\limits_{i=2}^{2^{L_n}} \{c_i' = c_1'\}\right)$
* By union bound: $\leq \sum\limits_{i \neq1} 2^{-n(1-p)} = 2^{L_n} 2^{-n(1-p)}$
* $= 2^{nR}2^{-n(1-p)}$
* $= 2^{n(R-(1-p))}$
* When $R < 1-p$: $P_e \to 0$ as $n \to \infty$
* Make $R = 1-p - \epsilon \implies P_e \leq 2^{-n\epsilon} \to 0$
### BEC Example
* $n=10{,}000; p=0.5; \epsilon = 0.01$
* $C_{BEC} = \frac{1}{2}$ bit/ch use $\implies$ up to $5{,}000$ message bits
* $L = 10000(1 - 0.5 - 0.01) = 4,900$
* $P_e \leq 2^{-n \epsilon} \leq 2^{-100} \approx 0$
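A toy-scale sketch of the random-coding scheme (tiny $n$ and a generous $\epsilon$ chosen so the codebook fits in memory, unlike the $n = 10{,}000$ example): iid Bernoulli(1/2) codebook, erase each bit w.p. $p$, decode by matching the surviving positions.
```python
import numpy as np

rng = np.random.default_rng(4)
n, p, eps = 40, 0.5, 0.2            # illustrative parameters only
R = 1 - p - eps                     # rate below capacity
L = int(n * R)                      # message length in bits
codebook = rng.integers(0, 2, size=(2**L, n))   # 2^L codewords of n bits each

errors, trials = 0, 500
for _ in range(trials):
    m = rng.integers(0, 2**L)                   # message to send
    survive = rng.random(n) >= p                # positions NOT erased by the channel
    y = codebook[m, survive]                    # received (unerased) bits
    # Decode: codewords consistent with y on the surviving positions.
    matches = np.all(codebook[:, survive] == y, axis=1)
    if matches.sum() != 1:                      # transmitted codeword not unique -> failure
        errors += 1

# For fixed R < 1 - p this failure rate -> 0 as n grows; n = 40 only makes it small-ish.
print(f"R = {R:.2f} (capacity {1 - p}), empirical failure rate: {errors / trials:.3f}")
```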
## Other Schemes
* $C_{BSC}(p) = 1- h(p)$
* With binary function $h(p) = -p\log(p) - (1-p)\log(1-p)$
* General Discrete Memoryless Channel (DMC)
* $\boxed{X} \rightarrow \boxed{P(Y|X)} \rightarrow \boxed{Y}$
* $C = \max\limits_{P(X)} I(X;Y) = \max\limits_{P(X)}[H(X) - H(X|Y)]$
* $P(X)$ is a design choice for optimization
* $R \leq C$
## Huffman Encoding
* Given $X \in \{A,0.4;B,0.35;C,0.2;D,0.05\}$
* Encode into binary stream
* $H(X) = 1.74$ bits/symbol
* Take the 2 least probable elements and merge them
* Avg num of bits/symbol: $0.4(1) + 0.35(2) + (0.2 + 0.05)(3) = 1.85$ bits/symbol
* Less efficient than $H(X)$ because codeword lengths must be integers (see the sketch below)
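A minimal Huffman construction (standard heap-based merge, not lecture code) for this alphabet, confirming 1.85 bits/symbol against $H(X) \approx 1.74$.
```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Return {symbol: codeword length} from the standard Huffman merging procedure."""
    # Heap entries: (probability, tiebreak id, symbols in this subtree).
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)      # two least probable subtrees
        p2, i2, s2 = heapq.heappop(heap)
        for s in s1 + s2:                    # every symbol under the merge gets one bit deeper
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, i2, s1 + s2))
    return lengths

probs = {"A": 0.4, "B": 0.35, "C": 0.2, "D": 0.05}
lengths = huffman_lengths(probs)
avg = sum(probs[s] * lengths[s] for s in probs)
H = -sum(p * log2(p) for p in probs.values())
print(lengths)                               # {'A': 1, 'B': 2, 'C': 3, 'D': 3}
print(f"avg = {avg:.2f} bits/symbol   H(X) = {H:.2f} bits/symbol")
```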
# Lecture 13: 3/5/20
## Random Processes
* A stochastic process $X = \{X_t\}_{t \in T}$
* Collection of RVs where index t is time
* Discrete time random process when t is discrete
* $X$ models evolution of a sequence of RVs
* To fully characterize behavior of DTRP, need joint PMF of $\{X_0, X_1 \ldots X_n\}$
* Markov structure: for a DTMC, let $\mathcal{X}$ be a finite set called the state space
* Markov Property: $P(X_n = x_n | X_{n-1}=x_{n-1}\ldots X_0=x_0) = P(X_n = x_n | X_{n-1}=x_{n-1})$
* Given present, future is independent of past
* $X_n$ is conditionally independent of the past states given $X_{n-1}$
### Example
```graphviz
digraph G {
forcelabels=true;
graph [nodesep="0.5", ranksep="1"];
"0":nw -> "0"[label=" 0.8"];
"2" -> "0"[label=" 0.8", constraint=false];
"2" -> "1"[label=" 0.2", constraint=false];
"1" -> "2"[label=" 0.7", constraint=false];
"1":nw -> "1":ne[label=" 0.3", constraint=false];
"0" -> "1"[label=" 0.2", constraint=false];
}
```
* State-transition matrix $P$:
* Rows sum to 1 (row-stochastic matrix)
* Rows = "from" state, columns = "to" state
* $\begin{bmatrix}
0.8 & 0.2 & 0\\
0 & 0.3 & 0.7\\
0.8 & 0.2 & 0
\end{bmatrix}$
* Initial dist: $\pi_0 = \begin{bmatrix} \pi_0(0) & \pi_0(1) & \pi_0(2)\end{bmatrix}$
* Want to find $\pi_n$
* Start with $n=1, P(X_1=j)$
* $P(X_1=j) =\sum \limits_{i=0}^2 P(X_1=j|X_0=i)P(X_0=i)$
* $\pi_1(j) =\sum \limits_{i=0}^2 P_{ij}\pi_0(i)$
* $\begin{bmatrix} \pi_1(0) & \pi_1(1) & \pi_1(2)\end{bmatrix} = \begin{bmatrix} \pi_0(0) & \pi_0(1) & \pi_0(2)\end{bmatrix}\begin{bmatrix}
P_{00} & P_{01} & P_{02}\\
P_{10} & P_{11} & P_{12}\\
P_{20} & P_{21} & P_{22}\\
\end{bmatrix}$
* $\pi_1 = \pi_0 \begin{bmatrix}
P_{00} & P_{01} & P_{02}\\
P_{10} & P_{11} & P_{12}\\
P_{20} & P_{21} & P_{22}\\
\end{bmatrix}$
* $\pi_n = \pi_0 P^n$
* Invariant distribution (stationary MC): $\pi = \pi P$
* Flow in = flow out for each state at equilibrium (write $\pi = (x, y, z)$ for states $0, 1, 2$)
* State 0: $0.8z = 0.2x$
* State 1: $0.2x + 0.2z = 0.7y$
* State 2: $0.7y = z$
* Total: $x+y+z=1$
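One way (a sketch) to check the invariant distribution of this 3-state example numerically, using the transition matrix written above: take the left eigenvector of $P$ for eigenvalue 1 and compare with $\pi_0 P^n$ for large $n$.
```python
import numpy as np

# Transition matrix of the example above (rows sum to 1).
P = np.array([[0.8, 0.2, 0.0],
              [0.0, 0.3, 0.7],
              [0.8, 0.2, 0.0]])

# Invariant distribution: left eigenvector of P with eigenvalue 1, normalized to sum to 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi = pi / pi.sum()
print("pi      =", np.round(pi, 4))          # satisfies pi = pi P

# Iterating pi_n = pi_0 P^n from any initial distribution converges to pi.
pi_n = np.array([1.0, 0.0, 0.0])
for _ in range(100):
    pi_n = pi_n @ P
print("pi_100  =", np.round(pi_n, 4))
print("pi P    =", np.round(pi @ P, 4))
```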
* Irreducible: the chain can get from any state to any other state (with positive probability in some number of steps); the three chains below are A, B, C in order
```graphviz
digraph G {
forcelabels=true;
graph [nodesep="0.5", ranksep="1"];
"0" -> "1"[label=" 1.0", constraint=false];
"1" -> "2"[label=" 0.4", constraint=false];
"2" -> "1"[label=" 1.0", constraint=false];
"1" -> "0"[label=" 0.6", constraint=false];
}
```
```graphviz
digraph G {
forcelabels=true;
graph [nodesep="0.5", ranksep="1"];
"0" -> "1"[label=" 1.0", constraint=false];
"1" -> "2"[label=" 0.4", constraint=false];
"2" -> "1"[label=" 0.9", constraint=false];
"1" -> "0"[label=" 0.6", constraint=false];
"2" -> "2"[label=" 0.1", constraint=false];
}
```
```graphviz
digraph G {
forcelabels=true;
graph [nodesep="0.5", ranksep="1"];
"0" -> "1"[label=" 1.0", constraint=false];
"1" -> "2"[label=" 0.4", constraint=false];
"1" -> "0"[label=" 0.6", constraint=false];
"2" -> "2"[label=" 1.0", constraint=false];
}
```
* A,B irreducible, C not irreducible
* Period: $d(i) = gcd \{ n \geq 1 | P^n (i,i) > 0\}$
* For an irreducible MC, $d(i) = d, \forall i \in \mathcal{X}$ (all states share the same period)
* Aperiodic: $d=1$ (any state with a self-loop has period 1)
* a: period 2
* b: period 1
## General Questions
* Does there exist an invariant dist?
* Yes, if num of states is finite
* Is the invariant dist unique?
* Yes if finite, irreducible
* Does $\lim\limits_{n \to \infty} \pi_n = \pi$?
* Yes if irreducible and aperiodic
* Big Theorem: If the MC is finite and irreducible, it has a unique inv. dist $\pi$, and $\pi(i)$ is the long-run fraction of time that $X_n$ equals $i$
* If the MC is also aperiodic, $\lim\limits_{n \to \infty} \pi_n = \pi$
### First passage time/hitting time
* Starting at state A, how many steps to get to state E: $T_E$
* $\beta_E(A) \triangleq E[T_E | X_0 =A]$
* Key is to calculate $\beta(i)$ for $i=A,B,C,D,E$, $\beta(i) = E[T_E|X_0=i]$
* $\beta(A) = 1 + \frac{1}{2}\beta(B) + \frac{1}{2}\beta(D)$
* $\beta(B) = 1 + \beta(C)$
* $\beta(C) = 1 + \beta(A)$
* $\beta(D) = 1 + \frac{1}{3}\beta(A) + \frac{1}{3} \beta(B) + \frac{1}{3} \beta(E)$
* $\beta(E) = 0$
* Solve the system of equations: $\beta(A) = 17, \beta(B) = 19, \beta(C) = 18, \beta(D) = 13, \beta(E) = 0$ (see the solve below)
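The five first-step equations form a linear system; a quick solve (coefficients transcribed from the equations above, with $\beta(E) = 0$ substituted):
```python
import numpy as np

# Unknowns: [beta(A), beta(B), beta(C), beta(D)], with beta(E) = 0.
#   beta(A) = 1 + 0.5 beta(B) + 0.5 beta(D)
#   beta(B) = 1 + beta(C)
#   beta(C) = 1 + beta(A)
#   beta(D) = 1 + (1/3) beta(A) + (1/3) beta(B)      (the beta(E) term is 0)
M = np.array([[1.0, -0.5,  0.0, -0.5],
              [0.0,  1.0, -1.0,  0.0],
              [-1.0, 0.0,  1.0,  0.0],
              [-1/3, -1/3, 0.0,  1.0]])
b = np.ones(4)
print(np.linalg.solve(M, b))                 # -> [17. 19. 18. 13.]
```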
# Lecture 14: 3/10/20
* Ex. Expected num of tosses till 2 consec Heads
```graphviz
digraph G {
forcelabels=true;
graph [nodesep="0.5", ranksep="1"];
"0":w -> "0":nw[label="1/2", constraint=false];
"0" -> "1"[label="1/2", constraint=false];
"1" -> "2"[label="1/2", constraint=false];
"2" -> "2"[label="1", constraint=false];
"1" -> "0"[label="1/2", constraint=false];
}
```
* State = num of heads seen so far
* Let $T_2 = \min(n \geq 0: X_n = 2)$
* $\beta_2(i) = E[T_2 | X_0 = i]$
* FSE:
* $\beta(0) = 1 + \frac{1}{2} \beta(0) + \frac{1}{2}\beta(1)$
* $\beta(1) = 1 + \frac{1}{2} \beta(0) + \frac{1}{2}\beta(2)$
* $\beta(2) = 0$
* Solve: $\beta(0) = 6$
* FSEs for MCs:
* $\mathcal{X} = \{1,2, \ldots N\}$
* $A \subseteq \mathcal{X}$ (a target set of states)
* $T_A = \min(n \geq 0 | X_n \in A)$
* $\beta_A(i) = E[T_A | X_0 = i], \forall i \in \mathcal{X}$
* $\beta_A(i) =
\begin{cases} 1 + \sum\limits_j P(i,j)\beta_A(j) & \text{if } i \not\in A\\
0 & \text{if } i \in A\end{cases}$
## Hitting one state before another

* $\alpha_A = P($ hit C before E | $X_0=A$)
* $\alpha_A = \frac{1}{2}\alpha_B + \frac{1}{2}\alpha_D$
* $\alpha_B = \alpha_C$
* $\alpha_C = 1$
* $\alpha_D = \frac{1}{3}\alpha_A + \frac{1}{3}\alpha_B + \frac{1}{3}\alpha_E$
* $\alpha_E = 0$
* $\alpha_A(i) =
\begin{cases} \sum\limits_j P(i,j)\alpha_A(j) & \text{if } i \not\in A \cup B\\
1 & \text{if } i \in A\\
0 & \text{if } i \in B\end{cases}$
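The $\alpha$ system above solves the same way (a linear solve sketch, with the known values $\alpha_C = 1$ and $\alpha_E = 0$ substituted):
```python
import numpy as np

# Unknowns: [alpha_A, alpha_B, alpha_D]  (alpha_C = 1 and alpha_E = 0 are known).
#   alpha_A = 0.5 alpha_B + 0.5 alpha_D
#   alpha_B = alpha_C = 1
#   alpha_D = (1/3) alpha_A + (1/3) alpha_B + (1/3)(0)
M = np.array([[1.0,  -0.5, -0.5],
              [0.0,   1.0,  0.0],
              [-1/3, -1/3,  1.0]])
b = np.array([0.0, 1.0, 0.0])
print(np.linalg.solve(M, b))                 # -> [0.8, 1.0, 0.6]
```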
## Accumulating Rewards
* Define $g: \mathcal{X} \rightarrow \mathbb{R}$
* $\gamma(i) = E[\sum\limits_{n=0}^{T_A} g(X_n) | X_0=i], i \in \mathcal{X}$
* Where $T_A$ is hitting time for A
* $\gamma(i) =
\begin{cases} g(i) + \sum\limits_j P(i,j)\gamma(j) & \text{if } i \not\in A\\
g(i) & \text{if } i \in A \end{cases}$
* Ex. flip a fair coin until you see 2 consecutive H; count the expected number of tails seen along the way
* Rewards: $g(S) = 0, g(H) = 0, g(HH) = 0, g(T) = 1$
* FSE:
* $\gamma(S) = 0 + \frac{1}{2}\gamma(H) + \frac{1}{2}\gamma(T)$
* $\gamma(H) = 0 + \frac{1}{2}\gamma(T) + \frac{1}{2}\gamma(HH)$
* $\gamma(T) = 1 + \frac{1}{2}\gamma(H) + \frac{1}{2}\gamma(T)$
* $\gamma(HH) = 0$
* Solve: $\gamma(S) = 3$
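A Monte Carlo sanity check (sketch) of this chain: simulate coin flips until HH and estimate both the expected number of tosses ($\beta(0) = 6$) and the expected number of tails seen ($\gamma(S) = 3$).
```python
import numpy as np

rng = np.random.default_rng(5)

def flips_until_hh(rng):
    """Return (tosses, tails) for one run of flipping until two consecutive heads."""
    tosses = tails = run = 0
    while run < 2:
        tosses += 1
        if rng.random() < 0.5:      # heads: extend the current run of heads
            run += 1
        else:                       # tails: reset the run
            run = 0
            tails += 1
    return tosses, tails

results = np.array([flips_until_hh(rng) for _ in range(100_000)])
print("E[tosses] ~", results[:, 0].mean())   # FSE answer: 6
print("E[tails]  ~", results[:, 1].mean())   # reward FSE answer: 3
```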
## Classification of general DTMCs
* Includes countably infinite
* Transience: A state i is said to be transient if given that we start in state i, there is a non-zero probability that we will never return to i
* Recurrent: 0 probability that we will never return to i
```graphviz
digraph G {
forcelabels=true;
graph [nodesep="0.5", ranksep="1"];
"0":w -> "0":nw[label="1/4", constraint=false];
"0" -> "1"[label="3/4", constraint=false];
"1" -> "2"[label=" ", constraint=false];
"1":nw -> "1":ne[label=" ", constraint=false];
"2" -> "2"[label=" ", constraint=false];
"2" -> "1"[label=" ", constraint=false];
}
```
* Transient states: 0; recurrent states: 1, 2
```graphviz
digraph G {
forcelabels=true;
graph [nodesep="0.5", ranksep="1"];
"0":w -> "0":nw[label=" ", constraint=false];
"1" -> "0"[label=" ", constraint=false];
"1" -> "2"[label=" ", constraint=false];
"1":nw -> "1":ne[label=" ", constraint=false];
"2":nw -> "2":ne[label=" ", constraint=false];
"2" -> "3"[label=" ", constraint=false];
"3" -> "2"[label=" ", constraint=false];
"3" -> "3"[label=" ", constraint=false];
}
```
* Recurrent: 0; transient: 1; recurrent class: {2, 3}
* Let $T_i = \min(n > 0 | X_n=i)$, first return to state i
* If MC is irreducible, then $P(T_i < \infty | X_0 = i)$
* =1 if recurrent, <1 if transient
* If MC is recurrent:
* if $E[T_i| X_0 = i] < \infty$: positive recurrent
* if $E[T_i| X_0 = i] = \infty$: null recurrent
* If the MC is irreducible, aperiodic, and positive recurrent, then $\lim\limits_{n \to \infty} P(X_n=j | X_0=i) = \pi(j)$ and the MC is asymptotically stationary
* Ex. random walk on $\{0, 1, 2, \ldots\}$ with probability $p$ of stepping up, $1-p$ of stepping down (reflected at 0)
* $p < \frac{1}{2} \rightarrow$ positive recurrent
* $p > \frac{1}{2} \rightarrow$ transient
* $p = \frac{1}{2} \rightarrow$ null recurrent
* All states are either transient, positive recurrent, or null recurrent
## Null Recurrence
* Chain on $\{1, 2, 3, \ldots\}$: from state $i$, go up to $i+1$ w.p. $\frac{i}{i+1}$, return to state 1 w.p. $\frac{1}{i+1}$
* Transient or recurrent?
* If P(do not return to state) = 0: recurrent, >0: transient
* P(do not return to state 1 | start at state 1) $= \lim\limits_{n \to \infty} P(1 \rightarrow 2 \rightarrow \ldots \rightarrow n \rightarrow \ldots) = \lim\limits_{n \to \infty} \frac{1}{n+1} = 0$
* Therefore, recurrent
* $E[T_1 | X_0 = 1] = (1)\frac{1}{2} + (2)\frac{1}{2}\cdot\frac{1}{3} + (3)\frac{1}{3}\cdot\frac{1}{4} + \ldots + (i)\frac{1}{i}\cdot\frac{1}{i+1} + \ldots = \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{i+1} + \ldots$ (since $P(T_1 = i) = \frac{1}{i}\cdot\frac{1}{i+1}$)
* Harmonic series which diverges
# Lecture 15: 3/13/20
## Reversibility
* Assume we have irred and positive recurrent MC started at its inv. dist $\pi$
* Suppose for every n ($X_0, X_1, \ldots X_n$) has the same joint PMF as its time-reversed ($X_n, X_{n-1} \ldots X_0)$, then it is reversible
* Fact 1) MC run backwards is always a MC
* MC property reverse holds
* $\widetilde{P}_{ji} = \frac{\pi(i)}{\pi(j)}P_{ij}$
* Fact 2) If the MC is reversible, the reversed chain is the same MC: $\widetilde{P}_{ij} = P_{ij}$
* Detailed Balance Equations (DBE): $\pi(i)P_{ij} = \pi(j)P_{ji}$
* If a distribution $\pi$ satisfies the DBE, then $\pi$ is an invariant distribution:
* $\sum\limits_i \pi(i) P_{ij} = \sum\limits_i \pi(j) P_{ji} = \pi(j) \sum\limits_i P_{ji} = \pi(j)$
## Reflected Random Walk DBEs
* Random walk on $\{0, 1, 2, \ldots\}$: from state $n$ move up w.p. $p$, down w.p. $1-p$ (stay at 0 w.p. $1-p$)
* Use Detailed Balance Equations (DBE) to show stationary dist
* Let $p < \frac{1}{2}$, positive recurrent
* Assume $\pi(i)P_{ij} = \pi(j)P_{ji}$
* Let $\rho = \frac{p}{1-p}$
* For $(0,1): \pi_0 p = \pi_1(1-p) \implies \pi_1 = \rho \pi_0$
* For $(n-1,n): \pi_{n} = \rho \pi_{n-1} = \rho^n \pi_0$
* Normalize: $\sum\limits_{i=0}^\infty \pi_i = 1 \implies \sum\limits_{i=0}^\infty \rho^i \pi_0 = 1$
* Using the infinite geometric sum ($\rho < 1$ since $p < \frac{1}{2}$): $\pi_0 = 1-\rho$
* Invariant dist: $\pi_n = (1-\rho)\rho^n$, geometric in $\rho = \frac{p}{1-p}$
* If the DBEs yield a valid (normalizable) solution, that is sufficient: it is the stationary dist (numeric check below)
* If the DBEs only give 0 or a diverging normalization, the chain is not reversible
* Fact: If you remove all self loops and multiple edges between nodes and have a tree, stationary dist satisfies DBE
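A numeric sketch of this result: truncate the reflected walk at a large state $N$ (an approximation), compute its stationary distribution, and compare with $(1-\rho)\rho^n$.
```python
import numpy as np

p, N = 0.3, 60                          # up-probability p < 1/2; truncate at state N
P = np.zeros((N + 1, N + 1))
P[0, 0], P[0, 1] = 1 - p, p             # reflecting boundary at 0
for i in range(1, N):
    P[i, i + 1], P[i, i - 1] = p, 1 - p
P[N, N], P[N, N - 1] = p, 1 - p         # truncation boundary (approximation)

vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi = pi / pi.sum()

rho = p / (1 - p)
geom = (1 - rho) * rho ** np.arange(N + 1)
print("numeric pi[:5]     =", np.round(pi[:5], 4))
print("(1-rho) rho^n [:5] =", np.round(geom[:5], 4))
```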
## Poisson Processes
* Continuous time counting process
* Continuous time analog of coin flip Bernoulli process
* Absolute arrival times: packets arrive at $T_i$
* Inter-arrival times: $S_i$
* $T_n = \sum\limits_{i=1}^n S_i$
* $S_1 \ldots S_n$ are iid $\sim Expo(\lambda)$ RVs
* $f_{s_i}(t) = \lambda e^{-\lambda t}, t > 0$
* $N_t \triangleq \max\{n \geq 1 : T_n \leq t\}$ for $t \geq T_1$, and $N_t = 0$ if $t < T_1$
* Recall:
* Memoryless property
* $P(\tau \leq t + \epsilon | \tau > t) = \lambda \epsilon + o(\epsilon)$
* $o(\epsilon)$ s.t. $\lim\limits_{\epsilon \to 0} \frac{o(\epsilon)}{\epsilon} = 0$
* Poisson Process is memoryless
* If $N_t \sim PP(\lambda)$, then so is $N_{t'+s}-N_{t'}$
* 1) Poisson RP has independent and stationary increments
* For any $0 \leq t_1 < t_2 < \ldots$, the increments $N_{t_{n+1}} - N_{t_n}$ are indep, and each distribution depends only on $t_{n+1} - t_n$
* If $\mathbb{N} = \{N_t, t \geq 0\}$ is a $PP(\lambda)$, then $N_t =$ num arrivals in $(0,t)$ has a Poisson dist
* $P(N_t =k) = e^{-\lambda t} \frac{(\lambda t)^k}{k!}, k = 0, 1, 2, \ldots$ (simulation sketch below)
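A sketch of the Poisson process built directly from its definition: sum $Expo(\lambda)$ inter-arrival times and check that $N_t$ has mean and variance $\approx \lambda t$.
```python
import numpy as np

rng = np.random.default_rng(6)
lam, t, trials = 2.0, 5.0, 50_000

# Generate comfortably more interarrivals than needed, cumulative-sum them into
# arrival times, and count how many land in (0, t].
S = rng.exponential(1 / lam, size=(trials, int(4 * lam * t) + 20))
T = np.cumsum(S, axis=1)
N_t = (T <= t).sum(axis=1)

print("E[N_t]   ~", N_t.mean(), "   lambda*t =", lam * t)
print("Var[N_t] ~", N_t.var(),  "   lambda*t =", lam * t)
```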
# Lecture 16: 3/17/20
## PP Recap
* Interarrival times $S_i$ are iid Exp($\lambda$)
* $PP(\lambda)$ has indep and stationary increments
* Num of arrivals in non overlapping intervals is indep and only depends on length of interval
* $P(\text{one arrival in } dt) = \lambda \, dt$, $P(\text{no arrivals in } dt) = 1 - \lambda \, dt$ (to first order in $dt$)
* Theorem: $N_t \sim Poiss(\lambda t)$
* $f_{T_1 \ldots T_k}(t_1 \ldots t_k) = \lambda^k e^{-\lambda t_k}$ for $0 < t_1 < \ldots < t_k$
* Uniform over the support of $T_1 <\ldots T_k$
* Given the number of arrivals that occurred in $(0,t)$, their times are distributed uniformly
* $P(N_t = k) = \int_{t_1} \ldots \int_{t_k} f_{T_1, \ldots T_k}(t_1 \ldots t_k)\, e^{-\lambda(t - t_k)}\, dt_1 \ldots dt_k$, where $e^{-\lambda(t-t_k)} = P(\text{no arrival in } (t_k, t])$
* $= \lambda^k e^{-\lambda t} \int_0^t \ldots \int_0^t dt_1 \ldots dt_k$
* Where support = $S= t_1 < t_2 < \ldots t_k$
* $= \lambda^k e^{-\lambda t} Vol(S)$
* Without any constraints, $Vol = t^k$
* This has all permutations and by symmetry, each have same volume
* $P(N_t = k) = \lambda^k e^{-\lambda t} \frac{t^k}{k!}$
* $P(N_t = k) = \frac{(\lambda t)^k}{k!} e^{-\lambda t}$
### Ex. Fishing
* Bob catches fish according to a $PP(\lambda = 0.6/hr)$
* If he catches at least 1 fish in first 2 hrs, he quits
* Otherwise, continue until first fish caught
* a) What is P(Bob fishes > 2 hrs)?
* He must not have caught any fish in the first 2 hrs
* $P(N_2 = 0) = e^{-2\lambda} = e^{-1.2}$
* b) What is P(Bob catches at least 2 fish)?
* 1 - P(catch 1 fish in first 2hrs) - P(catch 0 fish in first 2 hrs)
* $= 1- P(N_2=1) - P(N_2=0) = 1 - e^{-1.2} - 1.2e^{-1.2}$
* c) What is E[fish caught by Bob]?
* Total fish caught = fish from [0,2] + fish from $(2, \infty)$
* Linearity of expectation: E[total caught] = E[Poiss(1.2)] + 1 * P(still fishing in 2 to inf)
* $= 1.2 + (1)e^{-1.2}$
* d) E[fishing time | T > 4hrs]
* By memoryless property: 4 + E[exp(0.6)]
* $= 4 + \frac{1}{0.6}$
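A Monte Carlo sketch of the fishing example, checking parts (a)-(d) against the closed-form answers above:
```python
import numpy as np

rng = np.random.default_rng(7)
lam, trials = 0.6, 200_000

n2 = rng.poisson(lam * 2, size=trials)              # fish caught in the first 2 hours
extra = rng.exponential(1 / lam, size=trials)       # wait after hour 2 for the first fish (memoryless)

fish_time = np.where(n2 >= 1, 2.0, 2.0 + extra)     # he quits at hour 2 if he has caught anything
total_fish = np.where(n2 >= 1, n2, 1)               # otherwise he stays for exactly one fish

print("a) P(fish > 2 hrs)    ~", np.mean(n2 == 0),         "  exact:", np.exp(-1.2))
print("b) P(>= 2 fish)       ~", np.mean(total_fish >= 2), "  exact:", 1 - np.exp(-1.2) - 1.2 * np.exp(-1.2))
print("c) E[fish caught]     ~", total_fish.mean(),        "  exact:", 1.2 + np.exp(-1.2))
over4 = fish_time > 4
print("d) E[time | T > 4 hr] ~", fish_time[over4].mean(),  "  exact:", 4 + 1 / 0.6)
```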
## Merging and Splitting of PP
* Merge $N_1 \sim PP(\lambda_1), N_2 \sim PP(\lambda_2)$
* $N = N_1 + N_2 \sim PP(\lambda_1 + \lambda_2)$
* Splitting N to $N_1, N_2$ with probability $p, 1-p$
* Let $N \sim PP(\lambda)$
* $N_1 \sim PP(\lambda p), N_2 \sim PP(\lambda (1-p))$, $N_1(t), N_2(t)$ RVs are indep
* Ex. 2 light bulbs have indep exponential lifetimes $T_a \sim exp(\lambda_a), T_b \sim exp(\lambda_b)$; let $Z = \min(T_a, T_b)$ be the time until the first bulb dies
* Old solution: $P(Z \geq z) = P(T_a \geq z)P(T_b \geq z) = e^{-(\lambda_a + \lambda_b)z}$, so $Z \sim exp(\lambda_a + \lambda_b)$
* Using PP: $T_a, T_b$ are the times of first arrivals of 2 indep PPs with rates $\lambda_a, \lambda_b$
* Merging them gives a $PP(\lambda_a + \lambda_b)$, whose first arrival is exactly $Z = \min(T_a, T_b) \sim exp(\lambda_a + \lambda_b)$
## Interarrival Times
* Note $T_k = S_1 + \ldots S_k$
* $E[T_k] = \sum\limits_{i=1}^k E[S_i] = \frac{k}{\lambda}$
* $Var(T_k) = k var(S_i) = \frac{k}{\lambda^2}$
* PDF of $T_k: f_{T_k}(t)dt = P(k-1$ arr in 0,t)P(1 arr in t, t+dt)
* $\dfrac{e^{-\lambda t} (\lambda t)^{k-1}}{(k-1)!} \lambda \, dt$
* $f_{T_k}(t) = \dfrac{\lambda^k t^{k-1}}{(k-1)!} e^{-\lambda t}$
* Kth order Erlang pdf
* $T_k \sim Erlang-k(\lambda)$
* As $k \to \infty$, becomes Normal
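A quick check (sketch) that $T_k$ behaves like Erlang-$k$: simulate sums of $k$ exponentials and compare mean and variance with $\frac{k}{\lambda}$ and $\frac{k}{\lambda^2}$ (parameters arbitrary).
```python
import numpy as np

rng = np.random.default_rng(10)
lam, k, trials = 2.0, 5, 100_000

T_k = rng.exponential(1 / lam, size=(trials, k)).sum(axis=1)   # sum of k iid Expo(lam)
print("mean ~", T_k.mean(), "   k/lambda   =", k / lam)
print("var  ~", T_k.var(),  "   k/lambda^2 =", k / lam**2)
```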
# Lecture 17: 3/19/20
## PP Nomenclature
* $T_i$: absolute arrival times
* $S_i = T_i - T_{i-1}$: inter-arrival times
* $S_i$ are iid $Expo (\lambda)$ RVs
* $PP(\lambda)$ is memoryless: has indep and stationary increments
* $N_t \triangleq$ num of arrivals in an interval of length t
* $N_t \sim Pois(\lambda t)$: $P(N_t = k) = e^{-\lambda t} \frac{(\lambda t)^k}{k!}, k = 0,1,2\ldots$
* RP = RVs stacked in time or space
* Merging PP: $\sim PP(\lambda_1 + \lambda_2)$
* Splitting PP: $\sim PP(\lambda p), PP(\lambda (1-p))$
* Distribution of arrivals in an interval conditioned on number of arrivals in that interval is uniform
* $f_{T_1 \ldots T_k} (t_1, t_2 \ldots t_k | N_t = k) = \frac{k!}{t^k}$
## Random Incidence Paradox (RIP)
* Fix a time $t^*$; what is the length $L$ of the inter-arrival interval that contains it?
* Natural (but wrong) answer: $E[L] = \frac{1}{\lambda}$
* Actually $L \sim$ Erlang-$2(\lambda)$, so $E[L] = \frac{2}{\lambda}$
* Say $t^*$ falls between arrivals $T_3$ and $T_4$: let $U = T_3$, $V = T_4$
* $L = (t^* - U) + (V - t^*)$
* $V - t^*$ is $Exp(\lambda)$
* $P(t^* - U > x) =$ more than x secs have elapsed since last arrival
* = P(no arrivals in $[t^* - x, t^*]$)
* = P(no arrivals in (0,x)) = $e^{-\lambda x}$
* PP run backwards is a PP
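A sketch of the random-incidence computation: simulate a $PP(\lambda)$, drop a fixed $t^*$ into it, and measure the inter-arrival interval containing $t^*$; the mean comes out near $\frac{2}{\lambda}$, not $\frac{1}{\lambda}$.
```python
import numpy as np

rng = np.random.default_rng(8)
lam, t_star, trials = 1.0, 50.0, 20_000

lengths = np.empty(trials)
for i in range(trials):
    # Arrival times extending well past t_star.
    T = np.cumsum(rng.exponential(1 / lam, size=int(2 * lam * t_star) + 50))
    k = np.searchsorted(T, t_star)          # index of the first arrival after t_star
    U = T[k - 1] if k > 0 else 0.0          # last arrival before t_star (or time 0)
    V = T[k]                                # first arrival after t_star
    lengths[i] = V - U

print("E[L] ~", lengths.mean(), "   2/lambda =", 2 / lam, "   1/lambda =", 1 / lam)
```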
## Ex: RIP
* Buses arrive on the hour and 5min after the hour
* Inter-arrival times alternate between 5 min and 55 min, so the average inter-arrival time is 30 min
* Suppose you arrive at a uniformly random time; what is the expected length of the inter-arrival interval you land in?
* Want E[length of interarrival interval]
* $\frac{55}{60}$ chance of arriving in larger interval
* $E[\text{length}] = \frac{1}{12}(5) + \frac{11}{12}(55) = 50.83$ min (simulation check below)
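A tiny simulation of the bus version of the paradox (arrival time assumed uniform over the hour), matching the 50.83-minute answer:
```python
import numpy as np

rng = np.random.default_rng(9)
t = rng.uniform(0, 60, size=200_000)        # your arrival time, uniform within the hour
# Buses at minute 0 and minute 5: you land in the 5-min gap or the 55-min gap.
L = np.where(t < 5, 5, 55)
print("E[interval length] ~", L.mean(), "   exact:", (5 / 60) * 5 + (55 / 60) * 55)
```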