# EE 126 MT2 Notes

* MT1 Notes: https://hackmd.io/@matthewtang/126mt1
* MT2 Notes: https://hackmd.io/@matthewtang/126mt2

# Lecture 9: 2/18/20

## Weak Law of Large Numbers

* $M_n = \frac{1}{n} \sum\limits_{i=1}^n X_i$
* $X_i$ has mean $\mu$, var $\sigma^2$
* $E[M_n] = \frac{1}{n}E[X_i]n = \mu$
* $var(M_n) = \frac{1}{n^2} \sum\limits_{i=1}^n var(X_i) = \frac{\sigma^2}{n}$
* As $n \to \infty \colon E[M_n] = \mu$, $var(M_n) \to 0$
* WLLN: For iid $X_1, X_2, \ldots X_n$ with mean $\mu$, any $\epsilon > 0$, and $M_n = \frac{X_1 + \ldots + X_n}{n}$:
    * $P(|M_n - \mu| \geq \epsilon) = P(|\frac{X_1 + \ldots + X_n}{n} - \mu| \geq \epsilon) \to 0$ as $n \to \infty$
* The Chebyshev bound gives $P(|M_n - \mu| \geq \epsilon) \leq \frac{\sigma^2}{n \epsilon^2}$, which $\to 0$ as $n \to \infty$
* $P(|M_n - \mu| \geq \epsilon) \leq \delta$ for $n > n_0 (\epsilon, \delta)$
    * Confidence: $\delta$, accuracy: $\epsilon$
* $M_n$ converges in probability to $\mu$
* Beyond $n > n_0$, P($M_n$ falls outside $\mu \pm \epsilon$) $\leq \delta$

### WLLN Example

* $X_1, X_2, \ldots X_n \sim U[-1,1]$
* Ex 1: $Y_n$ with $F_{Y_n}(y) = F_X(ny) \implies f_{Y_n}(y) = nf_X(ny)$ (this is the scaling $Y_n = X_1/n$)
    * $P(|Y_n| \geq \epsilon) = 0$ if $n > \frac{1}{\epsilon}$, so $Y_n \to 0$ in probability
* Ex 2: $Y_n = \min(|X_1|, |X_2|, \ldots |X_n|)$
    * $P(|Y_n - 0| > \epsilon) = P(|X_1| > \epsilon, \ldots, |X_n| > \epsilon) = (1 - \epsilon)^n \to 0$ as $n \to \infty$

### Packet arrivals

* $Y_n = 1$ if arrival at time $n$
* $I_k = \{2^k, 2^k+1, \ldots 2^{k+1}-1\}$
* 1 arrival in each interval $I_k$, equally likely anywhere in the interval
* $P(Y_n = 1) = \frac{1}{2^k}$ for $n \in I_k$
* $\lim\limits_{n \to \infty} P(Y_n = 1) = \lim\limits_{k \to \infty} \frac{1}{2^k} = 0$
* $Y_n$ converges to 0 in probability
* But for any finite $n$, there are certain to be infinitely many arrivals after $n$, so $Y_n$ does not converge almost surely

## Convolution & CLT

* $S_n = X_1 + X_2 + \ldots + X_n$
* Convolving 2 squares creates a triangle
* Convolving a square and a triangle creates a smooth curve
* CLT says that as the sample size grows, the (normalized) sum becomes Gaussian
* Problem: the mean and variance of $S_n$ explode, since $E[S_n] = n\mu$ and $var(S_n) = n \sigma^2$, so normalize
* Define $\hat{S}_n = \frac{S_n - n \mu}{\sqrt{n} \sigma}$
    * 0 mean, 1 variance
* CLT: $\lim\limits_{n \to \infty} P(\hat{S}_n \leq x) = \Phi(x), \forall x$
* $\Phi(x) = \int\limits_{-\infty}^x \frac{1}{\sqrt{2\pi}} e^{-t^2/2} dt$
* $\hat{S}_n \to \mathcal{N}(0,1)$ in distribution
* Informally, $S_n \approx \mathcal{N}(n \mu, n \sigma^2)$ and $M_n \approx \mathcal{N}(\mu, \sigma^2/n)$

# Lecture 10: 2/25/20

## Strong Law of Large Numbers

* WLLN: $\lim\limits_{n \to \infty} P(|M_n - \mu| \geq \epsilon) = 0$
* SLLN: $P( \lim\limits_{n \to \infty} M_n = \mu) = 1$ (convergence almost surely)
* Proof of CLT (sketch, via MGFs):
    * $X_1, \ldots X_n$ iid RVs with mean $\mu$, variance $\sigma^2$
    * $S_n = \sum\limits_{i=1}^n X_i \rightarrow E[S_n] = n\mu, var(S_n) = n\sigma^2$
    * $M_n = \frac{1}{n} \sum\limits_{i=1}^n X_i \rightarrow E[M_n] = \mu, var(M_n) = \frac{\sigma^2}{n}$
    * $\hat{S}_n = \dfrac{\sum\limits_{i=1}^n X_i - n\mu}{\sigma\sqrt{n}} \rightarrow E[\hat{S}_n] = 0, var(\hat{S}_n) = 1$
    * If $Y \sim N(0,1)$, then $M_Y(s) = e^{s^2/2}$, so $\log M_Y(s) = s^2/2$
    * WLOG take $\mu = 0, \sigma = 1$ and let $Z_n = \hat{S}_n = \frac{1}{\sqrt{n}}\sum\limits_{i=1}^n X_i$
    * By independence, $M_{Z_n}(s) = E[e^{sZ_n}] = E[e^{(s/\sqrt{n})X_1}] \cdots E[e^{(s/\sqrt{n})X_n}] = E[e^{(s/\sqrt{n}) X}]^n$
    * $M_{Z_n}(s) = [M_X(\frac{s}{\sqrt{n}})]^n$
    * $\lim\limits_{n \to \infty} \log M_{Z_n}(s) = \lim\limits_{n \to \infty} n \log M_X(\frac{s}{\sqrt{n}})$
    * Change of variables $y = \frac{1}{\sqrt{n}}$: the limit becomes $\lim\limits_{y \to 0} \frac{\log M_X(sy)}{y^2}$; apply L'Hopital (differentiate w.r.t. $y$ and evaluate as $y \to 0$)
    * The limit is $\frac{s^2}{2} M_X''(0) = \frac{s^2}{2}$ (since $M_X''(0) = E[X^2] = 1$), matching $\log M_Y(s)$, so $Z_n \to \mathcal{N}(0,1)$

### Polling Example

* Ask $n$ randomly (iid) sampled voters; $X_i = 1$ if yes
* Want 95% confidence that $|M_n - p| < \epsilon$, where $\epsilon$ is the accuracy of the poll
* Let $p$ = true proportion, $M_n$ the empirical estimate
* Chebyshev: $P(|M_n - p| \geq a) \leq \frac{var(M_n)}{a^2}$
    * $var(M_n) = \frac{p(1-p)}{n} \leq \frac{1}{4n}$
    * $P(|M_n - p| \geq 0.1) \leq \frac{1/(4n)}{0.1^2} = \frac{25}{n}$
    * Want $\frac{25}{n} \leq 0.05 \rightarrow n \geq 500$
* CLT:
    * Want $P(|M_n - p| \geq 0.1) \leq 0.05$
    * $P\left(\frac{|M_n - p|}{\frac{1}{2\sqrt{n}}} \geq \frac{0.1}{\frac{1}{2\sqrt{n}}}\right) \leq 0.05$ (dividing by the worst-case std dev $\frac{1}{2\sqrt{n}}$)
    * The left-hand ratio is approximately $|N(0,1)|$, so we need $P(|N(0,1)| \geq 0.2\sqrt{n}) \leq 0.05$
    * $0.2 \sqrt{n} \geq 2 \implies n \geq 100$
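A quick Monte Carlo sanity check of the two sample-size estimates above (my own sketch, not from lecture; the bias `p = 0.5` and the trial count are arbitrary demo choices):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5            # worst-case true proportion (demo assumption)
trials = 20_000    # number of simulated polls

for n in (100, 500):
    # each row is one poll of n iid Bernoulli(p) voters
    polls = rng.random((trials, n)) < p
    M_n = polls.mean(axis=1)
    miss = np.mean(np.abs(M_n - p) >= 0.1)   # empirical P(|M_n - p| >= 0.1)
    print(f"n = {n}: P(|M_n - p| >= 0.1) ~= {miss:.4f}")
```

For $n = 100$ the miss rate should land near the 5% target (the CLT calculation is only approximate), while the Chebyshev-based $n = 500$ is far more conservative.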
# Lecture 11: 2/27/20

## Entropy

* $H(X) = E[\log\frac{1}{P(X)}]$
* Design channel encoder and decoder to combat noise
* 2 problems: source coding (how to best compress the source) and channel coding (how to communicate reliably over a noisy channel)
* Source coding theorem:
    * $N$ iid RVs $X_1 \ldots X_N$, each having entropy $H(X)$, can be compressed into no more than $N(H(X) + \epsilon)$ bits, $\forall \epsilon > 0$, as $N \to \infty$
    * Conversely, compression to fewer than $NH(X)$ bits is impossible without loss of information
* $H(X) = -\sum\limits_{x \in \mathcal{X}} p(x) \log p(x) = E[-\log p(X)]$
* $H(X)$ measures the uncertainty/surprise/info content in $X$
* Ex. $X \sim Bern(p)$
    * $h(p) = p \log \frac{1}{p} + (1-p)\log \frac{1}{1-p}$
    * Binary entropy function
* $\log \frac{1}{p(x)}$ is the self-information of $X=x$
* Joint entropy: $H(X,Y) = E[\log \frac{1}{P(X,Y)}] = \sum\limits_x \sum\limits_y P(x,y) \log\frac{1}{P(x,y)}$
* If $X, Y$ indep, $P(x,y) = P(x)P(y)$, so:
    * $H(X,Y) = \sum\limits_x \sum\limits_y P(x)P(y) \left(\log\frac{1}{P(x)} + \log\frac{1}{P(y)}\right)$
    * $H(X,Y) = H(X) + H(Y)$

## AEP (Asymptotic Equipartition Property)

* Flip a coin with bias $p$, $n$ times independently. What is a typical sequence?
* A typical sequence has $np$ heads and $n(1-p)$ tails
* Its probability is $p^{np} (1-p)^{n(1-p)}$
* Using $x=2^{\log_2 x}$:
    * $=2^{np \log p}\, 2^{n(1-p)\log(1-p)}$
    * $= 2^{n(p \log p + (1-p) \log(1-p))}$
    * $= 2^{-nh(p)}$
* How many typical sequences are there?
    * $\binom{n}{np} \approx 2^{n h(p)}$, via the Stirling approximation $n! \approx (\frac{n}{e})^n$
* If $p=0.11$, $2^{nh(p)} \approx 2^{n/2}$
* Thm (AEP): If $X_1 \ldots X_n$ are iid $\sim p(x)$, then $- \frac{1}{n} \log p(X_1, X_2 \ldots X_n) \rightarrow H(X)$ in probability
* Functions of RVs are RVs, so the $\log p(X_i)$ are iid
* By the WLLN, $- \frac{1}{n} \sum\limits_{i=1}^n \log p(X_i) \rightarrow - E[\log p(X)] = H(X)$
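A small empirical illustration of the AEP theorem just stated (my own sketch; `p = 0.11` and `n` are demo values): draw one long Bernoulli($p$) sequence and compare its normalized log-probability with $h(p)$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 0.11, 100_000   # coin bias and sequence length (demo values)

# Binary entropy h(p) in bits
h = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

# One long iid Bernoulli(p) sequence and its normalized log-probability
x = rng.random(n) < p
log_prob = np.sum(np.where(x, np.log2(p), np.log2(1 - p)))
print(f"h(p)                    = {h:.4f} bits")
print(f"-(1/n) log2 p(X_1..X_n) = {-log_prob / n:.4f} bits")
# The two numbers should agree closely: the sequence's empirical
# "bits per symbol" concentrates around H(X), which is the AEP.
```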
## Channel Capacity

* BSC(p): Binary Symmetric Channel
    * With probability $p$ the opposite bit is received, with probability $1-p$ the bit is accurate
* BEC(p): Binary Erasure Channel
    * With probability $p$ the bit is erased, with probability $1-p$ it arrives accurately
    * The receiver sees either the transmitted bit or an erasure symbol $(e)$
* Goal: find the capacity of the $BEC(p)$ channel
* Setup: message $m \xrightarrow{f_n} x^{(n)} \rightarrow \text{channel} \rightarrow y^{(n)} \xrightarrow{g_n} \hat{m}$, where $f_n$ is the encoder and $g_n$ the decoder
* If $m = \hat{m}$, the transmission is reliable
* Capacity: max rate of reliable communication
* Rate: $R = \frac{L_n}{n}$ bits/channel use, where $L_n$ is the number of message bits carried by $n$ channel uses
* Error probability: $P_{err}^{(n)} = \max\limits_m P(\hat{m} \neq m)$
* $R$ is achievable for the channel if there exist encoder/decoder functions $f_n, g_n$ at rate $R$ such that $P_{err}^{(n)} \to 0$ as $n \to \infty$
* The largest achievable $R$ is called the capacity of the channel
* Claim: $C_{BEC} = 1-p$ bits/channel use
* Intuition for the converse: assume you know before transmission which bits will be erased
    * By the SLLN, P(about $np$ bits get erased) goes to 1, so only about $n(1-p)$ bits get through
    * $C_{BEC} \leq 1-p$

# Lecture 12: 3/3/20

* Proof that $C_{BEC} = 1-p$
* Converse: even if we know which bits will be erased, the max rate of reliable transmission is $(1-p)$ bits/ch use
* Achievability: show that $R = 1-p-\epsilon$ is achievable $\forall \epsilon > 0$
* Leverage the SLLN: as $n \to \infty$, P(channel erases about $np$ symbols) $\to 1$
* Codebook: make a $(2^{L_n} \times n)$ lookup table with iid $Bern(\frac{1}{2})$ entries
* Codeword $c_i$: the $i^{th}$ row, i.e. the codeword of the $i^{th}$ message
* Assume WLOG the last $np$ bits are erased
* $C'$: truncated codebook; the $n(1-p)$ unerased columns survive, $np$ columns are lost
* Decoding rule: if $c_j'$ is the only entry in $C'$ matching $y^{(n)}$ exactly on the unerased positions, declare $\hat{m} = m_j$
* Otherwise: declare failure
* $P(\text{error}) = P(c_1' \text{ is not unique}) = P\left( \bigcup_{i=2}^{2^{L_n}} \{c_i' = c_1'\}\right)$
* By the union bound: $\leq \sum\limits_{i \neq 1} 2^{-n(1-p)} = 2^{L_n} 2^{-n(1-p)}$
    * $= 2^{nR}2^{-n(1-p)} = 2^{n(R-(1-p))}$
* When $R < 1-p$: $P_e \to 0$ as $n \to \infty$
* Make $R = 1-p - \epsilon \implies P_e \leq 2^{-n\epsilon} \to 0$

### BEC Example

* $n=10{,}000$; $p=0.5$; $\epsilon = 0.01$
* $C_{BEC} = \frac{1}{2}$ bit/ch use, i.e. up to $5{,}000$ message bits per block
* $L = 10000(1 - 0.5 - 0.01) = 4{,}900$ bits
* $P_e \leq 2^{-n \epsilon} = 2^{-100} \approx 0$

## Other Schemes

* $C_{BSC}(p) = 1- h(p)$
    * With the binary entropy function $h(p) = -p\log(p) - (1-p)\log(1-p)$
* General Discrete Memoryless Channel (DMC)
    * $\boxed{X} \rightarrow \boxed{P(Y|X)} \rightarrow \boxed{Y}$
    * $C = \max\limits_{P(X)} I(X;Y) = \max\limits_{P(X)}[H(X) - H(X|Y)]$
    * $P(X)$ is a design choice for optimization
    * $R \leq C$

## Huffman Encoding

* Given $X \in \{A: 0.4;\ B: 0.35;\ C: 0.2;\ D: 0.05\}$
* Encode into a binary stream
* $H(X) = 1.74$ bits/symbol
* Repeatedly take the 2 least probable elements and merge them
* Avg num of bits/symbol: $0.4(1) + 0.35(2) + (0.2 + 0.05)(3) = 1.85$ bits/symbol
* Less efficient than the entropy bound because of the integer codeword-length constraint

# Lecture 13: 3/5/20

## Random Processes

* A stochastic process $X = \{X_t\}_{t \in T}$
* Collection of RVs where the index $t$ is time
* Discrete-time random process (DTRP) when $t$ is discrete
* $X$ models the evolution of a sequence of RVs
* To fully characterize the behavior of a DTRP, we need the joint PMF of $\{X_0, X_1 \ldots X_n\}$
* Markov structure: DTMC; let $\mathcal{X}$ be a finite set called the state space
* Markov property: $P(X_n = x_n | X_{n-1}=x_{n-1}, \ldots, X_0=x_0) = P(X_n = x_n | X_{n-1}=x_{n-1})$
* Given the present, the future is independent of the past
* $X_n$ is conditionally independent of past states given $X_{n-1}$

### Example

```graphviz
digraph G {
    forcelabels=true;
    graph [nodesep="0.5", ranksep="1"];
    "0":nw -> "0"[label=" 0.8"];
    "2" -> "0"[label=" 0.8", constraint=false];
    "2" -> "1"[label=" 0.2", constraint=false];
    "1" -> "2"[label=" 0.7", constraint=false];
    "1":nw -> "1":ne[label=" 0.3", constraint=false];
    "0" -> "1"[label=" 0.2", constraint=false];
}
```

* State-transition matrix:
    * Rows sum to 1 (row-stochastic matrix)
    * Row = "from" state, column = "to" state
    * $P = \begin{bmatrix} 0.8 & 0.2 & 0\\ 0 & 0.3 & 0.7\\ 0.8 & 0.2 & 0 \end{bmatrix}$
* Initial dist: $\pi_0 = \begin{bmatrix} \pi_0(0) & \pi_0(1) & \pi_0(2)\end{bmatrix}$
* Want to find $\pi_n$
* Start with $n=1$: $P(X_1=j) =\sum \limits_{i=0}^2 P(X_1=j|X_0=i)P(X_0=i)$
* $\pi_1(j) =\sum \limits_{i=0}^2 P_{ij}\pi_0(i)$
* In matrix form: $\begin{bmatrix} \pi_1(0) & \pi_1(1) & \pi_1(2)\end{bmatrix} = \begin{bmatrix} \pi_0(0) & \pi_0(1) & \pi_0(2)\end{bmatrix}\begin{bmatrix} P_{00} & P_{01} & P_{02}\\ P_{10} & P_{11} & P_{12}\\ P_{20} & P_{21} & P_{22}\\ \end{bmatrix}$, i.e. $\pi_1 = \pi_0 P$
* Iterating: $\pi_n = \pi_0 P^n$
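A minimal numerical sketch of $\pi_n = \pi_0 P^n$ for the 3-state chain above (the initial distribution is an arbitrary choice). Note how $\pi_n$ stops changing after a few steps; that limit is the invariant distribution discussed next.

```python
import numpy as np

# Transition matrix of the 3-state example above (rows sum to 1)
P = np.array([[0.8, 0.2, 0.0],
              [0.0, 0.3, 0.7],
              [0.8, 0.2, 0.0]])

pi = np.array([1.0, 0.0, 0.0])    # arbitrary initial distribution pi_0 (demo choice)

for n in range(1, 51):
    pi = pi @ P                    # pi_n = pi_{n-1} P, so pi_n = pi_0 P^n
    if n in (1, 2, 5, 50):
        print(f"pi_{n} = {np.round(pi, 4)}")
# pi_n settles down to a fixed vector; that limiting vector satisfies pi = pi P.
```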
* Invariant distribution (stationary MC): $\pi = \pi P$
* Flow in = flow out for each state at equilibrium; with $\pi = (x, y, z)$:
    * State 0: $0.8z = 0.2x$
    * State 1: $0.2x + 0.2z = 0.7y$
    * State 2: $0.7y = z$
    * Total: $x+y+z=1$
* Irreducible: if the chain can go from any state to any other state
* Chain A:

```graphviz
digraph G {
    forcelabels=true;
    graph [nodesep="0.5", ranksep="1"];
    "0" -> "1"[label=" 1.0", constraint=false];
    "1" -> "2"[label=" 0.4", constraint=false];
    "2" -> "1"[label=" 1.0", constraint=false];
    "1" -> "0"[label=" 0.6", constraint=false];
}
```

* Chain B:

```graphviz
digraph G {
    forcelabels=true;
    graph [nodesep="0.5", ranksep="1"];
    "0" -> "1"[label=" 1.0", constraint=false];
    "1" -> "2"[label=" 0.4", constraint=false];
    "2" -> "1"[label=" 0.9", constraint=false];
    "1" -> "0"[label=" 0.6", constraint=false];
    "2" -> "2"[label=" 0.1", constraint=false];
}
```

* Chain C:

```graphviz
digraph G {
    forcelabels=true;
    graph [nodesep="0.5", ranksep="1"];
    "0" -> "1"[label=" 1.0", constraint=false];
    "1" -> "2"[label=" 0.4", constraint=false];
    "1" -> "0"[label=" 0.6", constraint=false];
    "2" -> "2"[label=" 1.0", constraint=false];
}
```

* A, B irreducible; C not irreducible
* Period: $d(i) = \gcd \{ n \geq 1 : P^n (i,i) > 0\}$
* For an irreducible chain, $d(i) = d, \forall i \in \mathcal{X}$
* Aperiodic: $d=1$ (any state with a self-loop has period 1)
* A: period 2
* B: period 1 (aperiodic)

## General Questions

* Does there exist an invariant dist?
    * Yes, if the number of states is finite
* Is the invariant dist unique?
    * Yes, if finite and irreducible
* Does $\lim\limits_{n \to \infty} \pi_n = \pi$?
    * Yes, if irreducible and aperiodic
* Big Theorem: If a MC is finite and irreducible, it has a unique invariant dist $\pi$, and $\pi(i)$ is the long-term fraction of time that $X_n$ is equal to $i$
* If the MC is also aperiodic, then $\lim\limits_{n \to \infty} \pi_n = \pi$

### First passage time / hitting time

* Starting at state A, how many steps to get to state E: $T_E$
* $\beta_E(A) \triangleq E[T_E | X_0 =A]$
* Key is to calculate $\beta(i) = E[T_E|X_0=i]$ for $i=A,B,C,D,E$
* $\beta(A) = 1 + \frac{1}{2}\beta(B) + \frac{1}{2}\beta(D)$
* $\beta(B) = 1 + \beta(C)$
* $\beta(C) = 1 + \beta(A)$
* $\beta(D) = 1 + \frac{1}{3}\beta(A) + \frac{1}{3} \beta(B) + \frac{1}{3} \beta(E)$
* $\beta(E) = 0$
* Solve the system of equations: $\beta = 17, 19, 18, 13, 0$ for states $A, B, C, D, E$
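These first-step equations are just a linear system, so they can be checked numerically. A small sketch (the A-E ordering and the matrix below are read off the equations above, not taken from the original figure):

```python
import numpy as np

# First-step equations beta = 1 + Q beta, where Q holds the transition
# probabilities among the non-target states A, B, C, D; beta(E) = 0.
Q = np.array([
    [0.0, 0.5, 0.0, 0.5],    # A -> B, D with prob 1/2 each
    [0.0, 0.0, 1.0, 0.0],    # B -> C
    [1.0, 0.0, 0.0, 0.0],    # C -> A
    [1/3, 1/3, 0.0, 0.0],    # D -> A, B (the 1/3 to E drops out since beta(E) = 0)
])
beta = np.linalg.solve(np.eye(4) - Q, np.ones(4))   # solve (I - Q) beta = 1
print({s: round(float(b), 6) for s, b in zip("ABCD", beta)})
# -> {'A': 17.0, 'B': 19.0, 'C': 18.0, 'D': 13.0}, matching the solution above
```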
# Lecture 14: 3/10/20

* Ex. Expected number of tosses until 2 consecutive heads

```graphviz
digraph G {
    forcelabels=true;
    graph [nodesep="0.5", ranksep="1"];
    "0":w -> "0":nw[label="1/2", constraint=false];
    "0" -> "1"[label="1/2", constraint=false];
    "1" -> "2"[label="1/2", constraint=false];
    "2" -> "2"[label="1", constraint=false];
    "1" -> "0"[label="1/2", constraint=false];
}
```

* State = number of consecutive heads seen so far
* Let $T_2 = \min(n \geq 0: X_n = 2)$
* $\beta_2(i) = E[T_2 | X_0 = i]$
* FSE:
    * $\beta(0) = 1 + \frac{1}{2} \beta(0) + \frac{1}{2}\beta(1)$
    * $\beta(1) = 1 + \frac{1}{2} \beta(0) + \frac{1}{2}\beta(2)$
    * $\beta(2) = 0$
* Solve: $\beta(0) = 6$
* FSEs for MCs in general:
    * $\mathcal{X} = \{1,2, \ldots N\}$, $A \subseteq \mathcal{X}$
    * $T_A = \min(n \geq 0 : X_n \in A)$
    * $\beta_A(i) = E[T_A | X_0 = i], \forall i \in \mathcal{X}$
    * $\beta_A(i) = \begin{cases} 1 + \sum\limits_j P(i,j)\beta_A(j) & \text{if } i \not \in A\\ 0 & \text{if } i \in A\end{cases}$

## Hitting one state before another

![](https://i.imgur.com/4SX2H0b.png)

* $\alpha_A = P(\text{hit } C \text{ before } E \mid X_0=A)$
* $\alpha_A = \frac{1}{2}\alpha_B + \frac{1}{2}\alpha_D$
* $\alpha_B = \alpha_C$
* $\alpha_C = 1$
* $\alpha_D = \frac{1}{3}\alpha_A + \frac{1}{3}\alpha_B + \frac{1}{3}\alpha_E$
* $\alpha_E = 0$
* In general, for a target set $A$ and an avoid set $B$:
    * $\alpha_A(i) = \begin{cases} \sum\limits_j P(i,j)\alpha_A(j) & \text{if } i \not \in A \cup B\\ 1 & \text{if } i \in A\\ 0 & \text{if } i \in B\end{cases}$

## Accumulating Rewards

* Define a reward function $g: \mathcal{X} \rightarrow \mathbb{R}$
* $\gamma(i) = E\left[\sum\limits_{n=0}^{T_A} g(X_n) \,\middle|\, X_0=i\right], i \in \mathcal{X}$
* Where $T_A$ is the hitting time for $A$
* $\gamma(i) = \begin{cases} g(i) + \sum\limits_j P(i,j)\gamma(j) & \text{if } i \not \in A\\ g(i) & \text{if } i \in A \end{cases}$
* Ex. Flip a fair coin until you see 2 consecutive heads.

![](https://i.imgur.com/GMP5nRw.png)

* $g(S) = 0, g(H) = 0, g(HH) = 0, g(T) = 1$ (so $\gamma(S)$ is the expected number of tails seen before HH)
* FSE:
    * $\gamma(S) = 0 + \frac{1}{2}\gamma(H) + \frac{1}{2}\gamma(T)$
    * $\gamma(H) = 0 + \frac{1}{2}\gamma(T) + \frac{1}{2}\gamma(HH)$
    * $\gamma(T) = 1 + \frac{1}{2}\gamma(H) + \frac{1}{2}\gamma(T)$
    * $\gamma(HH) = 0$
* Solve: $\gamma(S) = 3$
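A quick simulation that checks both Lecture 14 answers at once, $\beta(0) = 6$ expected tosses and $\gamma(S) = 3$ expected tails (a sketch; the trial count is an arbitrary demo value):

```python
import random

random.seed(0)
trials = 200_000
total_tosses, total_tails = 0, 0

for _ in range(trials):
    consec, tosses, tails = 0, 0, 0
    while consec < 2:                  # stop at two consecutive heads
        tosses += 1
        if random.random() < 0.5:      # heads: extend the streak
            consec += 1
        else:                          # tails: streak resets
            consec = 0
            tails += 1
    total_tosses += tosses
    total_tails += tails

print("E[tosses until HH] ~=", total_tosses / trials)   # about 6
print("E[tails until HH]  ~=", total_tails / trials)    # about 3
```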
## Classification of general DTMCs

* Includes chains with countably infinite state spaces
* Transient: a state $i$ is transient if, given that we start in state $i$, there is a non-zero probability that we will never return to $i$
* Recurrent: probability 0 that we never return to $i$

```graphviz
digraph G {
    forcelabels=true;
    graph [nodesep="0.5", ranksep="1"];
    "0":w -> "0":nw[label="1/4", constraint=false];
    "0" -> "1"[label="3/4", constraint=false];
    "1" -> "2"[label=" ", constraint=false];
    "1":nw -> "1":ne[label=" ", constraint=false];
    "2" -> "2"[label=" ", constraint=false];
    "2" -> "1"[label=" ", constraint=false];
}
```

* Transient states: 0; recurrent states: 1, 2

```graphviz
digraph G {
    forcelabels=true;
    graph [nodesep="0.5", ranksep="1"];
    "0":w -> "0":nw[label=" ", constraint=false];
    "1" -> "0"[label=" ", constraint=false];
    "1" -> "2"[label=" ", constraint=false];
    "1":nw -> "1":ne[label=" ", constraint=false];
    "2":nw -> "2":ne[label=" ", constraint=false];
    "2" -> "3"[label=" ", constraint=false];
    "3" -> "2"[label=" ", constraint=false];
    "3" -> "3"[label=" ", constraint=false];
}
```

* Recurrent: 0; transient: 1; recurrent class: 2, 3
* Let $T_i = \min(n > 0 : X_n=i)$, the first return time to state $i$
* If the MC is irreducible, then $P(T_i < \infty | X_0 = i)$ is the same for all $i$:
    * $=1$ if recurrent, $<1$ if transient
* If the MC is recurrent:
    * if $E[T_i| X_0 = i] < \infty$: positive recurrent
    * if $E[T_i| X_0 = i] = \infty$: null recurrent
* If the MC is irreducible, aperiodic, and positive recurrent, then $\lim\limits_{n \to \infty} P(X_n=j | X_0=i) = \pi(j)$ and the MC is asymptotically stationary
* Ex.

![](https://i.imgur.com/2R72tpg.png)

* $p < \frac{1}{2} \rightarrow$ positive recurrent
* $p > \frac{1}{2} \rightarrow$ transient
* $p = \frac{1}{2} \rightarrow$ null recurrent

![](https://i.imgur.com/yGmNABL.png)

* All states are either transient, positive recurrent, or null recurrent

## Null Recurrence

![](https://i.imgur.com/2OhWieU.png)

* Transient or recurrent?
* If P(never return to the state) $= 0$: recurrent; $> 0$: transient
* $P(\text{never return to state 1} \mid \text{start at state 1}) = \lim\limits_{n \to \infty} P(1 \rightarrow 2 \rightarrow \ldots \rightarrow n) = \lim\limits_{n \to \infty} \frac{1}{n+1} = 0$
* Therefore, recurrent
* $E[T_1 | X_0 = 1] = \frac{1}{2}(1) + \frac{1}{2}\cdot\frac{1}{3}(2) + \ldots + P(T_1 = i)\,(i) + \ldots = \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{i+1} + \ldots$
* This is the harmonic series, which diverges, so the state is null recurrent

# Lecture 15: 3/13/20

## Reversibility

* Assume we have an irreducible and positive recurrent MC started at its invariant dist $\pi$
* If for every $n$, $(X_0, X_1, \ldots X_n)$ has the same joint PMF as its time-reversal $(X_n, X_{n-1}, \ldots X_0)$, then the MC is reversible
* Fact 1) A MC run backwards is always a MC (the Markov property holds in reverse), with $\widetilde{P}_{ji} = \frac{\pi(i)}{\pi(j)}P_{ij}$
* Fact 2) If the MC is reversible, the reversed chain is the same MC: $\widetilde{P}_{ji} = P_{ji}$
* Detailed balance equations (DBE), obtained by setting $\widetilde{P}_{ji} = P_{ji}$: $\pi(i)P_{ij} = \pi(j)P_{ji}$
* If $\pi$ satisfies the DBE, then $\pi$ is an invariant distribution:
    * $\sum\limits_i \pi(i) P_{ij} = \sum\limits_i \pi(j) P_{ji} = \pi(j) \sum\limits_i P_{ji} = \pi(j)$

## Reflected Random Walk DBEs

![](https://i.imgur.com/2R72tpg.png)

* Use the detailed balance equations (DBE) to find the stationary dist
* Let $p < \frac{1}{2}$ (positive recurrent)
* Assume $\pi(i)P_{ij} = \pi(j)P_{ji}$
* Let $\rho = \frac{p}{1-p}$
* For $(0,1)$: $\pi_0 p = \pi_1(1-p) \implies \pi_1 = \rho \pi_0$
* For $(n-1,n)$: $\pi_{n} = \rho \pi_{n-1} = \rho^n \pi_0$
* Normalize: $\sum\limits_{i=0}^\infty \pi_i = 1 \implies \sum\limits_{i=0}^\infty \rho^i \pi_0 = 1$
* Using the infinite geometric sum: $\pi_0 = 1-\rho$
* Invariant dist: $\pi_n = (1-\rho)\rho^n$
* If the DBEs yield a valid solution, that is sufficient for it to be the stationary dist
* If you get 0 or infinity (no valid solution), the chain is not reversible
* Fact: if, after removing all self-loops and merging multiple edges between nodes, the transition graph is a tree, then the stationary dist satisfies the DBE

## Poisson Processes

* Continuous-time counting process
* Continuous-time analog of the (coin flip) Bernoulli process
* Absolute arrival times: packets arrive at $T_i$
* Inter-arrival times: $S_i$
* $T_n = \sum\limits_{i=1}^n S_i$
* $S_1 \ldots S_n$ are iid $\sim Expo(\lambda)$ RVs
* $f_{S_i}(t) = \lambda e^{-\lambda t}, t > 0$
* $N_t \triangleq \max\{n \geq 1 : T_n \leq t\}$ for $t \geq T_1$, and $N_t = 0$ if $t < T_1$
* Recall the memoryless property: for $\tau \sim Expo(\lambda)$, $P(\tau \leq t + \epsilon | \tau > t) = \lambda \epsilon + o(\epsilon)$, where $o(\epsilon)$ satisfies $\lim\limits_{\epsilon \to 0} \frac{o(\epsilon)}{\epsilon} = 0$
* The Poisson process is memoryless: if $\{N_t\}$ is a $PP(\lambda)$, then so is $\{N_{t'+s}-N_{t'}\}_{s \geq 0}$
* A Poisson RP has independent and stationary increments: for any $0 \leq t_1 < t_2 < \ldots$, the increments $N_{t_{n+1}} - N_{t_n}$ are indep and their dist depends only on $t_{n+1} - t_n$
* If $N = \{N_t, t \geq 0\}$ is a $PP(\lambda)$, then $N_t =$ num of arrivals in $(0,t)$ has a Poisson dist:
* $P(N_t =k) = e^{-\lambda t} \frac{(\lambda t)^k}{k!}$
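A short simulation sketch of the claim that $N_t \sim Poisson(\lambda t)$ when the interarrival times are iid $Expo(\lambda)$ (rate, horizon, and trial count are demo values; only the first two moments are checked):

```python
import numpy as np

rng = np.random.default_rng(2)
lam, t, trials = 2.0, 3.0, 50_000   # rate, horizon, number of sample paths (demo values)

# Build each path from iid Expo(lam) interarrival times and count arrivals in (0, t]
counts = np.empty(trials, dtype=int)
for i in range(trials):
    total, n = 0.0, 0
    while True:
        total += rng.exponential(1 / lam)   # next interarrival time S_i
        if total > t:
            break
        n += 1
    counts[i] = n

print("empirical E[N_t]   =", counts.mean(), " vs lambda*t =", lam * t)
print("empirical Var[N_t] =", counts.var(),  " vs lambda*t =", lam * t)
# Mean and variance both come out near lambda*t, as expected for Poisson(lambda*t).
```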
# Lecture 16: 3/17/20

## PP Recap

* Interarrival times $S_i$ are iid $Expo(\lambda)$
* $PP(\lambda)$ has indep and stationary increments
* Num of arrivals in non-overlapping intervals is indep, and the dist depends only on the length of the interval
* $P(\text{one arrival in } dt) = \lambda\, dt$, $P(\text{zero arrivals in } dt) = 1 - \lambda\, dt$ (to first order)
* Theorem: $N_t \sim Poisson(\lambda t)$
* Joint density of the first $k$ arrival times: $f_{T_1 \ldots T_k}(t_1 \ldots t_k) = \lambda^k e^{-\lambda t_k}$ on the support $0 < t_1 < \ldots < t_k$
* Given that $k$ arrivals occurred in $(0,t)$, the arrival times are distributed uniformly over that support
* $P(N_t = k) = \int_{t_1} \ldots \int_{t_k} f_{T_1, \ldots T_k}(t_1 \ldots t_k)\, P(\text{no arrivals in } (t_k, t])\, dt_1 \ldots dt_k = \lambda^k e^{-\lambda t} \int_S dt_1 \ldots dt_k$
* Where the support is $S = \{0 < t_1 < t_2 < \ldots < t_k < t\}$
* $= \lambda^k e^{-\lambda t}\, Vol(S)$
* Without the ordering constraint, $Vol = t^k$; that counts all $k!$ orderings of $(t_1, \ldots, t_k)$, and by symmetry each ordering has the same volume, so $Vol(S) = \frac{t^k}{k!}$
* $P(N_t = k) = \lambda^k e^{-\lambda t} \frac{t^k}{k!} = \frac{(\lambda t)^k}{k!} e^{-\lambda t}$

### Ex. Fishing

* Bob catches fish according to a $PP(\lambda = 0.6/\text{hr})$
* If he catches at least 1 fish in the first 2 hrs, he quits; otherwise, he continues until the first fish is caught
* a) What is P(Bob fishes > 2 hrs)?
    * He must not have caught any fish in the first 2 hrs
    * $P(N_2 = 0) = e^{-2\lambda} = e^{-1.2}$
* b) What is P(Bob catches at least 2 fish)?
    * $1 - P(\text{catch 1 fish in first 2 hrs}) - P(\text{catch 0 fish in first 2 hrs})$
    * $= 1 - P(N_2=1) - P(N_2=0) = 1 - 1.2e^{-1.2} - e^{-1.2}$
* c) What is E[fish caught by Bob]?
    * Total fish caught = fish from $[0,2]$ + fish from $(2, \infty)$
    * By linearity of expectation: E[total caught] = $E[Poisson(1.2)] + 1 \cdot P(\text{still fishing after 2 hrs})$
    * $= 1.2 + e^{-1.2}$
* d) E[fishing time | fishing time > 4 hrs]?
    * By the memoryless property: $4 + E[Expo(0.6)] = 4 + \frac{1}{0.6}$

## Merging and Splitting of PP

* Merging: for independent $N_1 \sim PP(\lambda_1), N_2 \sim PP(\lambda_2)$:
    * $N = N_1 + N_2 \sim PP(\lambda_1 + \lambda_2)$
* Splitting $N \sim PP(\lambda)$ into $N_1, N_2$ by routing each arrival independently to $N_1$ with probability $p$ and to $N_2$ with probability $1-p$:
    * $N_1 \sim PP(\lambda p)$, $N_2 \sim PP(\lambda (1-p))$, and the $N_1(t), N_2(t)$ RVs are indep
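A small sketch of the merging property: superpose two independent Poisson processes and check that the merged interarrival times look $Expo(\lambda_1+\lambda_2)$ (rates and horizon are demo values; only the first two moments are compared):

```python
import numpy as np

rng = np.random.default_rng(3)
lam1, lam2, horizon = 1.0, 2.0, 10_000.0   # rates and time horizon (demo values)

def arrival_times(lam, horizon):
    """Arrival times of a PP(lam) on [0, horizon] via cumulative Expo(lam) gaps."""
    gaps = rng.exponential(1 / lam, size=int(2 * lam * horizon))
    times = np.cumsum(gaps)
    return times[times <= horizon]

merged = np.sort(np.concatenate([arrival_times(lam1, horizon),
                                 arrival_times(lam2, horizon)]))
gaps = np.diff(merged)
print("mean merged interarrival =", gaps.mean())   # ~ 1/(lam1+lam2) = 1/3
print("std  merged interarrival =", gaps.std())    # ~ 1/3 as well, as for an exponential
```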
* Ex. 2 light bulbs have indep exponential lifetimes $T_a \sim Expo(\lambda_a), T_b \sim Expo(\lambda_b)$; let $Z = \min(T_a, T_b)$
    * Old solution: $P(Z \geq z) = P(T_a \geq z)P(T_b \geq z) = e^{-(\lambda_a + \lambda_b)z}$, so $Z \sim Expo(\lambda_a + \lambda_b)$
    * Using PPs: $T_a, T_b$ are the times of the first arrivals of 2 indep PPs of rates $\lambda_a, \lambda_b$
    * Merging the two processes gives a $PP(\lambda_a + \lambda_b)$, whose first arrival time is $Z$

## Interarrival Times

* Note $T_k = S_1 + \ldots + S_k$
* $E[T_k] = \sum\limits_{i=1}^k E[S_i] = \frac{k}{\lambda}$
* $Var(T_k) = k\, var(S_i) = \frac{k}{\lambda^2}$
* PDF of $T_k$: $f_{T_k}(t)\,dt = P(k-1 \text{ arrivals in } (0,t)) \cdot P(1 \text{ arrival in } (t, t+dt))$
* $= \dfrac{e^{-\lambda t} (\lambda t)^{k-1}}{(k-1)!} \lambda\, dt$
* $f_{T_k}(t) = \dfrac{\lambda^k t^{k-1}}{(k-1)!} e^{-\lambda t}$
* This is the $k$th-order Erlang pdf: $T_k \sim \text{Erlang-}k(\lambda)$
* As $k \to \infty$, it becomes approximately normal (CLT)

# Lecture 17: 3/19/20

## PP Nomenclature

* $T_i$: absolute arrival times
* $S_i = T_i - T_{i-1}$: inter-arrival times
* $S_i$ are iid $Expo(\lambda)$ RVs
* $PP(\lambda)$ is memoryless: it has indep and stationary increments
* $N_t \triangleq$ num of arrivals in an interval of length $t$
* $N_t \sim Pois(\lambda t)$: $P(N_t = k) = e^{-\lambda t} \frac{(\lambda t)^k}{k!}, k = 0,1,2,\ldots$
* A random process = RVs stacked in time or space
* Merging PPs: $\sim PP(\lambda_1 + \lambda_2)$
* Splitting a PP: $\sim PP(\lambda p), PP(\lambda (1-p))$
* The distribution of arrival times in an interval, conditioned on the number of arrivals in that interval, is uniform:
* $f_{T_1 \ldots T_k} (t_1, t_2 \ldots t_k | N_t = k) = \frac{k!}{t^k}$ for $0 < t_1 < \ldots < t_k < t$

## Random Incidence Paradox (RIP)

* Fix a time $t^*$; what is the length $L$ of the interarrival interval that contains it?
* Natural answer: $E[L] = \frac{1}{\lambda}$
* Actually $L \sim \text{Erlang-}2(\lambda) \rightarrow E[L] = \frac{2}{\lambda}$
* Say $t^*$ falls between arrivals $U = T_3$ and $V = T_4$
* $L = (t^* - U) + (V - t^*)$
* $V - t^*$ is $Expo(\lambda)$ (memorylessness, looking forward)
* $P(t^* - U > x) = P(\text{more than } x \text{ secs have elapsed since the last arrival})$
    * $=$ P(no arrivals in $[t^* - x, t^*]$)
    * $=$ P(no arrivals in an interval of length $x$) $= e^{-\lambda x}$
* A PP run backwards is a PP, so $t^* - U \sim Expo(\lambda)$ too, and $L$ is the sum of two indep $Expo(\lambda)$ RVs, i.e. Erlang-2

## Ex: RIP

* Buses arrive on the hour and 5 min after the hour
* Interarrival times alternate between 5 min and 55 min, so the average interarrival time is 30 min
* Suppose you arrive at a uniformly random time; what is the average length of the interarrival interval you land in?
* Want E[length of the interarrival interval containing your arrival]
* $\frac{55}{60}$ chance of arriving in the larger interval
* $E[\text{length}] = \frac{1}{12}(5) + \frac{11}{12}(55) = 50.83$ min
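A closing sketch of the random incidence paradox for a $PP(\lambda)$: the interval picked out by a random inspection time averages about $\frac{2}{\lambda}$, twice the typical interarrival time (rate, horizon, and sample counts are demo values):

```python
import numpy as np

rng = np.random.default_rng(4)
lam, horizon = 1.0, 200_000.0    # rate and time horizon (demo values)

# One long PP(lam) sample path
gaps = rng.exponential(1 / lam, size=int(1.5 * lam * horizon))
arrivals = np.cumsum(gaps)
arrivals = arrivals[arrivals <= horizon]

# Drop many random inspection times t* and record the length of the
# interarrival interval that contains each one
t_star = rng.uniform(arrivals[0], arrivals[-1], size=100_000)
idx = np.searchsorted(arrivals, t_star)          # index of the arrival just after t*
lengths = arrivals[idx] - arrivals[idx - 1]

print("mean interarrival length        =", gaps.mean())     # ~ 1/lambda
print("mean length of sampled interval =", lengths.mean())  # ~ 2/lambda (the RIP)
```

The gap between the two printed numbers is the random incidence effect: a uniformly chosen inspection time is more likely to land in a long interarrival interval than in a short one.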