# EE 126 MT1 Notes

* MT1 Notes: https://hackmd.io/@matthewtang/126mt1
* MT2 Notes: https://hackmd.io/@matthewtang/126mt2

# Lecture 2: 1/23/20

### Birthday paradox (Complement)
* Probability that at least 2 people in a group of $n$ share the same birthday
* Assume 365 days in a year ($k = 365$)
* $|\Omega| = k^n = 365^n$
* A = "At least 2 people have the same birthday"
* $A^C$ = "No 2 people share a birthday" (easier to compute)
* $P(A^C) = \frac{|A^C|}{|\Omega|} = \frac{(365)(364)\ldots(365-(n-1))}{365^n}$
* $P(A^C) = 1\left(1 - \frac{1}{k}\right)\left(1 - \frac{2}{k}\right) \ldots \left(1 - \frac{n-1}{k}\right)$
* Use the Taylor series $e^x \approx 1 + x$ for $|x| \ll 1$ (fine when the group has 30-40 people)
* $\approx 1 \cdot e^{-\frac{1}{k}} e^{-\frac{2}{k}} \ldots$
* $= e^{-\frac{1}{k}(1 + 2 + 3 + \ldots + (n-1))} \boxed{\approx e^{- \frac{n^2}{2k}}}$
* For $k=365, n=23 \rightarrow P(\geq 1 \text{ pair}) \approx 50.4\%$
* For $n=50 \rightarrow \approx 97\%$

### False Positive Quiz (Bayes rule)
* Test for a rare disease
* If a person has the disease, P(test+) = 0.95
* If a person does not have the disease, P(test-) = 0.95
* P(person has disease) = 0.001
* Want P(disease+ | test+)
* Let A = disease+
* Let B = test+
* $P(A|B) = \frac{P(B|A)P(A)}{P(A)P(B|A) + P(A^C)P(B|A^C)}$
* $= \frac{(.95)(.001)}{(.95)(.001)+(.999)(.05)} = .0187$
* If $P(A) = .01 \rightarrow P(A|B) = .16$
* If $P(A) = .5 \rightarrow P(A|B) = .95$

## Independence
* 2 events are independent if the occurrence of one does not affect the other
* $\boxed{P(A|B) = P(A)}$
* $\boxed{P(A \cap B) = P(A)P(B)}$
* A & B are disjoint iff $P(A \cap B) = 0$; combined with independence this forces $P(A)P(B) = 0$, i.e. $P(A) = 0$ or $P(B) = 0$
* So A or B must have 0 probability if A and B are both independent and disjoint
* Note: base outcomes of a random experiment are disjoint and have non-zero probability $\implies$ always dependent

## Conditional Independence
* $\boxed{P(A \cap B | C) = P(A|C) P(B|C)}$
* Dependent events can be conditionally independent
* Independent events can be conditionally dependent

### Unfair coins
* 2 indistinguishable coins: 1 is 2-tailed, 1 is 2-headed
* You pick 1 of the 2 coins at random and flip it twice
* $H_i$: $i$th flip is a head ($i=1,2$)
* $H_1, H_2$ are not independent
* $P(H_2 | H_1) = 1 \neq P(H_2) = \frac{1}{2}$ (equivalently, $P(H_1 \cap H_2) = \frac{1}{2} \neq P(H_1)P(H_2) = \frac{1}{4}$)
* Let A = pick the 2-headed coin
* $P(H_1, H_2 | A) = P(H_1|A)P(H_2|A) \implies$ conditionally independent

## Collective independence
* $P(\cap_{i \in S} A_i) = \prod\limits_{i \in S} P(A_i)$
* S is any subset of the collection of events
* Pairwise independence does not imply joint independence of 3+ events

### Die rolling
* Roll 2 fair dice repeatedly; what is the probability of rolling a sum of 6 before a sum of 7?
* Successive rolls are independent, so condition on the first roll. Let E = roll a 6 before a 7
* Let S be the event that the first roll sums to 6
* Let T be the event that the first roll sums to 7
* $P(E) = P(E|S)P(S) + P(E|T)P(T) + P(E|(S\cup T)^C)P((S\cup T)^C)$
* $P(E) = 1(\frac{5}{36})+ 0 + P(E)(\frac{25}{36})$
* $P(E) = \frac{5}{11}$ (a quick simulation check appears below)
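As a sanity check on the $\frac{5}{11}$ answer, here is a minimal Monte Carlo sketch (my own illustration, not from the original notes); it uses only Python's `random` module, and the seed and trial count are arbitrary choices.

```python
import random

random.seed(0)
trials = 100_000
wins = 0
for _ in range(trials):
    while True:
        s = random.randint(1, 6) + random.randint(1, 6)  # sum of two fair dice
        if s == 6:        # a 6 appears first -> event E happens
            wins += 1
            break
        if s == 7:        # a 7 appears first -> E fails
            break

print(wins / trials, 5 / 11)  # empirical estimate vs the exact answer 5/11
```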
## Random Variables
* RVs associate a real number with each outcome
* Ex. $X^2$
* Assigning real numbers allows us to do statistics
* An RV is a function from $\Omega \rightarrow \mathbb{R}$
* Ex. Roll 2 four-sided dice
    * $M_k$ is the event that the minimum of the two dice is $k$
    * M is the RV that is equal to the value of the minimum
    * If you roll (2,3), then M = 2, and $M_2$ is the event that the min is 2
* Form a PMF (for discrete RVs; a density for continuous RVs)
* Each base outcome of the experiment maps to 1 RV value
* $P_X(x)=P(X=x)$
* Normalization: $\sum\limits_x P_X(x) = 1$

### Chess match
* A and K play games until one of them wins a game (first win takes the match)
* The match is declared drawn after 10 consecutive drawn games
* $P(A+) = 0.3$, $P(K+) = 0.4$, $P(\text{draw}) = 0.3$
* Goal: find the PMF of the match duration L
* $P_L(l) = \begin{cases} 0.3^9 & l=10\\ 0.7(0.3)^{l-1} & 1 \leq l \leq 9\end{cases}$ (the match lasts 10 games exactly when the first 9 games are all draws)
* $P(A+) = (0.3) \sum\limits_{l=0}^9 (0.3)^l = (1-0.3^{10}) \frac{0.3}{0.7}$

# Lecture 3: 1/28/20

## DRVs
* The distribution/PMF is the collection of values $\{ (a, P_X(a)) : a \in \mathcal{A}\}$ where $\mathcal{A}$ is the set of all possible values taken by X
* Functions of RVs are RVs
* $Y=g(X), \quad P_Y(y) = \sum\limits_{\{x | g(x) = y\}} P_X(x)$
* Ex. $Y=|X|$, X uniform on the integers $-2, \ldots, 2$
    * Y is nonnegative, so it is 0 (prob 1/5), 1 (prob 2/5), or 2 (prob 2/5)

## Expectation
* $E[X] = \sum\limits_{x \in \mathcal{X}} x P(X=x) = \sum\limits_{x \in \mathcal{X}} xP_X(x)$
* Sum of all values that X can take, weighted by their mass
* Alternately: $E[X] = \sum\limits_{\omega \in \Omega} X(\omega)P(\omega)$
* Linearity of Expectation (LOE): for any 2 RVs X, Y defined on the same probability space, $E[X+Y] = E[X] + E[Y]$
* $E[cX] = cE[X]$
* No independence assumptions made
* Ex. Average of the sum of 2 die rolls
    * $E[X] = E[X_1 + X_2] = E[X_1] + E[X_2] = 7$
* Ex. Find the average number of students who get back their own homework (homeworks handed back in random order)
    * Let $X_i$ be an indicator RV for the $i$th student getting their own HW back (1 if they do, 0 if not)
    * Let X be the number of students who got back their own HWs
    * $E[X] = \sum\limits_{i=1}^N E[X_i] = N \cdot \frac{1}{N} = 1$ (a simulation sketch follows)
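A quick simulation of the homework-return example above; this is my own illustrative sketch (not from the notes), assuming NumPy, with the class size N and trial count chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 30, 100_000                        # class size and number of experiments (arbitrary)
matches = np.zeros(trials)
for t in range(trials):
    perm = rng.permutation(N)                  # random return order of the homeworks
    matches[t] = np.sum(perm == np.arange(N))  # X = number of students who got their own HW
print(matches.mean())                          # ≈ 1 for any N, by linearity of expectation
```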
## Variance
* $Var(X) = \sigma_X^2 = E[(X - E[X])^2] = E[X^2] - E[X]^2$
* Stdev = $\sigma_X = \sqrt{\text{Var}(X)}$
* Tail Sum Formula (TSF): if X is a nonnegative integer-valued DRV, $E[X] = \sum\limits_{k=1}^\infty P(X \geq k)$
* Derivation: $E[X] = \sum\limits_{i=1}^\infty i P(X = i) = \sum\limits_{i=1}^\infty \sum\limits_{j=1}^i P(X=i) = \sum\limits_{j=1}^\infty \sum\limits_{i=j}^\infty P(X=i) = \sum\limits_{j=1}^\infty P(X \geq j)$

## Common Random Variables
* Discrete uniform distribution: $U[a, a+1, \ldots, b]$
    * $P_X(k) = \begin{cases} \frac{1}{b-a+1} & \text{if } k = a, a+1, \ldots, b\\ 0 & \text{otherwise} \end{cases}$
    * $E[X] = \frac{a+b}{2}$, $var(X) = \frac{n^2-1}{12}$ where $n = b-a+1$
* Bernoulli RV: RV with 2 outcomes. Parameter: $p$
    * $P_X(k) = \begin{cases} 1-p & \text{if } k = 0 \\ p & \text{if } k = 1 \end{cases}$
    * $E[X] = p$, $var(X) = p(1-p)$
* Indicator RV: a bridge between events and RVs
    * $X = \mathbf{1}_A = \begin{cases} 1 & \text{if A occurs}\\ 0 & \text{else} \end{cases}$
    * $E[X] = P(A)$
* Binomial RV: sum of $n$ independent Bernoulli RVs. Parameters: $n, p$
    * $P_X(k) = \binom{n}{k} p^k (1-p)^{n-k}$
    * $E[X] = np$, $var(X) = np(1-p)$
* Geometric RV: how long you have to wait before an event happens. Parameter: $p$
    * $P_X(k) = (1-p)^{k-1} p$ for $k = 1,2,3, \ldots$
    * $E[X] = \frac{1}{p}$, $var(X) = \frac{1-p}{p^2}$
* Poisson RV: like a binomial RV with $p \to 0, n \to \infty, np = \lambda$
    * $P_X(k) = e^{-\lambda}\frac{\lambda^k}{k!}$ for $k = 0, 1, 2, \ldots$
    * $E[X] = \lambda$, $var(X) = \lambda$
    * Uses $1-p \approx e^{-p}$ in the limit
    * If $X \sim Poiss(\lambda_x), Y \sim Poiss(\lambda_y)$, and X & Y are indep, then $Z = X + Y \sim Poiss(\lambda_x + \lambda_y)$
    * Merging: adding 2 independent Poisson packet streams gives a Poisson stream (rates add)
    * Splitting: 1 stream split into 2 queues (with probabilities p, 1-p) gives Poisson streams with parameters $p\lambda_x, (1-p)\lambda_x$

# Lecture 4: 1/30/20

### St. Petersburg Paradox
* Toss a fair coin until you get a head
* If the first head appears on toss $n$, you get $2^n$
* How much should you pay to play this game?
* $E[W] = \sum\limits^\infty_{n=1} 2^n\left(\frac{1}{2^n}\right) = 1 + 1 + \ldots = \infty$
* Your utility function is closer to $U(w) \approx \log(w)$

## Properties
* If X, Y are indep, $E[XY] = E[X]E[Y]$
    * But not the converse
* If X, Y are indep, then $var(X+Y) = var(X) + var(Y)$
    * Analogous to LOE
* $var(X+a) = var(X)$
* If $X_1, X_2$ are not indep, $var(X_1 + X_2) = var(X_1) + var(X_2) + 2(E[X_1X_2] - E[X_1]E[X_2])$
* $cov(X_1, X_2) = E[(X_1-E[X_1])(X_2 - E[X_2])]$
* $cov(X_1, X_2) = E[X_1X_2] - E[X_1]E[X_2]$
* $\rho = \frac{cov(X,Y)}{\sqrt{var(X)}\sqrt{var(Y)}}$
    * How correlated they are: $-1 \leq \rho \leq 1$

## Conditioning of RVs
* $P_{X|Y}(k|n) = P(X=k | Y=n)$
* Conditional PMF of X given Y
* X conditioned on Y = n is another RV

## Memoryless for Geometric
* $P(X > s + t | X > s) = P(X > t)$
* No successes in the first $s$ tries does not affect future tries
* $P(X > s + t | X > s) = \frac{P(X>s+t)}{P(X>s)} = \frac{(1-p)^{s+t}}{(1-p)^s} = (1-p)^t = P(X > t)$
* Mean of a geometric RV:
    * $E[X] = E[X | X=1]P(X=1) + E[X | X>1]P(X>1) = 1(p)+(1+E[X])(1-p) \implies E[X] = \frac{1}{p}$

# Lecture 5: 2/4/20

## Conditional Expectation
* Want to predict X given Y
* $E[X|Y=y] = \sum\limits_x x P(X=x | Y=y)$
* $E[X|Y=y]$ is the expectation of X wrt the distribution of X conditioned on Y=y
* $E[X|Y]$ is an RV which takes the value $E[X|Y=y]$ with prob $P(Y=y)$
* Ex. Roll a die N times, X is the sum of the die rolls
    * $E[X|N=1] = \frac{7}{2}$, $E[X|N=2] = 7$
    * $E[X|N] = \frac{7N}{2}$
* What is the expectation of the RV $E[X|Y]$ (a function of Y)?
* Iterated expectation: $E[E[X|Y]] = E[X]$

### Random number of trials
* Roll a die N times, $N \sim Geom(p)$. X is the sum of all the rolls
* $E[X] = E[E[X|N]] = E[\frac{7N}{2}] = \frac{7}{2}E[N] = \frac{7}{2p}$

### Random walk: drunken walk on a line
* Start at the origin
* $X_{n+1} = X_n + i_+ - i_-$
    * Where $i_+$ is the indicator for taking a +1 step and $i_-$ for a -1 step
* $E[X_{n+1}] = E[X_n] + \frac{1}{2} - \frac{1}{2} = E[X_n] = 0$ (average stays at 0)
* $P(X_{n+1}^2 = (k+1)^2 | X_n = k) = P(X_{n+1}^2 = (k-1)^2 | X_n = k) = \frac{1}{2}$
* $E[X_{n+1}^2 | X_n = k] = \frac{(k+1)^2 + (k-1)^2}{2} = k^2 + 1$
* $E[X_{n+1}^2 | X_n] = X_n^2 + 1$
* $E[X_{n+1}^2] = E[E[X_{n+1}^2 | X_n]] = E[X_n^2] + 1 = E[X_{n-1}^2] + 1 + 1 = \ldots = n+1$
* $E[X_0^2] = 0 \implies E[X_n^2] = n$ (simulation sketch below)
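The $E[X_n^2] = n$ result is easy to check numerically. This is a minimal sketch (my own, not from the notes), assuming NumPy; the horizon $n$ and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 50_000                        # walk length and number of sample paths (arbitrary)
steps = rng.choice([-1, 1], size=(trials, n))  # i.i.d. ±1 steps with prob 1/2 each
X_n = steps.sum(axis=1)                        # position after n steps, one entry per path
print(X_n.mean())                              # ≈ 0  (E[X_n] = 0)
print((X_n ** 2).mean())                       # ≈ n  (E[X_n^2] = n)
```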
## Continuous Probability
* In most settings, a continuous sample space is more natural
* Let $X \sim U[0,1] \implies P(X=0.4) = 0$
* Need to define the probability over sets that have length
* X is a continuous RV if:
    * $\exists$ a non-negative $f_X$ s.t. $P(X \in B) = \int_B f_X(x)\, dx$ is well defined
    * $\int_{-\infty}^{\infty} f_X(x)\, dx = 1$
* A PDF does not have to be between 0 and 1, but a PMF does
* $P(a \leq X \leq b) = \int_{a}^{b} f_X(x)\, dx$
* $P(X \in [x, x+ \epsilon]) = \int_{x}^{x + \epsilon} f_X(t)\, dt \approx f_X(x) \epsilon$
* $f_X(x) \approx \frac{P(X \in [x, x+ \epsilon]) }{\epsilon}$
* Ex. $f_X(x) = \frac{1}{2\sqrt{x}}$ for $0 < x < 1$, 0 everywhere else (a valid density even though it is unbounded)

## Cumulative Distribution Function (CDF)
* $F_X(x) = P(X \leq x)$ (for both continuous and discrete RVs)
* Properties:
    * $F_X(\infty) = 1, F_X(-\infty) = 0$
    * If X is discrete (integer-valued), $P(X=k) = F_X(k) - F_X(k-1)$
    * If X is continuous, $f_X(x) = \frac{d F_X(x)}{dx}$
* Ex. Random dart throw, uniform over the unit circle; let Y be the distance from the center
    * Find the pdf and cdf of Y
    * $P(Y \leq y) = \frac{\pi y^2}{\pi} = y^2$
    * $F_Y(y) = y^2$ for $0 \leq y \leq 1$
    * $f_Y(y) = \frac{d F_Y(y)}{dy} = 2y$
    * $P(0.5 < Y < 0.6) = F_Y(0.6) - F_Y(0.5) = 0.36 - 0.25 = 0.11$
* $E[g(X)] = \int\limits_{-\infty}^\infty g(x) f_X(x)\, dx$
* If X, Y are indep, $F_{X,Y}(x,y) = F_X(x)F_Y(y)$

## Popular Continuous Distributions
* Uniform: $X \sim U[a,b]$
    * $E[X] = \int\limits_a^b \frac{x}{b-a}\, dx = \frac{a+b}{2}$
    * $Var(X) = E[X^2] - E[X]^2 = \frac{(b-a)^2}{12}$
* Exponential: $f_X(x) = \lambda e^{-\lambda x}$ for $x \geq 0$
    * $F_X(x) = P(X \leq x) = 1 - e^{-\lambda x}$
    * Complementary CDF: $P(X > x) = e^{-\lambda x}$
    * The memoryless property applies to $Exp(\lambda)$
    * $P(X > t+s | X>t) = P(X>s)$ for $s,t > 0$

# Lecture 6: 2/6/20

## Normal Distribution
* $f_X(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2 \sigma^2}\right)$
* For the standard normal ($\mu = 0, \sigma = 1$), the CDF is written $\Phi(x)$
* Show that it is a valid PDF: nonnegative and integrates to 1
* 1) The sum of 2 independent Normals is Normal
    * $X \sim N(\mu_1, \sigma_1^2), Y \sim N(\mu_2, \sigma_2^2)$
    * $Z = X + Y, \quad Z \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma^2_2)$
* 2) A Normal multiplied by a constant is Normal
    * If $Y = aX$, then $Y \sim N(a\mu, a^2 \sigma^2)$
* These properties allow us to convert any Normal into a standard normal
    * $Z = \frac{X - \mu}{\sigma}$

## Continuous Analogs
* Joint PDFs: $P((X,Y) \in A) = \int\limits_A f_{X,Y} (x,y)\, dx\, dy$
    * Nonnegative, integrates to 1
* Joint CDF: $F_{X,Y}(x,y) = P(X \leq x, Y \leq y)$
    * $f_{X,Y}(x,y) = \frac{\partial^2}{\partial x \partial y} F_{X,Y}(x,y)$
* Marginal probability density
    * $f_X(x) = \int\limits_{-\infty}^{\infty} f_{X,Y}(x,y)\, dy$
    * $f_X(x)$ is a density, not a probability
* Conditional probability density
    * $f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}$
    * The multiplication rule still holds
* Independence
    * $f_{X,Y}(x,y) = f_X(x)f_Y(y)$
    * $f_{X|Y}(x | y) = f_X(x)$
    * $f_{Y|X}(y | x) = f_Y(y)$
* Bayes Rule
    * $f_{X|Y}(x|y) = \frac{f_{Y|X}(y|x)f_X(x)}{\int\limits_{-\infty}^{\infty} f_{Y|X}(y|t)f_X(t)\,dt}$
* Conditional Expectation
    * $E[Y | X=x] = \int\limits_{-\infty}^{\infty} yf_{Y|X}(y|x)\,dy$

## Change of Variables / Derived Distributions
* Let X be a discrete RV (coin toss), $Y=2X$. Compute the distribution of Y
    * $P(Y=y) = P(f(X) = y) = P(X \in f^{-1}(y))$
* $X \sim U[0,1], Y = 2X$
    * Use CDFs for continuous RVs instead of PMFs
    * $F_Y(y) = P(Y \leq y) = F_X(\frac{y}{2})$, so $f_Y(y) = \frac{1}{2} f_X(\frac{y}{2})$ (checked numerically in the sketch below)
    * $f_Y(y) \neq P(Y=y)$
* If g is monotonically increasing:
    * $P(g(X) \leq y) = P(X \leq g^{-1}(y)) = F_X(g^{-1}(y))$
* If g is monotonically decreasing:
    * $P(g(X) \leq y) = P(X \geq g^{-1}(y)) = 1 - F_X(g^{-1}(y))$
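A small numerical check of the derived-distribution example $Y = 2X$ with $X \sim U[0,1]$; this is a minimal sketch of my own (not from the notes), assuming NumPy, comparing the empirical CDF of Y with $F_Y(y) = F_X(y/2) = y/2$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=200_000)  # X ~ U[0,1]
Y = 2 * X                                # derived RV Y = 2X, uniform on [0,2]
for y in (0.5, 1.0, 1.5):
    empirical = (Y <= y).mean()          # empirical CDF of Y at y
    print(y, empirical, y / 2)           # compare with F_Y(y) = y/2
```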
# Lecture 7: 2/11/20

## Law of Total Variance
* Recall the law of iterated expectation: $E[E[X|Y]] = E[X]$
* $var(X|Y=y)$ is the variance of the conditional distribution $P(X=x | Y=y)$
* $var(X|Y)$ is an RV that takes the value $var(X|Y=y)$ with prob $P(Y=y)$
* LTV: $var(X) = E[var(X|Y)] + var(E[X|Y])$

### Ex. Biased Coins
* n tosses, X = # of heads, bias $Y = P(H) \sim U[0,1]$
* $var(X) = E[var(X|Y)] + var(E[X|Y])$
* $E[X] = \frac{n}{2}$, $E[X|Y] = nY$, $var(X|Y) = nY(1-Y)$
* $var(X) = var(nY) + E[nY(1-Y)]$
* $= n^2 var(Y) + nE[Y] - nE[Y^2]$
* $= \frac{n^2}{12} + \frac{n}{2} - n\left(\frac{1}{12} + \frac{1}{4}\right)$
* $= \frac{n}{6} + \frac{n^2}{12}$
* Compare: the variance for a fair coin with n tosses is $\frac{n}{4}$

## Order statistics
* For rankings of items
* Let X be a CRV for which $x_1, \ldots, x_n$ are a random sample of size n
* Reorder the $x_i$ from smallest to largest
* Ex. $X \sim U[0,1]$, $n=4$, $x_1 = 0.5, x_2 = 0.7, x_3 = 0.2, x_4 = 0.1$
    * $x^{(1)} = 0.1, x^{(2)} = 0.2, x^{(3)} = 0.5, x^{(4)} = 0.7$
* RV: $X^{(i)}$, taking value $x^{(i)}$, is the $i$th order statistic
* If X has pdf $f_X(x)$, the marginal pdf of the $i$th order statistic is:
    * $f_{X^{(i)}}(y) = \frac{n!}{(i-1)!(n-i)!}(F_X(y))^{i-1}(1-F_X(y))^{n-i} f_X(y)$
* Special case: $X \sim U[0,1]$
    * $f_{X^{(i)}}(y) = \frac{n!}{(i-1)!(n-i)!}(y)^{i-1}(1-y)^{n-i}$
* P(9th smallest of 10 draws from $X \sim U[0,1]$ is greater than 0.8)?
    * Plug in $i=9, n=10$
    * $f_{X^{(9)}}(x) = \frac{10!}{8!\,1!}x^{8}(1-x) = 90x^8(1-x)$
    * $P(X^{(9)} > 0.8) = \int\limits_{0.8}^{1} (90x^8 - 90x^9)\, dx$ (numerical check in the sketch below)
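The order-statistic probability above can be checked by simulation against the closed form of the integral. This is a minimal sketch of my own (not from the notes), assuming NumPy; the sample size is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.uniform(size=(200_000, 10))        # 200k experiments, 10 draws from U[0,1] each
ninth_smallest = np.sort(samples, axis=1)[:, 8]  # index 8 = 9th order statistic X^(9)
print((ninth_smallest > 0.8).mean())             # Monte Carlo estimate of P(X^(9) > 0.8)

# Closed form of the integral: 90*(x^9/9 - x^10/10) evaluated from 0.8 to 1
exact = 90 * ((1/9 - 1/10) - (0.8**9 / 9 - 0.8**10 / 10))
print(exact)                                     # ≈ 0.624
```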
## Convolution
* Let $Z = X + Y$ where X, Y are continuous and independent
* $f_Z(z) = \int\limits_x f_{X,Z}(x,z)\, dx = \int\limits_x f_{X}(x)f_{Z|X}(z|x)\, dx$
* $F_{Z|X}(z|x) = P(X+Y \leq z | X=x) = P(Y \leq z - x | X=x)$
    * $= P(Y \leq z - x) = F_Y(z-x)$ (by independence)
* Differentiate wrt z:
    * $f_{Z|X}(z|x) = f_Y(z-x)$
* Convolution: $f_Z(z) = \int\limits_x f_{X}(x)f_{Y}(z-x)\, dx$
* For discrete RVs: $P(Z=n) = \sum\limits_k P(X=k)P(Y=n-k)$
* Convolving 2 Gaussians gives another Gaussian

## Moment Generating Functions (MGFs)
* Use the Taylor series expansion $e^{sX} = 1 + sX + \frac{s^2X^2}{2!} + \frac{s^3X^3}{3!} + \ldots$
* $M_X(s) = E[e^{sX}] = 1 + sE[X] + \frac{s^2}{2!}E[X^2] + \frac{s^3}{3!}E[X^3] + \ldots$
* Transforms an RV into all of its moments
* $M_X(0) = 1$
* $\frac{d}{ds} M_X(s) = E[X] + sE[X^2] + \frac{s^2}{2!}E[X^3] + \ldots$
    * $M_X'(0) = E[X]$
* $\frac{d^2}{ds^2} M_X(s) = E[X^2] + sE[X^3] + \ldots$
    * $M_X''(0) = E[X^2]$
* $\frac{d^n M_X(s)}{ds^n} \Big|_{s=0} = E[X^n]$

# Lecture 8: 2/13/20

## Utility of MGFs
* Finding moments is easier
* Turns the convolution operation into multiplication
* Great analytical tool (e.g. to prove the CLT)
* $M_X(s) = \int\limits_{-\infty}^{\infty} f_X(x)e^{sx}\,dx$
* Properties
    * $M_X(0) = 1$
    * If $X > 0$, $M_X(-\infty) = 0$; if $X < 0$, $M_X(\infty) = 0$
    * If $Y = aX + b$, $M_Y(s) = E[e^{s(aX+b)}] = e^{sb}E[e^{asX}] = e^{sb}M_X(as)$
* $X \sim Exp(\lambda)$, $f_X(x) = \lambda e^{-\lambda x}$
    * $M_X(s) = E[e^{sX}] = \int\limits_0^\infty \lambda e^{sx} e^{-\lambda x}\, dx = \lambda \frac{e^{-(\lambda -s)x}}{-(\lambda - s)}\Big|^\infty_0$
    * $M_X(s) = \frac{\lambda}{\lambda - s}$ for $s < \lambda$
    * $E[X] = M_X'(0) = \frac{\lambda}{(\lambda -s)^2}\Big|_{s=0} = \frac{1}{\lambda}$
    * $E[X^2] = M_X''(0) = \frac{2\lambda}{(\lambda -s)^3}\Big|_{s=0} = \frac{2}{\lambda^2}$
    * $Var(X) = \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2}$

### Discrete Analog (Poisson)
* $P(X=k) = \frac{e^{-\lambda}\lambda^k}{k!}$
* $M_X(s) = \sum\limits_{k=0}^{\infty} e^{sk}e^{-\lambda}\frac{\lambda^k}{k!} = e^{-\lambda}\sum\limits_{k=0}^{\infty} \frac{(\lambda e^s)^k}{k!} = e^{-\lambda}e^{\lambda e^s}$
* $M_X(s) = e^{-\lambda + \lambda e^s}$

### MGF for the Standard Normal
* Given $X \sim \mathcal{N}(0,1) \rightarrow f_X(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}$
* $M_X(s) = E[e^{sX}] = \int\limits_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}e^{sx}\, dx$
* $= \frac{1}{\sqrt{2\pi}} \int\limits_{-\infty}^{\infty} \exp\left(-\frac{x^2 -2sx + s^2}{2}\right)e^{s^2/2}\, dx$
* $= e^{s^2/2} \frac{1}{\sqrt{2\pi}}\int\limits_{-\infty}^{\infty}e^{\frac{-(x-s)^2}{2}}\, dx$
* $= e^{s^2 /2}$
* If $Y \sim \mathcal{N}(\mu, \sigma^2)$:
    * $Y = \sigma X + \mu$
    * $M_Y(s) = e^{s \mu} M_X(\sigma s) = e^{s\mu + \sigma^2 s^2/2}$

### MGF sums
* A given transform corresponds to a unique CDF
* $M_X(s)$ is a (two-sided) Laplace transform of the density
* MGF of a sum of independent RVs:
    * What is the transform $M_Z(s)$ for $Z = X+Y$?
    * $M_Z(s) = E[e^{s(X+Y)}] = E[e^{sX}e^{sY}] = M_X(s)M_Y(s)$
* If X, Y are 2 standard Gaussians:
    * $M_Z(s) = M_X(s)M_Y(s) = e^{s^2/2}e^{s^2/2} = e^{s^2}$
    * This is the MGF of $\mathcal{N}(0,2)$

## Limit Behavior of RVs
* Observe a sequence $X_1, X_2, \ldots, X_n$ iid
* Sample mean: $M_n = \frac{\sum\limits_{i=1}^n X_i}{n}$
* $E[M_n] = \frac{nE[X_i]}{n} = E[X_i]$
* Assuming $var(X_i) < \infty$, $var(M_n) = \frac{1}{n^2}\left(\sum\limits_{i=1}^{n} var(X_i)\right) = \frac{var(X_i)}{n}$
* As $n \to \infty$, $E[M_n] = E[X_i]$ and $var(M_n) \to 0$
* What happens to the "deviation"? Tail bounds

### Markov Bound
* Weakest bound, minimal assumptions
* For a nonnegative RV X: $P(X \geq a) \leq \frac{E[X]}{a}$
* Proof sketch: let Y be the indicator RV that is 1 if $X \geq a$. Then $Y \leq \frac{X}{a}$
    * Take expectations of both sides to get Markov's inequality
* Alternate proof via tail sum (continuous, $X \geq 0$): $E[X] = \int\limits_0^{\infty} P(X > t)\,dt$
    * $a P(X>a) \leq E[X] \rightarrow P(X>a) \leq \frac{E[X]}{a}$

### Chebyshev Bound
* If X is an RV with finite mean and variance $\sigma^2$:
* $P(|X - E[X]| \geq c) \leq \frac{\sigma^2}{c^2}$, $\forall c > 0$
* Special case: $c = k\sigma$, $P(|X-E[X]| \geq k\sigma) \leq \frac{1}{k^2}$
* Proof: $P(|X-E[X]| \geq c) = P((X-E[X])^2 \geq c^2) \leq \frac{E[(X-E[X])^2]}{c^2} = \frac{var(X)}{c^2}$ (by Markov)

### Chernoff Bound
* Apply Markov to $e^{sX}$: $\forall s > 0$, $P(X \geq k) = P(e^{sX} \geq e^{sk}) \leq \frac{E[e^{sX}]}{e^{sk}}$
* $\forall s > 0$, $P(X \geq k) \leq e^{-sk}M_X(s)$
* $P(X \geq k) \leq \min\limits_{s > 0} \left[e^{-sk}M_X(s)\right]$
* Similarly, taking $s < 0$ bounds the lower tail: $P(X \leq k) \leq \min\limits_{s < 0} \left[e^{-sk}M_X(s)\right]$

### Chernoff Example
* $X \sim \mathcal{N}(0,1)$
* $P(X \geq k) \leq \min\limits_{s > 0} \left[e^{-sk}M_X(s)\right] = \min\limits_{s > 0} \left[e^{-sk+s^2/2}\right]$
* Optimize using calculus:
    * $f(s) = e^{-sk + s^2/2}$
    * $f'(s) = e^{-sk+s^2/2}(-k+s)$
    * Minimizer: $s^* = k$
* $P(X \geq k) \leq e^{-k^2 + k^2/2} = e^{-k^2/2}$ (see the numerical comparison below)
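To see how tight these bounds are, here is a small comparison of the Chebyshev and Chernoff bounds against the exact Gaussian tail. This is a minimal sketch of my own (not from the notes), using only the standard library; the $k$ values are arbitrary choices. Markov is omitted since it needs a nonnegative RV.

```python
import math

for k in (1.0, 2.0, 3.0):
    exact = 0.5 * math.erfc(k / math.sqrt(2))  # P(X >= k) for a standard normal
    chebyshev = 1.0 / k**2                     # bounds P(|X| >= k), hence also the one-sided tail
    chernoff = math.exp(-k**2 / 2)             # optimized Chernoff bound e^{-k^2/2}
    print(k, exact, chebyshev, chernoff)
```

The Chernoff bound decays at the same $e^{-k^2/2}$ rate as the true tail, while Chebyshev only decays polynomially in $k$.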