# EE 126 MT1 Notes
* MT1 Notes: https://hackmd.io/@matthewtang/126mt1
* MT2 Notes: https://hackmd.io/@matthewtang/126mt2
# Lecture 2: 1/23/20
### Birthday paradox (Complement)
* Probability that at least 2ppl in a group share the same bday
* Assume 365 days in a year (=k)
* $|\Omega| = k^n = 365^n$
* A = "At least 2 people have the same bday"
* $A^C$ = "No 2 people share a bday" (easier to compute)
* $P(A^C) = \frac{|A^C|}{|\Omega|} = \frac{(365)(364)\ldots(365-(n-1))}{365^n}$
* $P(A^C) = 1(1 - \frac{1}{k})(1 - \frac{2}{k}) \ldots (1 - \frac{n-1}{k})$
* Use the Taylor approximation $1 - x \approx e^{-x}$ for $|x| \ll 1$ (holds here since $n \approx 30$-$40$ is small relative to $k$)
* $\approx 1 e^{-\frac{1}{k}} e^{-\frac{2}{k}} \ldots$
* $= e^{-\frac{1}{k}(1 + 2 + 3 + \ldots + (n-1))}$ $\boxed{\approx e^{- \frac{n^2}{2k}}}$
* For $k=365, n=23 \rightarrow P(\geq 1 \text{ pair}) \approx 50.7\%$
* For $n=50 \rightarrow \approx 97\%$
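
A quick numerical check of the exact product formula against the boxed $e^{-n^2/2k}$ approximation (a sketch in Python, standard library only):

```python
import math

def p_shared_birthday(n, k=365):
    """Exact P(at least 2 of n people share a birthday) with k equally likely days."""
    p_no_match = 1.0
    for i in range(n):
        p_no_match *= (k - i) / k
    return 1 - p_no_match

for n in (23, 50):
    exact = p_shared_birthday(n)
    approx = 1 - math.exp(-n**2 / (2 * 365))   # boxed approximation above
    print(f"n={n}: exact={exact:.3f}, approx={approx:.3f}")
# n=23 -> exact ~0.507, n=50 -> exact ~0.970
```
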
### False Positive Quiz (Bayes rule)
* Test for a rare disease
* If person has disease, P(test+) = .95
* If person does not have disease, P(test-) = .95
* P(person has disease) = .001
* Want P(disease+ | test+)
* Let A = disease+
* Let B = test+
* $P(A|B) = \frac{P(B|A)P(A)}{P(A)P(B|A) + P(A^C)P(B|A^C)}$
* $= \frac{(.95)(.001)}{(.95)(.001)+(.999)(.05)} = .0187$
* If $P(A) = .01 \rightarrow P(A|B) = .16$
* If $P(A) = .5 \rightarrow P(A|B) = .95$
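
The same Bayes' rule computation in code (a minimal sketch; the priors and test accuracies are the ones quoted above):

```python
def p_disease_given_positive(prior, p_pos_given_disease=0.95, p_neg_given_healthy=0.95):
    """Bayes' rule: P(disease | test+)."""
    p_pos_given_healthy = 1 - p_neg_given_healthy
    num = p_pos_given_disease * prior
    return num / (num + p_pos_given_healthy * (1 - prior))

for prior in (0.001, 0.01, 0.5):
    print(prior, round(p_disease_given_positive(prior), 3))
# 0.001 -> ~0.019, 0.01 -> ~0.161, 0.5 -> 0.95
```
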
## Independence
* 2 events are indep if occurrence of 1 does not affect the other
* $\boxed{P(A|B) = P(A)}$
* $\boxed{P(A \cap B) = P(A)P(B)}$
* A & B disjoint means $P(A \cap B) = 0$; if they are also independent, $P(A)P(B) = P(A \cap B) = 0 \implies P(A) = 0$ or $P(B) = 0$
* So A or B must have 0 probability if A and B are both independent and disjoint
* Note: Base outcomes of a random experiment are disjoint and have non-zero prob $\implies$ always dependent
## Conditional Independence
* $\boxed{P(A \cap B | C) =P(A|C) P(B|C)}$
* Dep events can be cond indep
* Indep events can be cond dependent
### Unfair coins
* 2 Indistinguishable coins: 1 is 2 tailed, 1 is 2 headed
* You pick 1 of the 2 coins at random and flip it twice
* $H_i$: ith flip is a head (i=1,2)
* $H_1, H_2$ are not independent
* $P(H_1 \cap H_2) = \frac{1}{2} \neq P(H_1)P(H_2) = \frac{1}{4}$ (indeed $P(H_2 | H_1) = 1 \neq P(H_2) = \frac{1}{2}$)
* Let A = pick 2-headed coin
* $P(H_1, H_2 | A) = P(H_1|A)P(H_2|A) \implies $ cond indep
## Collective independence
* $P(\cap_{i \in S} A_i) = \prod\limits_{i \in S} P(A_i)$
* S is any subset of the collection of events
* Pairwise indep does not imply joint indep of 3+ events
### Die rolling
* Roll 2 fair dice, what is the prob of rolling a 6 before a 7
* Use independence. E = roll 6 before a 7
* Condition on first roll. Let S be event that first roll is a 6
* Let T = event that the first roll is a 7
* $P(E) = P(E|S)P(S) + P(E|T)P(T) + P(E|(S\cup T)^C)P((S\cup T)^C)$
* $P(E) = 1(\frac{5}{36})+ 0 + P(E)(\frac{25}{36})$
* $P(E) = \frac{5}{11}$
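
A Monte Carlo sanity check of the $\frac{5}{11}$ answer (a sketch, standard library only; the trial count is arbitrary):

```python
import random

def six_before_seven(trials=200_000, seed=0):
    """Estimate P(a sum of 6 appears before a sum of 7) with two fair dice."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        while True:
            s = rng.randint(1, 6) + rng.randint(1, 6)
            if s == 6:
                wins += 1
                break
            if s == 7:
                break
    return wins / trials

print(six_before_seven(), 5 / 11)   # estimate should be close to 0.4545
```
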
## Random Variables
* RVs associate a real number with each event
* Ex. $X^2$
* Assigning real numbers allows us to do statistics
* RV is a function from $\Omega \rightarrow \mathbb{R}$
* Ex. Two 4-sided dice
* $M_k$ is the event that the min of the 4 sided die is k
* M is the RV that is equal to the value of the minimum
* If you roll (2,3), M=2, $M_2$ is the event that the min is 2
* Form a PMF (for discrete RVs, density for continuous)
* Each base outcome of experiment maps to 1 RV value
* $P_X(x)=P(X=x)$
* Normalization: $\sum\limits_x P_X(x) = 1$
### Chess match
* A and K play; the first player to win a game wins the match
* The match is drawn if 10 consecutive draws
* $P(A+) = .3$, $P(K+) = .4$, $P(draw) = 0.3$
* Goal: find PMF of match duration (L)
* $P_L(l) = \begin{cases} 0.3^9 & l=10\\ .7(0.3)^{l-1} & 1 \leq l \leq 9\end{cases}$
* $P(A+) = (0.3) \sum\limits_{l=0}^9 (0.3)^l = (1-0.3^{10}) \frac{.3}{.7}$
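
A short check that the PMF above normalizes and that the geometric-series expression for $P(A+)$ matches the closed form (a sketch; parameter values are from the example):

```python
p_a, p_draw = 0.3, 0.3   # P(A wins a game), P(a game is drawn)

def pmf_L(l):
    """PMF of the match length L: decisive game before game 10, or stop after game 10."""
    if 1 <= l <= 9:
        return (1 - p_draw) * p_draw ** (l - 1)
    if l == 10:
        return p_draw ** 9        # first 9 games all drawn
    return 0.0

print(sum(pmf_L(l) for l in range(1, 11)))          # 1.0 (normalization)
print(p_a * sum(p_draw ** j for j in range(10)))     # P(A wins the match)
print((1 - p_draw ** 10) * p_a / (1 - p_draw))       # closed form from above
```
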
# Lecture 3: 1/28/20
## DRVs
* Distribution/PMF is collection of all values $\{ a, P_x(a), a \in \mathcal{A}\}$ where $\mathcal{A}$ is the set of all possible values taken by X
* Functions of RVs are RVs
* $Y=g(X), P_Y(y) = \sum\limits_{\{x | g(x) = y\}} P_X(x)$
* Ex. $Y=|X|$, X is uniform on $\{-2, -1, 0, 1, 2\}$
* Y is nonnegative: $P_Y(0) = \frac{1}{5}$, $P_Y(1) = \frac{2}{5}$, $P_Y(2) = \frac{2}{5}$
## Expectation
* $E[X] = \sum\limits_{x \in \mathcal{X}} x P(X=x) = \sum\limits_{x \in \mathcal{X}} xP_X(x)$
* Sum of all values that X can take weighted by their mass
* Alternately: $E[X] = \sum\limits_{w \in \Omega} X(w)P(w)$
* Linearity of Expectation(LOE): For any 2 RVs X, Y that are defined on the same probability space $E[X+Y] = E[X] + E[Y]$
* $E[cX] = cE[x]$
* No independence assumptions made
* Ex. Average of sum of 2 rolls
* $E[X] = E[X_1 + X_2] = E[X_1] + E[X_2] = 7$
* Ex. Find the avg num of students who get back their hw
* Let $X_i$ be an indicator RV for the ith student gets their hw (1 if get HW back, 0 if not)
* Let X be the num of students who got back their own HWs
* $E[X_i] = P(\text{student } i \text{ gets own HW}) = \frac{1}{N}$, so $E[X] = \sum\limits_{i=1}^N E[X_i] = N \cdot \frac{1}{N} = 1$
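
A simulation of the homework (hat-check) example; the class size below is arbitrary and the answer is ~1 regardless (sketch, standard library only):

```python
import random

def avg_own_homework(n=20, trials=100_000, seed=0):
    """Average number of fixed points of a uniformly random permutation of n homeworks."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        total += sum(1 for i, p in enumerate(perm) if i == p)
    return total / trials

print(avg_own_homework())   # ~1, matching linearity of expectation
```
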
## Variance
* Var(X) = $\sigma_X^2 = E[(X - E[X])^2] = E[X^2] - E[X]^2$
* Stdev = $\sigma_X = \sqrt{\text{Var}(X)}$
* Tail Sum Formula (TSF): If X is a nonnegative integer-valued DRV, $E[X] = \sum\limits_{k=1}^\infty P(X \geq k)$
* Proof: $E[X] = \sum\limits_{i=1}^\infty i P(X = i) = \sum\limits_{i=1}^\infty \sum\limits_{j=1}^i P(X=i)$
* Swapping the order of summation: $\sum\limits_{j=1}^\infty \sum\limits_{i=j}^\infty P(X=i) = \sum\limits_{j=1}^\infty P(X \geq j)$
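
A numeric check of the tail sum formula on a Geometric(p) RV, where $P(X \geq j) = (1-p)^{j-1}$ (a sketch; truncating at 500 terms is just for illustration):

```python
p = 0.3

def pmf(k):
    """Geometric(p) PMF on {1, 2, 3, ...}."""
    return (1 - p) ** (k - 1) * p

direct = sum(k * pmf(k) for k in range(1, 500))              # E[X] by definition
tail_sum = sum((1 - p) ** (j - 1) for j in range(1, 500))    # sum of P(X >= j)
print(direct, tail_sum, 1 / p)                               # all ~3.333
```
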
## Common Random Variables
* Discrete uniform distribution: $U[a, a+1, \ldots b]$
$P_X(k) = \begin{cases} \frac{1}{b-a+1} & \text{if } k = a, a+1 , \ldots b\\
0 & \text{otherwise} \end{cases}$
$E[X] = \frac{a+b}{2}$, $var(X) = \frac{n^2-1}{12}$ where $n = b-a+1$
* Bernoulli RV: RV with 2 outcomes. Parameters: $p$
$P_X(k) = \begin{cases} 1-p & \text{if } k = 0 \\
p & \text{if } k = 1 \end{cases}$
$E[X] = p$, $var(X) = p(1-p)$
* Indicator RV: A bridge between events and RVs
$X = \mathbb{1}_A = \begin{cases} 1 & \text{if A occurs}\\ 0 & \text{else} \end{cases}$
$E[X] = P(A)$
* Binomial RV: sum of $n$ independent Bernoulli RVs. Parameters: $p, n$
$P_X(k) = \binom{n}{k} p^k (1-p)^{n-k}$
$E[X] = np$, $var(X) = np(1-p)$
* Geometric RV: number of trials until the first success (how long you wait for an event). Parameter: $p$
$P_X(k) = (1-p)^{k-1} p$ for $k = 1,2,3, \ldots$
$E[X] = \frac{1}{p}$, $var(X) = \frac{(1-p)}{p^2}$
* Poisson RV: Like a binomial RV with $p \to 0, n \to \infty, np = \lambda$
$P_X(k) = e^{-\lambda}\frac{\lambda^k}{k!}$ for $k = 0, 1, 2, \ldots$
$E[X] = \lambda$, $var(X) = \lambda$
* Uses $1-p \approx e^{-p}$ for small $p$
* If $X \sim Poiss(\lambda_x), Y \sim Poiss(\lambda_y), X \& Y$ are indep, then $Z = X + Y \sim Poiss(\lambda_x + \lambda_y)$
* Merging: 2 queues of packets added
* Splitting: 1 stream split into 2 queues(p, 1-p) with parameters $p\lambda_x, (1-p)\lambda_x$
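
A quick simulation of Poisson merging and splitting (a sketch assuming numpy is available; the $\lambda$ values and split probability are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
lx, ly, p, n = 2.0, 3.0, 0.4, 200_000

# Merging: sum of independent Poissons is Poisson(lx + ly)
z = rng.poisson(lx, n) + rng.poisson(ly, n)
print(z.mean(), z.var(), lx + ly)            # mean and variance both ~5

# Splitting: keep each arrival of a Poisson(lx) stream independently w.p. p
kept = rng.binomial(rng.poisson(lx, n), p)
print(kept.mean(), kept.var(), p * lx)       # both ~0.8 -> Poisson(p * lx)
```
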
# Lecture 4: 1/30/20
### St. Petersburg Paradox
* Toss a fair coin until you get a head
* If the first head appears on toss $n$, you win $2^n$
* How much should you pay to play this game?
* $E[W] = \sum\limits^\infty_{k=1} 2^k(\frac{1}{2^k}) = 1 + 1 + \ldots = \infty$
* Your utility function $U(w) \approx \log(W)$
## Properties
* If X, Y are indep, $E[XY] = E[X]E[Y]$
* But not the converse
* If X, Y are indep, then $var(X+Y) = var(X) + var(Y)$
* Analogous to LOE
* $var(X+a) = var(X)$
* If X, Y are not indep, $var(X_1 + X_2) = var(X_1) + var(X_2) + 2(E[X_1X_2] - E[X_1]E[X_2])$
* $cov(X_1, X_2) = E[(X_1-E[X_1])(X_2 - E[X_2])]$
* $cov(X_1, X_2) = E[X_1X_2] - E[X_1]E[X_2]$
* $\rho = \frac{cov(X,Y)}{\sqrt{var(X)}\sqrt{var(Y)}}$
* How correlated they are: $-1 \leq \rho \leq 1$
## Conditioning of RVs
* $P_{X|Y}(k|n) = P(X=k | Y=n)$
* Conditional PMF of X given Y
* X|Y is another RV
## Memoryless for Geometric
* $P(X > s + t | X > s) = P(X > t)$
* No successes in first $s$ tries does not affect future tries
* $P(X > s + t | X > s) = \frac{P(X>s+t)}{P(X>s)} = \frac{(1-p)^{s+t}}{(1-p)^s} = (1-p)^t = P(X > t)$
* Mean of geometric RV
* $E[X] = E[X | X=1]P(X=1) + E[X | X>1]P(X>1) = 1(p)+(1+E[X])(1-p)$; solving gives $E[X] = \frac{1}{p}$
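
A simulation of the memoryless property (a sketch assuming numpy; $p$, $s$, $t$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
p, s, t = 0.2, 3, 4
x = rng.geometric(p, 500_000)        # support {1, 2, 3, ...}

lhs = np.mean(x[x > s] > s + t)      # P(X > s+t | X > s)
rhs = np.mean(x > t)                 # P(X > t)
print(lhs, rhs, (1 - p) ** t)        # all ~0.41
```
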
# Lecture 5: 2/4/20
## Conditional Expectation
* Want to predict X given Y
* $E[X|Y=y] = \sum x P(X=x | Y=y)$
* $E[X|Y=y]$ is expectation of X wrt dist of X conditioned on Y=y
* $E[X|Y]$ is a RV which has value $E[X|Y=y]$ with prob $P(Y=y)$
* Ex. Roll a die N times, X is the sum of die rolls
* $E[X|N=1] = \frac{7}{2}$, $E[X|N=2] = 7$
* $E[X|N] = \frac{7N}{2}$
* What is the expectation of the RV $E[X|Y]$ (function of Y)
* Iterated expectation: $E[E[X|Y]] = E[X]$
### Random number of trials
* Roll a die N times, $N \sim Geom(p)$. X is the sum of all the rolls
* $E[X] = E[E[X|N]] = E[\frac{7N}{2}] = \frac{7}{2}E[N] = \frac{7}{2p}$
### Random walk: drunken walk on a line
* Start at the origin
* $X_{n+1} = X_n + i_+ - i_-$
* Where $i_+$ is the indicator for taking a +1 step and $i_-$ for a -1 step (exactly one of them is 1 each step, each with prob $\frac{1}{2}$)
* $E[X_{n+1}] = E[X_n] + \frac{1}{2} - \frac{1}{2} = E[X_n] = 0$ (average at 0)
* $P(X_{n+1}^2 = (k+1)^2 | X_n = k) = P(X_{n+1}^2 = (k-1)^2 | X_n = k) = \frac{1}{2}$
* $E[X_{n+1}^2 | X_n = k] = \frac{(k+1)^2 + (k-1)^2}{2} = k^2 + 1$
* $E[X_{n+1}^2 | X_n] = X_n^2 + 1$
* $E[X_{n+1}^2] = E[E[X_{n+1}^2 | X_n]] = E[X_n^2] + 1 = E[X_{n-1}^2] + 1 + 1 = \ldots = n+1$
* $E[X_0^2] = 0 \implies E[X_n^2] = n$
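
A simulation of the drunken walk confirming $E[X_n] = 0$ and $E[X_n^2] = n$ (a sketch assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 50_000
steps = rng.choice([-1, 1], size=(trials, n))   # each step +1 or -1 with prob 1/2
x_n = steps.sum(axis=1)                          # position after n steps
print(x_n.mean(), (x_n ** 2).mean())             # ~0 and ~n
```
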
## Continuous Probability
* In most settings, continuous sample space is more natural
* Let $X \sim U[0,1] \implies P(X=0.4) = 0$
* Need to define the probability over sets that have length
* X is continuous RV if:
* $\exists$ non-negative $f_X$ s.t. $P(X \in B) = \int_B f_X(x) dx$ is well defined
* $\int_{-\infty}^{\infty} f_X(x) dx= 1$
* PDF does not have to be between 0,1 but PMF does
* $P(a \leq x \leq b) = \int_{a}^{b} f_X(x) dx$
* $P(X \in [x, x+ \epsilon]) = \int_{x}^{x + \epsilon} f_X(t) dt \approx f_X(x) \epsilon$
* $f_X(x) \approx \frac{P(X \in [x, x+ \epsilon]) }{\epsilon}$
* Ex. $f_x(x) = \frac{1}{2\sqrt{x}}$ for $0 < x < 1$, 0 everywhere else
## Cumulative Distribution Function (CDF)
* $F_X(x) = P(X \leq x)$ (for both cont. and discrete RVs)
* Properties:
* $F_X(\infty) = 1, F_X(-\infty) = 0$
* If X is discrete, $P(X=k) = F_X(k) - F_X(k-1)$
* If X is continuous, $f_x(x) = \frac{d F_X(x)}{dx}$
* Ex. Random dart throw, uniform on the unit circle
* Let Y be the distance from the center; find the pdf and cdf of Y
* $P(Y \leq y) = ?$
* $F_Y(y) = y^2$
* $f_y(y) = \frac{d F_Y(y)}{dy} = 2y$
* $P(0.5 < y < 0.6) = F_y(0.6) - F_y(0.5) = 0.36 - 0.25 = 0.11$
* $E[g(X)] = \int\limits_{-\infty}^\infty g(x) f_X(x) dx$
* If X, Y are indep, $F_{X,Y}(x,y) = F_X(x)F_Y(y)$
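
A simulation of the dart example above (a sketch assuming numpy): the distance $Y$ can be sampled via the inverse CDF as $Y = \sqrt{U}$, $U \sim U[0,1]$, so that $F_Y(y) = y^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.sqrt(rng.uniform(0, 1, 200_000))     # distance of a uniform dart from the center
print(np.mean((y > 0.5) & (y < 0.6)))       # ~0.11 = F_Y(0.6) - F_Y(0.5)
print(y.mean())                             # E[Y] = integral of y * 2y dy = 2/3
```
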
## Popular Continuous Distributions
* Uniform $X \sim U[a,b]$
* $E[X] = \int\limits_a^b \frac{x}{b-a} dx = \frac{a+b}{2}$
* $Var(x) = E[X^2] - E[X]^2 = \frac{(b-a)^2}{12}$
* Exponential $f_X(x) = \lambda e^{-\lambda x}$
* $F_X(x) = P(X \leq x) = 1 - e^{-\lambda x}$
* Complementary CDF: $P(X > x) = e^{-\lambda x}$
* Memoryless property applies to expo($\lambda$)
* $P(X > t+s | X>t) = P(X>s)$ if $s,t > 0$
# Lecture 6: 2/6/20
## Normal Distribution
* $f_X(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2 \sigma^2}\right)$
* For the standard normal ($\mu = 0, \sigma^2 = 1$), the CDF is written $\Phi(x)$
* Show that it is a valid PDF - nonnegative and integrates to 1
* 1) Sum of 2 independent Normals is Normal
* $X \sim N(\mu_1, \sigma_1^2), Y \sim N(\mu_2, \sigma_2^2)$
* $Z = X + Y, Z \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma^2_2)$
* 2) A Normal multiplied by a constant is Normal
* $Y = aX \sim N(a\mu, a^2 \sigma^2)$ for $X \sim N(\mu, \sigma^2)$
* Properties allow us to convert any Normal into standard normal
* $Z = \frac{X - \mu}{\sigma}$
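
A quick numeric illustration of both properties and of standardization (a sketch assuming numpy; the means and variances are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Standardization: Z = (X - mu) / sigma
mu, sigma = 3.0, 2.0
x = rng.normal(mu, sigma, 200_000)
z = (x - mu) / sigma
print(z.mean(), z.std())                    # ~0 and ~1

# Sum of independent Normals: means and variances add
w = rng.normal(1.0, 1.0, 200_000) + rng.normal(2.0, 2.0, 200_000)
print(w.mean(), w.var())                    # ~3 and ~5
```
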
## Continuous Analogs
* Joint PDFs: $P((X,Y) \in A) = \int\limits_A f_{X,Y} (x,y)\, dx\, dy$
* Nonnegative, integrates to 1
* Joint CDF: $F_{X,Y}(x,y) = P(X \leq x, Y \leq y)$
* $f_{X,Y}(x,y) = \frac{\partial^2}{\partial x \partial y} F_{X,Y}(x,y)$
* Marginal Probability Density
* $f_X(x) = \int\limits_{-\infty}^{\infty} f_{X,Y}(x,y) dy$
* $f_X(x)$ is a density, not a probability
* Conditional Probability Density
* $f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}$
* Multiplication rule still holds
* Independence
* $f_{X,Y}(x,y) = f_X(x)f_Y(y)$
* $f_{X|Y}(x | y) = f_X(x)$
* $f_{Y|X}(y | x) = f_Y(y)$
* Bayes Rule
* $f_{X|Y}(x|y) = \frac{f_{Y|X}(y|x)f_X(x)}{\int\limits_{-\infty}^{\infty} f_{Y|X}(y|t)f_X(t)dt}$
* Conditional Expectation
* $E[Y | X=x] = \int\limits_{-\infty}^{\infty} yf_{Y|X}(y|x)dy$
## Change of Variables / Derived Distributions
* Let X be a discrete RV (coin toss), $Y=2X$. Compute distribution of Y
* $P(Y=y) = P(f(X) = y) = P(X \in f^{-1}(y))$
* $X \sim U[0,1], Y = 2X$
* Use CDFs for continuous instead of PDFs
* $F_Y(y) = P(Y \leq y) = F_X(\frac{y}{2})$, so $f_Y(y) = \frac{1}{2} f_X(\frac{y}{2})$
* $f_Y(y) \neq P(Y=y)$
* If g is monotonically increasing:
* $P(g(X) \leq y) = P(X \leq g^{-1}(y)) = F_X(g^{-1}(y))$
* If g is monotonically decreasing:
* $P(g(X) \leq y) = P(X \geq g^{-1}(y)) = 1 - F_X(g^{-1}(y))$
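
A check of the derived distribution for $Y = 2X$, $X \sim U[0,1]$ (a sketch assuming numpy): the density should be flat at $\frac{1}{2}$ on $[0,2]$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200_000)
y = 2 * x                                    # monotonically increasing g

print(np.mean(y <= 1.2), 1.2 / 2)            # empirical F_Y(1.2) vs F_X(0.6)
hist, _ = np.histogram(y, bins=20, range=(0, 2), density=True)
print(hist.round(2))                         # every bin ~0.5 = f_Y(y)
```
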
# Lecture 7: 2/11/20
## Law of Total Variance
* Recall law of iterated expectation: $E[E[X|Y]] = E[X]$
* $var(X|Y=y)$ is the variance of the conditional dist. $P(X=x | Y=y)$
* $var(X|Y)$ is an RV that takes the value $var(X|Y=y)$ with prob $P(Y=y)$
* LTV: $var(X) = E[var(X|Y)] + var(E[X|Y])$
### Ex. Biased Coins
* n tosses, X = # of heads, $Y = P(H) \sim U[0,1]$
* $var(X) = E[var(X|Y)] + var(E[X|Y])$
* $E[X] = \frac{n}{2}, E[X|Y] = nY$
* $= var(nY) + E[nY(1-Y)]$
* $=n^2 var(Y) + nE[Y] - nE[Y^2]$
* $= \frac{n^2}{12} + \frac{n}{2} - n(\frac{1}{12} + \frac{1}{4})$
* $= \frac{n}{6} + \frac{n^2}{12}$
* Variance of a fair coin, n tosses is $\frac{n}{4}$
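
A simulation of the biased-coin example checking the law of total variance (a sketch assuming numpy; $n$ is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 20, 300_000
y = rng.uniform(0, 1, trials)            # random bias Y ~ U[0,1]
x = rng.binomial(n, y)                   # X | Y ~ Binomial(n, Y)
print(x.var(), n / 6 + n ** 2 / 12)      # both ~36.7 for n = 20
```
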
## Order statistics
* For rankings of items
* Let X be a CRV for which $x_1 \ldots x_n$ are a random sample of size n
* Reorder $x_i$ from smallest to largest
* Ex. $X \sim U[0,1]$, $n=4, x_1 = 0.5, x_2 = 0.7, x_3 = 0.2, x_4 = 0.1$
* $x^{(1)} = 0.1, x^{(2)} = 0.2, x^{(3)} = 0.5, x^{(4)} = 0.7$
* RV: $X^{(i)} = x^{(i)}$ is ith order statistic
* If X has pdf $f_X(x)$, the marginal pdf of ith order statistic is:
* $f_{X^{(i)}}(y) = \frac{n!}{(i-1)!(n-i)!}(F_X(y))^{i-1}(1-F_X(y))^{n-i} f_X(y)$
* Special case: $X \sim U[0,1]$
* $f_{X^{(i)}}(y) = \frac{n!}{(i-1)!(n-i)!}(y)^{i-1}(1-y)^{n-i}$
* Ex. P(9th smallest of 10 draws from $X \sim U[0,1]$ is $> 0.8$)
* Plug in $i=9, n=10$
* $f_{X^{(9)}}(x) = \frac{10!}{8!1!}x^{8}(1-x) = 90x^8(1-x)$
* $P(X^{(9)} > 0.8) = \int\limits_{0.8}^{1} (90x^8 - 90x^9) dx$
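
Evaluating that probability both in closed form and by simulation (a sketch assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
trials, n, i = 200_000, 10, 9
samples = rng.uniform(0, 1, (trials, n))
ninth = np.sort(samples, axis=1)[:, i - 1]      # 9th smallest of each set of 10
print(np.mean(ninth > 0.8))                     # Monte Carlo estimate

# Antiderivative of 90x^8 - 90x^9 is 10x^9 - 9x^10
exact = (10 * 1**9 - 9 * 1**10) - (10 * 0.8**9 - 9 * 0.8**10)
print(exact)                                    # ~0.624
```
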
## Convolution
* Let $Z = X + Y$ and X,Y are continuous indep
* $f_Z(z) = \int\limits_x f_{X,Z}(x,z) dx = \int\limits_x f_{X}(x)f_{Z|X}(z|x) dx$
* $F_{Z|X}(z|x) = P(X+Y \leq z | X=x) = P(Y \leq z - x | X=x)$
* $= P(Y \leq z -x ) = F_Y(z-x)$
* Differentiate wrt z
* $f_{Z|X}(z|x) = f_Y(z-x)$
* Convolution: $f_Z(z) = \int\limits_x f_{X}(x)f_{Y}(z-x) dx$
* For discrete RVs: $P(Z=n) = \sum\limits_k P(X=k)P(Y=n-k)$
* Convolving 2 gaussians is another gaussian
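
The discrete convolution in code (a sketch assuming numpy), using two fair 4-sided dice as in the earlier example:

```python
import numpy as np

px = np.full(4, 0.25)          # PMF of one 4-sided die on values 1..4
py = np.full(4, 0.25)
pz = np.convolve(px, py)       # PMF of Z = X + Y on values 2..8
print(pz)                      # [1, 2, 3, 4, 3, 2, 1] / 16
print(pz.sum())                # 1.0
```
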
## Moment Generating Functions (MGFs)
* Use Taylor Series expansion $e^{sX} = 1 + sX + \frac{s^2X^2}{2!} + \frac{s^3X^3}{3!} + \ldots$
* $M_X{(s)} = E[e^{sX}] = 1 + sE[X] + \frac{s^2}{2!}E[x^2] + \frac{s^3}{3!}E[x^3] + \ldots$
* Transforms an RV to all the moments
* $M_X(0) = 1$
* $\frac{d}{ds} M_X(s) = E[X] + sE[X^2] + \frac{s^2}{2!}E[X^3] + \ldots$
* $M_X'(0) = E[X]$
* $\frac{d^2}{ds^2} M_X(s) = E[X^2] + sE[X^3] + \ldots$
* $M_X''(0) = E[X^2]$
* $\frac{d^nM_x(s)}{ds^n} |_{s=0} = E[X^n]$
# Lecture 8: 2/13/20
## Utility of MGFs
* Finding moments easier
* Convolution operation to multiplication
* Great analytical tool (to prove CLT)
* $M_X(s) = \int\limits_{-\infty}^{\infty} f_X(x)e^{sx}dx$
* Properties
* $M_X(0) = 1$
* If $X > 0$, $M_X(-\infty) = 0$; if $X < 0$, $M_X(\infty) = 0$
* If $Y = aX + b$, $M_Y(s) = E[e^{s(aX+b)}] = e^{sb}E[e^{asX}] = e^{sb}M_X(as)$
* $M_Y(s) = e^{sb}M_X(as)$
* $X \sim Exp(\lambda), f_X(x) = \lambda e^{-\lambda x}$
* $E[e^{sX}] = \int\limits_0^\infty \lambda e^{sx} e^{-\lambda x} dx = \lambda \frac{e^{-(\lambda -s)x}}{-(\lambda - s)}\Big|^\infty_0$
* $M_X(s) = \frac{\lambda}{\lambda - s}, s < \lambda$
* $E[X] = M_X'(0) = \frac{\lambda}{(\lambda -s)^2}|_{s=0} = \frac{1}{\lambda}$
* $E[X^2] = M_X''(0) = \frac{2\lambda}{(\lambda -s)^3}|_{s=0} = \frac{2}{\lambda^2}$
* $Var(x) = \frac{2}{\lambda^2} - (\frac{1}{\lambda})^2 = \frac{1}{\lambda^2}$
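
A numeric check of the exponential moments derived from the MGF (a sketch assuming numpy; note numpy parameterizes the exponential by the scale $1/\lambda$):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0
x = rng.exponential(1 / lam, 500_000)    # scale = 1/lambda
print(x.mean(), 1 / lam)                 # E[X]   = M_X'(0)  = 1/lambda
print((x ** 2).mean(), 2 / lam ** 2)     # E[X^2] = M_X''(0) = 2/lambda^2
print(x.var(), 1 / lam ** 2)             # var(X) = 1/lambda^2
```
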
### Discrete Analog
* $P(X=k) = \frac{e^{-\lambda}\lambda^k}{k!}$
* $M_X(s) = \sum\limits_{k=0}^{\infty} e^{sk}e^{-\lambda}\frac{\lambda^k}{k!} = e^{-\lambda}\sum\limits_{k=0}^{\infty} \frac{(\lambda e^s)^k}{k!} = e^{-\lambda}e^{\lambda e^s}$
* $M_X(s) = e^{-\lambda + \lambda e^s}$
### MGFs for Standard Normal
* Given $X \sim \mathcal{N}(0,1) \rightarrow f_X(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}$
* $M_X(s) = E[e^{sx}] = \int\limits_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}e^{sx} dx$
* $= \frac{1}{\sqrt{2\pi}} \int\limits_{-\infty}^{\infty} \exp\left(-\frac{x^2 -2xs + s^2}{2}\right)e^{s^2/2} dx$
* $= e^{s^2/2} \frac{1}{\sqrt{2\pi}}\int\limits_{-\infty}^{\infty}e^{\frac{-(x-s)^2}{2}} dx$
* $= e^{s^2 /2}$
* If $Y \sim \mathcal{N}(\mu, \sigma^2)$
* $Y = \sigma X + \mu$
* $M_Y(s) = e^{s \mu} M_X(\sigma s) = e^{s\mu + \sigma^2 s^2/2}$
### MGF sums
* A given transform corresponds to a unique CDF
* $M_X(s)$ is the Laplace transform of the density
* MGF of a sum of independent RVs
* What is the transform of $M_Z(s), Z = X+Y$
* $M_Z(s) = E[e^{s(X+Y)}] = E[e^{sX}e^{sY}] = M_X(s)M_Y(s)$
* If X,Y are 2 gaussians
* $M_z(s) = M_X(s)M_Y(s) = e^{s^2/2}e^{s^2/2} = e^{s^2}$
* This is the MGF of $\mathcal{N}(0,2)$, so $Z \sim \mathcal{N}(0,2)$
## Limit Behavior of RVs
* Observe sequence of $X_1, X_2, \ldots X_n$ iid
* Sample mean: $M_n = \frac{\sum\limits_{i=1}^n X_i}{n}$
* $E[M_n] = \frac{nE[X_i]}{n} = E[X_i]$
* Assuming $var(X_i) < \infty$, $var(M_n) = \frac{1}{n^2}(\sum\limits_{i=1}^{n} var(X_i)) = \frac{var(X_i)}{n}$
* As $n \to \infty$, $E[M_n] = E[X_i]$ stays fixed and $var(M_n) \to 0$
* What happens to the "deviation" - tail bounds
### Markov Bound
* Weakest bound, no assumptions
* For a nonnegative RV X and any $a > 0$: $P(X \geq a) \leq \frac{E[X]}{a}$
* Proof: let Y be the indicator RV that is 1 if $X \geq a$; then $Y \leq \frac{X}{a}$
* Take expectation of both sides to get Markov's
* Tail sum (continuous, $X \geq 0$): $E[X] = \int\limits_0^{\infty} P(X > t)dt$
* $a P(X>a) \leq \int\limits_0^a P(X>t) dt \leq E[X] \rightarrow P(X>a) \leq \frac{E[X]}{a}$
### Chebyshev Bound
* If X is an RV w/ finite mean, variance $\sigma^2$:
* $P(|X - E[X]| \geq c) \leq \frac{\sigma^2}{c^2}$, $\forall c > 0$
* Special case: $c = k\sigma, P(|X-E[X]| \geq k\sigma) \leq \frac{1}{k^2}$
* $P(|X-E[X]| \geq c) = P((X-E[X])^2 \geq c^2) \leq \frac{E[(X-E[X])^2]}{c^2} = \frac{var(X)}{c^2}$
### Chernoff Bound
* Apply Markov to $e^{sX}$: $\forall s > 0$, $P(X \geq k) = P(e^{sX} \geq e^{sk}) \leq \frac{E[e^{sX}]}{e^{sk}}$
* $\forall s > 0$, $P(X \geq k) \leq e^{-sk}M_X(s)$
* Optimizing over s: $P(X \geq k) \leq \min\limits_{s > 0} [e^{-sk}M_X(s)]$
* Similarly, for $s < 0$: $P(X \leq k) \leq \min\limits_{s < 0} [e^{-sk}M_X(s)]$
### Chernoff Example
* $X \sim \mathcal{N}(0,1)$
* $P(X \geq k) \leq \min\limits_{s > 0} [e^{-sk}M_X(s)] = \min\limits_{s > 0} [e^{-sk+s^2/2}]$
* Optimize using calculus
* $f(s) = e^{-sk + s^2/2}$
* $f'(s) = e^{-sk+s^2/2}(-k+s)$
* Min: $s^* = k$
* $P(X \geq k) \leq e^{-k^2 + k^2/2} = e^{-k^2/2}$
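
Comparing the Chernoff bound $e^{-k^2/2}$ and the Chebyshev bound against the true standard normal tail (a sketch, standard library only):

```python
import math

def normal_tail(k):
    """P(X >= k) for X ~ N(0,1)."""
    return 0.5 * math.erfc(k / math.sqrt(2))

for k in (1.0, 2.0, 3.0):
    chebyshev = 1 / k**2               # bounds the two-sided tail P(|X - E[X]| >= k)
    chernoff = math.exp(-k**2 / 2)     # optimized e^{-sk} M_X(s) at s* = k
    print(k, normal_tail(k), chebyshev, chernoff)
```
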