---
title: Concentration Inequalities
tags: cs 593 rl
---

Let $X_1, X_2, \ldots, X_n$ be $n$ independent random variables and let $X = \frac{\sum_i X_i}{n}$ denote their sample mean, so that

$E[X] = \frac{\sum_i E[X_i]}{n}, \qquad Var[X] = \frac{\sum_i \sigma_i^2}{n^2}$

### Markov Inequality

Consider a random variable $X: \Omega \to \mathbb{R}^+$ on a probability space $(\Omega, \mathcal{F}, P)$. Then:

$P(X > \epsilon) \le \frac{E[X]}{\epsilon} \; \forall \; \epsilon > 0$

Proof:

$E[X] = \int_\Omega X\,dP = \int_\Omega X I(X \leq \epsilon)\,dP + \int_\Omega X I(X > \epsilon)\,dP$

$\geq \int_\Omega X I(X > \epsilon)\,dP \geq \int_\Omega \epsilon I(X > \epsilon)\,dP = \epsilon P(X > \epsilon)$

$\therefore P(X > \epsilon) \leq \frac{E[X]}{\epsilon}$

(The first inequality uses $X \geq 0$, so the dropped term is non-negative.)

### Chebyshev Inequality

Consider a random variable $X: \Omega \to \mathbb{R}$ on a probability space $(\Omega, \mathcal{F}, P)$. Then for any $\epsilon > 0$ and $k > 0$ such that $|X - E[X]|^k$ is integrable:

$P(|X - E[X]| > \epsilon) \leq \frac{1}{\epsilon^k}E[|X - E[X]|^k]$

For $k = 2$,

$P(|X - E[X]| > \epsilon) \leq \frac{1}{\epsilon^2}Var(X)$

Proof: Apply the Markov inequality to the random variable $|X - E[X]|^k$:

$E[|X - E[X]|^k] = \int_\Omega |X - E[X]|^k\,dP \geq \epsilon^k \int_\Omega I(|X - E[X]| > \epsilon)\,dP = \epsilon^k P(|X - E[X]| > \epsilon)$

### Sub-Gaussian Random Variables

A random variable $X$ on $(\Omega, \mathcal{F}, P)$ is sub-Gaussian if there is an $R \in \mathbb{R}^+$ (the sub-Gaussian constant) such that

$E[e^{\lambda(X - E[X])}] \le e^{R\lambda^2/2} \; \forall \; \lambda \in \mathbb{R}$

For any $\lambda > 0$,

$P(X - E[X] \ge \epsilon) = P(e^{\lambda(X - E[X])} \ge e^{\lambda\epsilon}) \le E[e^{\lambda(X - E[X])}]e^{-\lambda\epsilon} \le e^{R\lambda^2/2 - \lambda\epsilon}$

Minimizing the exponent over $\lambda$ (at $\lambda = \epsilon/R$) gives

$P(X - E[X] \ge \epsilon) \le e^{-\epsilon^2/2R}$

Similarly, for the other side: $P(X - E[X] \le -\epsilon) \le e^{-\epsilon^2/2R}$

Thus, for sub-Gaussian random variables, we have:

$P(|X - E[X]| > \epsilon) \le 2e^{-\epsilon^2/2R}$

### Hoeffding Inequality

Let $X_1, X_2, \ldots, X_n$ be $n$ independent random variables on $(\Omega, \mathcal{F}, P)$ which are sub-Gaussian with constants $R_1, R_2, \ldots, R_n$. Then:

$P(|\sum_i (X_i - E[X_i])| > \epsilon) \le 2e^{-\epsilon^2/2\sum_i R_i}$

Proof:

$E[\exp(\lambda \sum_i (X_i - E[X_i]))] = E[\Pi_i \exp(\lambda (X_i - E[X_i]))]$

$= \Pi_i E[\exp(\lambda (X_i - E[X_i]))] \leq \Pi_i e^{R_i \lambda^2 / 2} = e^{(\sum_i R_i)\lambda^2/2} \qquad \textrm{(independence, sub-Gaussianity)}$

so the sum is sub-Gaussian with constant $\sum_i R_i$, and the rest of the proof follows from the tail bound for sub-Gaussian random variables above.

Alternate form: Let $\delta = 2e^{-\epsilon^2/2\sum_i R_i}$, equivalently $\epsilon = \sqrt{2\sum_i R_i \log(2/\delta)}$. Then

$P(|\sum_i(X_i - E[X_i])| \leq \epsilon) \geq 1 - \delta$

If $R_i = R \; \forall \; i$ and the $X_i$ are i.i.d.,

$P\bigg(\big|\frac{\sum_i X_i}{n} - E[X]\big| \leq \sqrt{\frac{2R}{n} \log(2/\delta)}\bigg) \geq 1 - \delta$

(A small numerical sanity check of this form appears after the Martingales section below.)

### Bernstein Inequality

Let $X$ be an RV on $(\Omega, \mathcal{F}, P)$ such that $|X - E[X]| < b$ and $\text{Var}[X] \le R$. Such a variable is **sub-exponential**, and satisfies:

$P(|X - E[X]| > \epsilon) \le 2e^{-\frac{\epsilon^2}{2(R + b\epsilon/3)}}$

Proof: High-Dimensional Statistics (Wainwright), Section 2.1.3

### Martingales

Let $X_1, X_2, \ldots$ be a sequence of RVs on $(\Omega, \mathcal{F}, P)$ and let $\{\mathcal{F}_t\}$ with $\mathcal{F}_t \subset \mathcal{F}_{t+1}$ be a filtration. An adapted sequence $\{X_t\}$ is a martingale sequence if $E[X_t|\mathcal{F}_{t-1}] = X_{t-1}$ and each $X_t$ is integrable.
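To make the i.i.d. form of the Hoeffding inequality concrete, here is a minimal simulation sketch in Python/NumPy (the language, the Bernoulli(0.5) example, and the constant $R = 1/4$ for $[0,1]$-valued variables via Hoeffding's lemma are all choices made here for illustration, not part of the original notes). It draws many independent sample means and checks that the fraction falling within $\sqrt{\frac{2R}{n}\log(2/\delta)}$ of the true mean is at least $1 - \delta$.

```python
import numpy as np

# Empirical check of the i.i.d. Hoeffding bound
#   P(|mean(X) - E[X]| <= sqrt(2R/n * log(2/delta))) >= 1 - delta
# using Bernoulli(0.5) samples; a [0,1]-valued variable is sub-Gaussian
# with constant R = 1/4 (Hoeffding's lemma), which is what we plug in.

rng = np.random.default_rng(0)
n, R, delta, trials = 1000, 0.25, 0.05, 10_000

eps = np.sqrt(2 * R / n * np.log(2 / delta))      # deviation radius from the bound
samples = rng.binomial(1, 0.5, size=(trials, n))  # independent repetitions of the experiment
dev = np.abs(samples.mean(axis=1) - 0.5)          # |sample mean - true mean| per repetition

coverage = np.mean(dev <= eps)
print(f"epsilon from the bound: {eps:.4f}")
print(f"empirical coverage:     {coverage:.4f} (should be >= {1 - delta})")
```

The empirical coverage will typically be well above $1 - \delta$: the Hoeffding bound only uses the sub-Gaussian constant, not the full distribution, so it is conservative.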
### Maximal Inequality

Let $\{X_t\}$ be a martingale sequence on $(\Omega, \mathcal{F}, P)$ with $X_t \ge 0$. Then:

$P(\sup_t X_t > \epsilon) \le \frac{E[X_0]}{\epsilon}$

Proof: *Missing notes*

### Martingale Difference

A sequence of random variables $\{Y_t\}$ on $(\Omega, \mathcal{F}, P)$ adapted to a filtration $\{\mathcal{F}_t\}$ is a martingale difference sequence if $E[Y_t|\mathcal{F}_{t-1}] = 0$ and each $Y_t$ is integrable, i.e. $Y_t = X_t - X_{t-1}$ is the difference between consecutive elements of a martingale sequence.

### Azuma Inequality

Let $\{Y_t\}$ be a martingale difference sequence on $(\Omega, \mathcal{F}, P)$ adapted to $\{\mathcal{F}_t\}$ such that a version of $E[e^{\lambda Y_{t+1}}|\mathcal{F}_t]$ satisfies $E[e^{\lambda Y_{t+1}}|\mathcal{F}_t] \le e^{R\lambda^2/2} \; \forall \; \lambda$ (conditionally sub-Gaussian). Then:

$P(|\sum_{t=1}^n Y_t| > \epsilon) \le 2e^{-\epsilon^2/2nR}$

Proof: For $\lambda > 0$,

$P(\sum_{t=1}^n Y_t > \epsilon) = P(e^{\lambda\sum_{t=1}^n Y_t} > e^{\lambda\epsilon}) \le E[\exp(\lambda \sum_{t=1}^{n} Y_t)]e^{-\lambda\epsilon}$

$= E[E[\exp(\lambda \sum_{t=1}^{n} Y_t)|\mathcal{F}_{n-1}]]e^{-\lambda\epsilon} = E[\exp(\lambda \sum_{t=1}^{n-1} Y_t)E[\exp(\lambda Y_n)|\mathcal{F}_{n-1}]]e^{-\lambda\epsilon}$

$\le e^{R\lambda^2/2} E[\exp(\lambda \sum_{t=1}^{n-1} Y_t)]e^{-\lambda\epsilon} \le \ldots \le e^{nR\lambda^2/2}e^{-\lambda\epsilon}$

Minimizing over $\lambda$ (at $\lambda = \epsilon/nR$) gives $P(\sum_t Y_t > \epsilon) \le e^{-\epsilon^2/2nR}$; applying the same argument to $\{-Y_t\}$ bounds the other tail and gives the factor of 2. (A simulation sketch appears at the end of these notes.)

### Freedman Inequality

Let $\{Y_t\}$ be a martingale difference sequence on $(\Omega, \mathcal{F}, P)$ adapted to $\{\mathcal{F}_t\}$ such that $|Y_t| \le b$ and the predictable variance satisfies $\sum_{t=1}^n E[Y_t^2|\mathcal{F}_{t-1}] \le R$. Then:

$P(|\sum_{t=1}^n Y_t| > \epsilon) \le 2e^{-\frac{\epsilon^2}{2(R + b\epsilon/3)}}$

Homework: Prove that sub-Gaussianity implies mean zero (i.e., if $E[e^{\lambda X}] \le e^{R\lambda^2/2}$ for all $\lambda \in \mathbb{R}$, then $E[X] = 0$).
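As a companion to the Azuma inequality above, here is a minimal simulation sketch in Python/NumPy (the specific martingale difference sequence, a fresh fair sign times a predictable coefficient in $[0.5, 1]$, is a hypothetical construction chosen only for illustration). Each step is conditionally sub-Gaussian with $R = 1$ because $E[e^{\lambda Y_t}|\mathcal{F}_{t-1}] = \cosh(\lambda c_t) \le e^{\lambda^2 c_t^2/2} \le e^{\lambda^2/2}$, so the Azuma bound should dominate the empirical tail probability.

```python
import numpy as np

# Empirical check of the Azuma bound P(|sum_t Y_t| > eps) <= 2*exp(-eps^2 / (2*n*R))
# on a hypothetical martingale difference sequence: Y_t = sign_t * c_t, where
# sign_t is a fresh fair (Rademacher) sign and c_t in [0.5, 1] depends only on the
# past, so E[Y_t | F_{t-1}] = 0 and the steps are conditionally sub-Gaussian, R = 1.

rng = np.random.default_rng(1)
n, R, trials = 500, 1.0, 20_000
eps = 2.0 * np.sqrt(n)  # look at deviations of size 2*sqrt(n)

sums = np.zeros(trials)
for _ in range(n):
    c = 0.5 * (1.0 + np.abs(np.tanh(sums / np.sqrt(n))))  # predictable coefficient in [0.5, 1]
    signs = rng.choice([-1.0, 1.0], size=trials)           # fresh fair signs, independent of the past
    sums += signs * c

empirical = np.mean(np.abs(sums) > eps)
azuma = 2 * np.exp(-eps**2 / (2 * n * R))
print(f"empirical P(|S_n| > {eps:.1f}) = {empirical:.4f}")
print(f"Azuma bound                   = {azuma:.4f}")
```

Because the step sizes depend on the running sum, the increments are not i.i.d. and the plain Hoeffding inequality does not apply directly; this dependence-through-the-filtration setting is exactly what Azuma's inequality handles.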