# 03/16 Lecture 11

Outline:
1. Motivation for PCA/low-rank approximation
2. Instability of eigenvalues/eigenvectors
3. Basic inequality for PCA

Reading: Chapter 8 (8.1–8.3)

## Discretization/covering of sets

$B_2(1) = \{u \in R^d : ||u||_2 \leq 1\}$

Find $\{u^1, \ldots, u^N\}$ such that for every $u \in B_2(1)$ there exists $u^j$ with $||u - u^j||_2 \leq \delta$. The smallest such $N$ is the covering number $N(\delta)$, and

$\log N(\delta) \approx d \log(1 + 2/\delta)$

The quantity $\log N(\delta)$ is called the metric entropy.

---

**Proof sketch (special case of the $\ell_\infty$ ball):** In $B_\infty(1) = \{\theta \in R^2 : ||\theta||_\infty \leq 1\}$, tile the square with boxes of side length $\delta$ and take the box centers as the $u^j$; every $u$ then satisfies $||u - u^j||_\infty \leq \delta$ for some $j$, and counting the boxes gives the rate above.

## PCA/Low-rank Matrix Approximation

Set-up: $\Sigma \in R^{d \times d}$, $\Sigma = \Sigma^T$, with eigenvalues $\gamma_1(\Sigma) \geq \ldots \geq \gamma_d(\Sigma)$.

1. Say we have a random vector $X \in R^d$ with $E[X] = 0$ and $cov(X) = \Sigma$.

**What is the direction of maximal variance?**

A direction is a vector $u \in R^d$ with $||u||_2 = 1$, and we look for

$$\max_{||u||_2 = 1} var(<u, X>) = \max_{||u||_2 = 1} var\Big(\sum_{j=1}^d u_j X_j\Big) = \max_{||u||_2 = 1} E[(u^T X)^2] = \max_{||u||_2 = 1} u^T \Sigma u = \gamma_1(\Sigma) = |||\Sigma|||_{2}$$

The maximal eigenvector $u^*$ (the eigenvector associated with $\gamma_1(\Sigma)$) achieves the maximum.

In practice, given $\{X_i\}_{i=1}^n$, compute $\hat{\Sigma} = \frac 1 n \sum_{i=1}^n X_i X_i^T$ and plug it in for $\Sigma$.

Generalization: take $V \in R^{d \times r}$ with $r < d$ ($r$ = number of eigenvectors), $V = (v_1 | \ldots | v_r)$ with orthonormal columns, i.e. $<v_i, v_j> = 1$ if $i = j$ and $0$ otherwise. Then

$$\max_{V \in R^{d \times r},\ V^T V = I_r} E[||V^T X||_2^2] = \sum_{j=1}^r \gamma_j(\Sigma)$$

For $V^*$ achieving the maximum, the columns are the top $r$ eigenvectors of $\Sigma$.

---

**Example (fitting mixture models):**

Two distributions centered at $\theta^*$ and $-\theta^*$ with circular contours; the overall density is a bimodal mixture.

$Z \sim Ber(\frac 1 2)$, $Z \in \{0, 1\}$

$X = (1-Z)\, Y_0 + Z\, Y_1$

$E[Y_0] = \theta^*$, $E[Y_1] = -\theta^*$, $cov(Y_0) = cov(Y_1) = \sigma^2 I$.

> To estimate $\theta^*$, the sample mean does not work: $E[X] = 0$. The information is embedded in the second-order moments, which is exactly what PCA is built for.

$cov(X) = E[XX^T] = \theta^*(\theta^*)^T + \sigma^2 I = \Sigma$.

The maximal eigenvector of $\Sigma$ is $\theta^*/||\theta^*||_2$ (up to sign), so PCA recovers the direction of $\theta^*$; a numerical sketch appears further below. In the general situation, where the component covariances are no longer $\sigma^2 I$, one cannot say that $\theta^*$ is the maximal eigenvector.

> Martin: Academia is about maximizing the number of papers per unit of effort.

> HW2.5 is related to the elliptical PCA problem.

## Perturbation of Eigenvalues and Eigenvectors

$Q = \Sigma$, $R = \Sigma + P = Q + P$. All matrices are symmetric. Assume $|||P|||_2 \leq \epsilon$.

- $P$ = perturbation
- $Q$ = original matrix
- $R$ = perturbed matrix

We would like to know how the eigenvalues and eigenvectors change.

For the maximal eigenvalue of $R$:

$$\gamma_1(R) = \gamma_1(Q + P) = \max_{||u||_2 = 1} u^T(Q+P)u \leq \max_{||u||_2 = 1} u^T Q u + \max_{||u||_2 = 1} u^T P u = \gamma_1(Q) + \epsilon$$

Exchanging the roles of $Q$ and $R$ gives $\gamma_1(Q) \leq \gamma_1(R) + \epsilon$, so $|\gamma_1(R) - \gamma_1(Q)| \leq \epsilon$: if the perturbation is small, the estimate of the maximal eigenvalue remains accurate.

---

**Weyl's theorem:** $\max_{j = 1, \ldots, d} |\gamma_j(Q) - \gamma_j(R)| \leq |||Q - R|||_2$.

> Every eigenvalue of $R$ is within $\epsilon$ of the corresponding eigenvalue of $Q$ if $|||P|||_2 \leq \epsilon$.
> Eigenvalues are stable.

---

**Example (eigenvectors are not stable in general):**

$$Q_\epsilon = \begin{pmatrix} 1 & \epsilon \\ \epsilon & 1.01 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1.01 \end{pmatrix} + \epsilon \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

When $\epsilon = 0.01$, the maximal eigenvector changes from $(0, 1)$ to approximately $(0.53, 0.85)$. The issue is that the unperturbed matrix has very close eigenvalues, i.e. a small eigengap.
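The mixture-model example is easy to reproduce numerically. Below is a minimal sketch in Python/NumPy; the dimension, sample size, noise level, and the particular $\theta^*$ are illustrative choices of mine, not values from the lecture. It draws $X = (1-Z)Y_0 + Z Y_1$, forms the plug-in estimate $\hat{\Sigma} = \frac 1 n \sum_i X_i X_i^T$, and checks that the maximal eigenvector of $\hat{\Sigma}$ aligns with $\pm\theta^*/||\theta^*||_2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (not from the lecture).
d, n, sigma = 5, 20000, 1.0
theta_star = np.zeros(d)
theta_star[0], theta_star[1] = 2.0, 1.0        # true (unnormalized) direction

# Draw X = (1 - Z) Y_0 + Z Y_1 with Z ~ Ber(1/2),
# E[Y_0] = theta*, E[Y_1] = -theta*, cov(Y_i) = sigma^2 I.
Z = rng.binomial(1, 0.5, size=n)
means = np.where(Z[:, None] == 0, theta_star, -theta_star)
X = means + sigma * rng.standard_normal((n, d))

# Plug-in estimate of the second-moment matrix: (1/n) sum_i x_i x_i^T.
Sigma_hat = X.T @ X / n

# Top eigenvector of Sigma_hat (eigh returns eigenvalues in ascending order).
evals, evecs = np.linalg.eigh(Sigma_hat)
u_hat = evecs[:, -1]

direction = theta_star / np.linalg.norm(theta_star)
print("max eigenvalue of Sigma_hat:", evals[-1])
print("population value ||theta*||^2 + sigma^2:",
      theta_star @ theta_star + sigma**2)
print("|<u_hat, theta*/||theta*||>|:", abs(u_hat @ direction))
```

With $n$ this large, the top eigenvalue should be close to $||\theta^*||_2^2 + \sigma^2$ and the alignment close to $1$, matching the population calculation $cov(X) = \theta^*(\theta^*)^T + \sigma^2 I$.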
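To see Weyl's theorem and the instability of eigenvectors side by side, here is a short sketch on the $2 \times 2$ example above (same language and caveats as the previous snippet): the eigenvalues of $Q_\epsilon$ move by at most $\epsilon = 0.01$, while the maximal eigenvector rotates by more than $30$ degrees, because the eigengap of the unperturbed matrix is only $0.01$.

```python
import numpy as np

Q0 = np.array([[1.0, 0.0],
               [0.0, 1.01]])          # unperturbed matrix; eigengap is only 0.01
eps = 0.01
P = eps * np.array([[0.0, 1.0],
                    [1.0, 0.0]])      # symmetric perturbation with |||P|||_2 = eps
Q_eps = Q0 + P

evals0, evecs0 = np.linalg.eigh(Q0)
evals1, evecs1 = np.linalg.eigh(Q_eps)

# Weyl's theorem: each eigenvalue moves by at most |||P|||_2 = 0.01.
print("eigenvalue shifts:", np.abs(evals1 - evals0))

# Eigenvectors are far less stable: the maximal eigenvector rotates a lot.
u0 = evecs0[:, -1]                    # (0, 1)
u1 = evecs1[:, -1]                    # approximately +/- (0.53, 0.85)
angle = np.degrees(np.arccos(abs(u0 @ u1)))
print("max eigenvector of Q_0:   ", u0)
print("max eigenvector of Q_0.01:", u1)
print("rotation angle (degrees): ", angle)
```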
---

**Theorem:** Consider a PSD matrix $\Sigma$ with maximal (unit-norm) eigenvector $\theta^*$ and eigengap $v = \gamma_1(\Sigma) - \gamma_2(\Sigma) > 0$. Take any symmetric perturbation $P$ with $|||P|||_2 \leq \frac v 2$. Then the perturbed matrix $\Sigma + P$ has a unique maximal eigenvector $\hat{\theta}$ (with sign chosen so that $<\hat{\theta}, \theta^*> \geq 0$) such that

$$||\hat{\theta} - \theta^*||_2 \leq \frac{||\tilde{p}||_2}{v - 2|||P|||_2}$$

where, writing $U = (\theta^* | u_2 | \ldots | u_d)$ for the orthonormal eigenvectors of $\Sigma$,

$$U^T P U = \begin{pmatrix} \tilde{p}_{11} & \tilde{p}^T \\ \tilde{p} & \tilde{P}_{22} \end{pmatrix}$$

so $\tilde{p} \in R^{d-1}$ is the block of $P$ that mixes $\theta^*$ with the remaining eigendirections.

Proof idea: use the variational characterizations

$\theta^* = \arg\max_{||u||_2 = 1} u^T \Sigma u$

$\hat{\theta} \in \arg\max_{||u||_2 = 1} u^T(\Sigma + P) u$

---

**Basic inequality:** Set $\hat{\Delta} = \hat{\theta} - \theta^*$ and $\rho = <\hat{\theta}, \theta^*> \geq 0$. Then

$$v\left(1 - <\hat{\theta}, \theta^*>^2\right) \leq |\Psi(\hat{\Delta}, P)|, \qquad \text{where } \Psi(\hat{\Delta}, P) = \hat{\Delta}^T P \hat{\Delta} + 2<\hat{\Delta}, P\theta^*>.$$

We need the eigengap $v > 0$ for this to be useful. The inequality follows from optimality of $\hat{\theta}$: $\hat{\theta}^T(\Sigma + P)\hat{\theta} \geq (\theta^*)^T(\Sigma + P)\theta^*$ rearranges to $(\theta^*)^T \Sigma \theta^* - \hat{\theta}^T \Sigma \hat{\theta} \leq \Psi(\hat{\Delta}, P)$, and the left-hand side is at least $v(1 - \rho^2)$.

Since both vectors are unit norm, $\frac 1 2 ||\hat{\theta} - \theta^*||_2^2 = 1 - <\hat{\theta}, \theta^*>$, so the basic inequality controls $||\hat{\Delta}||_2$.
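As a sanity check on the theorem and the basic inequality, here is a minimal numerical sketch (again Python/NumPy; the specific $\Sigma$, eigengap, and perturbation are hypothetical choices of mine, not from the lecture). Since $\Sigma$ is taken diagonal, $U$ is the identity and $\tilde{p}$ is simply the off-diagonal part of the first column of $P$.

```python
import numpy as np

# Hypothetical example: diagonal Sigma with eigengap nu = 1.
Sigma = np.diag([2.0, 1.0, 0.5, 0.25])
nu = 2.0 - 1.0
theta_star = np.array([1.0, 0.0, 0.0, 0.0])    # maximal eigenvector of Sigma

# A small symmetric perturbation that only mixes theta_star with the other
# eigendirections; its operator norm is well below nu / 2.
p_tilde = np.array([0.02, 0.05, 0.03])
P = np.zeros((4, 4))
P[0, 1:] = p_tilde
P[1:, 0] = p_tilde
op_P = np.linalg.norm(P, 2)                    # |||P|||_2 (spectral norm)

# Maximal eigenvector of the perturbed matrix, sign-aligned with theta_star.
evals, evecs = np.linalg.eigh(Sigma + P)
theta_hat = evecs[:, -1] * np.sign(evecs[0, -1])

# Theorem: since Sigma is diagonal, U = I and p_tilde is exactly the block
# of U^T P U that enters the bound.
error = np.linalg.norm(theta_hat - theta_star)
bound = np.linalg.norm(p_tilde) / (nu - 2 * op_P)
print("actual error ||theta_hat - theta*||_2:   ", error)
print("bound ||p_tilde||_2 / (nu - 2 |||P|||_2):", bound)

# Basic inequality: nu * (1 - <theta_hat, theta*>^2) <= |Psi(Delta, P)|.
Delta = theta_hat - theta_star
Psi = Delta @ P @ Delta + 2 * Delta @ (P @ theta_star)
print("nu * (1 - <theta_hat, theta*>^2):", nu * (1 - (theta_hat @ theta_star) ** 2))
print("|Psi(Delta, P)|:                 ", abs(Psi))
```

On this example the measured error should sit comfortably below the stated bound, and the two sides of the basic inequality can be read off directly from the printout.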