# 03/16 Lecture 11:
Outline:
1. Motivation for PCA/low-rank approximation
2. Instability of eigenvalues/eigenvectors
3. Basic inequality for PCA.
Reading:
Chapter 8 (8.1 -> 8.3)
## Discretization/covering of sets
$B_2(1) = \{u \in R^d | \ ||u||_2 \leq 1\}$
Find $\{u^1, \dots, u^N\}$ such that for every $u \in B_2(1)$ there exists some $u^j$ with $||u-u^j||_2 \leq \delta$.
$\log N(\delta) \approx d \log (1+2/\delta)$
$\log N(\delta)$ is called the metric entropy, where $N(\delta)$ is the smallest such $N$.
---
**Proof (special case: the $\ell_\infty$ ball):**
In $B_\infty (1) = \{\theta \in R^2 \mid ||\theta||_\infty \leq 1\}$, we want for each $\theta$ some $u^j$ with $||u^j - \theta||_\infty \leq \delta$.
Fill the cube with boxes of side length $\delta$; the box centers form a $\delta$-cover, and roughly $(2/\delta)^d$ boxes suffice in dimension $d$ (here $d = 2$), matching $\log N(\delta) \approx d \log(2/\delta)$.
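A minimal numpy sketch of this grid covering (my own illustration, not from the lecture; the parameters $d$ and $\delta$ below are assumptions):

```python
import numpy as np

# Illustration: cover B_inf(1) = [-1, 1]^d with boxes of side length delta.
# Grid centers spaced delta apart give about (2/delta)^d points,
# so log N(delta) ~ d * log(2/delta).
d, delta = 2, 0.1
axis = np.arange(-1 + delta / 2, 1, delta)                   # box centers along one axis
grid = np.array(np.meshgrid(*([axis] * d))).reshape(d, -1).T

# Check: every point of the cube is within delta (in sup-norm) of some center.
u = np.random.uniform(-1, 1, size=(1000, d))
dists = np.abs(u[:, None, :] - grid[None, :, :]).max(axis=2).min(axis=1)
print(len(grid), bool(dists.max() <= delta))                 # N(delta) = 400, True
```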
## PCA/Low-rank Matrix Approximation
Set-up: $\Sigma \in R^{d \times d}$, $\Sigma = \Sigma^T$, with eigenvalues ordered $\gamma_1(\Sigma) \geq ... \geq \gamma_d(\Sigma)$.
1. Say we have a random vector $X \in R^d$ with $E[X] = 0, cov(X) = \Sigma$.
**What is the direction of maximal variance?**
A direction is a unit vector $u \in R^d, ||u||_2 = 1$; we look for $\max_{||u||_2 = 1} var(<u, X>) = \max_{||u||_2 = 1} var(\sum_{j=1}^d u_jX_j) = \max_{||u||_2 = 1} E[(u^TX)^2] =\max_{||u||_2 = 1} u^T\Sigma u = \gamma_1(\Sigma) = |||\Sigma|||_{2}$, using $E[X] = 0$ in the third equality.
The maximal eigenvector $u^*$ (the eigenvector associated with $\gamma_1(\Sigma)$) achieves this maximum.
In practice, given samples $\{X_i\}_{i=1}^n$, compute $\hat{\Sigma} = \frac 1 n \sum_{i=1}^n X_iX_i^T$ and plug it in for $\Sigma$.
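A minimal numpy sketch of this plug-in procedure, with synthetic Gaussian data (the particular $\Sigma$ below is an assumption for illustration):

```python
import numpy as np

# Plug-in PCA sketch: draw centered samples, form the sample covariance, and take
# its top eigenvector as the estimated direction of maximal variance.
rng = np.random.default_rng(0)
n, d = 2000, 5
Sigma = np.diag([4.0, 1.0, 1.0, 0.5, 0.5])                   # assumed true covariance
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)      # E[X] = 0, cov(X) = Sigma

Sigma_hat = X.T @ X / n                                       # (1/n) sum_i X_i X_i^T
evals, evecs = np.linalg.eigh(Sigma_hat)                      # eigenvalues in ascending order
u_hat = evecs[:, -1]                                          # estimated maximal eigenvector
print(evals[-1], u_hat)                                       # near gamma_1 = 4 and e_1 (up to sign)
```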
Generalization
Let $U \in R^{d \times r}$ with $r < d$ ($r$ = number of eigenvectors), $U=(u_1 | ... | u_r)$ with orthonormal columns, i.e. $<u_i, u_j> = 1$ if $i = j$ and $0$ otherwise.
$\max_{V \in R^{d \times r},\ V^TV = I_r} E[||V^TX||_2^2] = \sum_{j=1}^r \gamma_j(\Sigma)$
The maximizer $V^*$ has the top $r$ eigenvectors of $\Sigma$ as its columns.
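Continuing the numpy sketch above (reusing `X`, `evals`, `evecs`), the variance captured by the top-$r$ eigenvectors equals the sum of the top $r$ eigenvalues of $\hat{\Sigma}$, while a generic orthonormal $V$ captures less:

```python
# Captured variance E[||V^T X||_2^2], estimated empirically.
r = 2
V_star = evecs[:, -r:]                                        # top-r eigenvectors of Sigma_hat
V_rand = np.linalg.qr(rng.standard_normal((d, r)))[0]         # random orthonormal d x r matrix
captured = lambda V: np.mean(np.sum((X @ V) ** 2, axis=1))    # empirical E[||V^T X||_2^2]
print(captured(V_star), evals[-r:].sum())                     # equal: sum of top-r eigenvalues
print(captured(V_rand))                                       # smaller for a generic subspace
```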
---
**Example:**
(Fitting mixture models)
Two distributions centered at $\theta^*$ and $-\theta^*$ with circular contours.
The overall density is a bimodal mixture.
$Z \sim Ber(\frac 1 2), Z \in \{0, 1\}$
$X = (1-Z) Y_0+ Z Y_1$
$E[Y_0] = \theta^*$
$E[Y_1] = -\theta^*$
$cov(Y_0)=cov(Y_1) = \sigma^2I$.
> To estimate the $\theta^*$'s, the sample mean doesn't work since it is $0$; the information is embedded in the second-moment matrix, which is exactly what PCA is built for.
$cov(X) = E[XX^T] = \theta^*(\theta^*)^T + \sigma^2I = \Sigma$.
The maximal eigenvector of $\Sigma$ is $\theta^*/||\theta^*||_2$, determined only up to sign (i.e. $\pm\theta^*/||\theta^*||_2$).
In the general situation, if the noise covariance is not $\sigma^2 I$, one cannot conclude that $\theta^*$ is the direction of the maximal eigenvector.
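A small simulation of this mixture example (parameters assumed for illustration): the sample mean is uninformative, while the top eigenvector of the sample covariance aligns with $\theta^*$ up to sign.

```python
import numpy as np

# Mixture sketch: X = (1 - Z) Y_0 + Z Y_1 with Z ~ Ber(1/2),
# E[Y_0] = theta*, E[Y_1] = -theta*, cov(Y_z) = sigma^2 I.
rng = np.random.default_rng(1)
n, sigma = 5000, 1.0
theta_star = np.array([2.0, 1.0, 0.0])                        # assumed true parameter

Z = rng.integers(0, 2, size=n)
means = np.where(Z[:, None] == 1, -theta_star, theta_star)
X = means + sigma * rng.standard_normal((n, theta_star.size))

print(X.mean(axis=0))                                          # near 0: first moment is useless
Sigma_hat = X.T @ X / n                                         # ~ theta* theta*^T + sigma^2 I
u_hat = np.linalg.eigh(Sigma_hat)[1][:, -1]                     # top eigenvector
print(u_hat, theta_star / np.linalg.norm(theta_star))           # aligned up to sign
```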
> Martin: Academia is about maximizing the number of papers per unit of effort.
> HW2.5 is related to the Elliptical PCA problem.
$Q = \Sigma, R = \Sigma + P = Q + P$. All matrices are symmetric. Assume $|||P|||_2 \leq \epsilon$.
- $P$ = perturbation
- $Q$ = original matrix
- $R$ = perturbed matrix
We would like to know how the eigenvalues and eigenvectors change.
Max eigenvalue of $R$:
$$\gamma_1(R) = \gamma_1(Q + P) = \max_{||u||_2 = 1} u^T(Q+P)u \leq \max_{||u||_2 = 1} u^TQu + \max_{||u||_2 = 1} u^TPu \leq \gamma_1(Q) + \epsilon.$$
(A matching lower bound $\gamma_1(R) \geq \gamma_1(Q) - \epsilon$ follows the same way.)
If the perturbation is small, the maximal eigenvalue of the perturbed matrix stays close to that of the original, so the eigenvalue estimate remains accurate.
---
**Weyl's theorem**
$\max_{j = 1, ..., d} |\gamma_j(Q)-\gamma_j(R)| \leq |||Q-R|||_2$.
> All of the eigenvalues of $Q$ and $R$ are within $\epsilon$ of each other if $|||P|||_2 \leq \epsilon$.
> Eigenvalues are stable
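A quick numerical check of Weyl's theorem on a random symmetric pair (illustration only; any symmetric $Q$ and $P$ would do):

```python
import numpy as np

# Weyl's theorem: max_j |gamma_j(Q) - gamma_j(Q + P)| <= |||P|||_2.
rng = np.random.default_rng(2)
d = 6
Q = rng.standard_normal((d, d)); Q = (Q + Q.T) / 2             # random symmetric matrix
P = rng.standard_normal((d, d)); P = (P + P.T) / 2             # random symmetric perturbation

shift = np.abs(np.linalg.eigvalsh(Q) - np.linalg.eigvalsh(Q + P)).max()
op_norm = np.abs(np.linalg.eigvalsh(P)).max()                  # |||P|||_2 for symmetric P
print(shift, "<=", op_norm)                                    # the inequality always holds
```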
---
**Example: (Eigenvectors are not stable in general.)**
$Q_\epsilon = \begin{pmatrix}1 & \epsilon \\ \epsilon &1.01 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1.01 \end{pmatrix}+ \epsilon\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$
When $\epsilon = 0.01$, max eigenvector changes from $(0, 1)$ to $(0.53, 0.85)$.
The issue is that the unperturbed matrix has very close eigenvalues (gap $0.01$), so a perturbation of the same order can rotate the maximal eigenvector substantially.
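Reproducing the numbers in this example with numpy (a direct check of the computation above):

```python
import numpy as np

# Max eigenvector of diag(1, 1.01) versus its eps-perturbation, eps = 0.01.
eps = 0.01
Q0 = np.diag([1.0, 1.01])
Q_eps = Q0 + eps * np.array([[0.0, 1.0], [1.0, 0.0]])
print(np.linalg.eigh(Q0)[1][:, -1])                            # (0, 1)
print(np.linalg.eigh(Q_eps)[1][:, -1])                         # approx (0.53, 0.85), up to sign
```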
---
**Theorem**
Consider a PSD matrix $\Sigma$ with maximal unit-norm eigenvector $\theta^*$ and eigengap $v = \gamma_1(\Sigma) - \gamma_2(\Sigma) > 0$. Take any symmetric perturbation $P$ with $|||P|||_2 \leq \frac v 2$.
Then the perturbed matrix $\Sigma + P$ has a unique maximal eigenvector $\hat{\theta}$ (up to sign) such that
$$||\hat{\theta} - \theta^*||_2 \leq \frac{||\tilde{p}||_2}{v - 2|||P|||_2}$$
where, writing $U = (\theta^* | u_2 | \dots | u_d)$ for an orthonormal eigenbasis of $\Sigma$,
$$U^TPU = \begin{pmatrix} \tilde{p}_{11} & \tilde{p}^T \\ \tilde{p} & \tilde{P}_{22} \end{pmatrix}, \qquad \tilde{p}_{11} \in R, \ \tilde{p} \in R^{d-1}, \ \tilde{P}_{22} \in R^{(d-1)\times(d-1)}.$$
Proof idea:
$\theta^* = \argmax_{||u||_2 =1} u^T\Sigma u$
$\hat{\theta} \in \argmax_{||u||_2 =1} u^T(\Sigma+P)u$
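A toy numerical illustration of the theorem (the matrices below are assumptions for illustration, not from the lecture): with $\Sigma = \mathrm{diag}(3, 1)$ the eigengap is $v = 2$ and $\theta^* = (1, 0)$; a small symmetric perturbation gives an error of about $0.050$ against a bound of about $0.056$.

```python
import numpy as np

# Check ||theta_hat - theta*||_2 <= ||p_tilde||_2 / (v - 2 |||P|||_2) on a toy instance.
Sigma = np.diag([3.0, 1.0])                                    # eigengap v = 3 - 1 = 2
theta_star = np.array([1.0, 0.0])
v = 2.0

P = np.array([[0.0, 0.1], [0.1, 0.0]])                         # symmetric, |||P|||_2 = 0.1 <= v/2
theta_hat = np.linalg.eigh(Sigma + P)[1][:, -1]                # max eigenvector of Sigma + P
theta_hat *= np.sign(theta_hat @ theta_star)                   # resolve the sign ambiguity

U = np.eye(2)                                                  # eigenbasis of Sigma, theta* first
p_tilde = (U.T @ P @ U)[1:, 0]                                 # off-diagonal block of U^T P U
P_op = np.abs(np.linalg.eigvalsh(P)).max()
print(np.linalg.norm(theta_hat - theta_star))                  # approx 0.050
print(np.linalg.norm(p_tilde) / (v - 2 * P_op))                # bound: approx 0.056
```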
---
**Basic Inequality:**
$\hat{\Delta} = \hat{\theta} - \theta^*$, $\rho = <\hat{\theta}, \theta^*> \geq 0$ (choosing the sign of $\hat{\theta}$ so that $\rho \geq 0$).
$v(1-<\hat{\theta}, \theta^*>^2) \leq |\Psi(\hat{\Delta}, P)|$
We need the eigengap $v > 0$ for the bound to be meaningful. Note $\frac 1 2||\hat{\theta}-\theta^*||_2^2 = 1 - <\hat{\theta}, \theta^*>$ since $||\hat{\theta}||_2 = ||\theta^*||_2 = 1$.
$\Psi(\hat{\Delta}, P) = \hat{\Delta}^T P \hat{\Delta} + 2<\hat{\Delta}, P\theta^*>$
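Sketch of where the basic inequality comes from: since $\hat{\theta}$ maximizes $u^T(\Sigma + P)u$ over unit vectors, $\hat{\theta}^T(\Sigma+P)\hat{\theta} \geq (\theta^*)^T(\Sigma+P)\theta^*$. Rearranging,
$$(\theta^*)^T\Sigma\theta^* - \hat{\theta}^T\Sigma\hat{\theta} \leq \hat{\theta}^TP\hat{\theta} - (\theta^*)^TP\theta^* = \Psi(\hat{\Delta}, P),$$
and expanding $\hat{\theta}$ in the eigenbasis of $\Sigma$ shows the left-hand side is at least $v(1 - <\hat{\theta}, \theta^*>^2)$.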