---
title: SVD
tags: Trang's lectures
---

# Table of Contents

[toc]

# Singular value decomposition

## Eigenvectors, eigenvalues, and diagonalization of matrices

**Eigenvectors and eigenvalues**. Let $A \in \mathbb{R}^{n\times n}$ be a square matrix. A vector $v \in \mathbb{R}^n$ is an eigenvector of $A$ corresponding to eigenvalue $\lambda \in \mathbb{R}$ if

$$A v = \lambda v$$

* *Intuition*. Multiplying $v$ by $A$ scales $v$ by a factor $\lambda$
* *Theorem*. If $v$ is an eigenvector of $A$ corresponding to eigenvalue $\lambda$, then $\gamma \cdot v$ is also an eigenvector of $A$ for every non-zero scalar $\gamma \in \mathbb{R}$

**Diagonalization**. Relates the concepts of matrix, eigenvectors, and eigenvalues

* *Definition*. A diagonalizable square matrix $A \in \mathbb{R}^{n\times n}$ can be decomposed into a very special form $$A=P\cdot D\cdot P^{-1}$$ where $$D=\begin{bmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \in \mathbb{R}^{n\times n}$$ is a diagonal matrix whose diagonal entries $\lambda_1,\dots,\lambda_n$ are the eigenvalues of $A$, and $$P=\begin{bmatrix}| & & | \\ \mathbf{p}_1 & \cdots & \mathbf{p}_n \\ | & & |\end{bmatrix} \in \mathbb{R}^{n\times n}$$ is a square matrix whose $i$-th column $\mathbf{p}_i$ is an eigenvector of $A$ corresponding to eigenvalue $\lambda_i$, i.e. $$\forall i\in \{1,\dots,n\},\quad A\cdot \mathbf{p}_i = \lambda_i \cdot \mathbf{p}_i$$

**Generalization of diagonalization**. Singular value decomposition

* *Definition*. Given a matrix $A\in\mathbb{R}^{m\times n}$, then $A$ can be decomposed into $$A=U\cdot \Sigma \cdot V^{-1}$$ where $$\Sigma=\begin{bmatrix} \sigma_1 & 0 & \cdots & 0\\ 0 & \sigma_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \sigma_n\\ 0 & 0 & \cdots & 0\end{bmatrix} \in \mathbb{R}^{m\times n}$$ is a rectangular diagonal matrix (illustrated for $m > n$; the remaining entries are zero) whose diagonal entries $\sigma_1,\dots,\sigma_n$ are the singular values of $A$, with $$\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_n \geq 0$$ and $U \in \mathbb{R}^{m\times m}$, $V \in \mathbb{R}^{n\times n}$ are orthogonal matrices
* *Orthogonal matrix*. $U$ is an orthogonal matrix if its columns are unit vectors that are mutually orthogonal to each other
    * *Orthogonal vectors*. Two vectors $x$ and $y$ are orthogonal if $$\langle x,y\rangle=x^T y=\sum_{i=1}^n x_i y_i=0$$
* *Properties*.
    * If $V$ is an orthogonal matrix, then $V^{-1} = V^T$ $\to$ the SVD of $A$ can be written as $A=U\cdot \Sigma\cdot V^T$
    * If $V$ is an orthogonal matrix, then $V$ is a rotation matrix
        * *Rotation matrix*. $V \in \mathbb{R}^{n\times n}$ is a rotation matrix if, for any vectors $x,y\in\mathbb{R}^n$, $$\langle x,y\rangle = \langle Vx,Vy\rangle$$
        * *Proof*. If $V$ is an orthogonal matrix, then $V$ is a rotation matrix, because $$\langle Vx,Vy\rangle=(Vx)^T (Vy)=x^T V^T V y=x^T y=\langle x,y\rangle$$
        * *Lemma*. $\sqrt{\langle x,x\rangle} = \|x\|_2$, because $$\sqrt{\langle x,x\rangle}=\sqrt{\sum_{i=1}^n x_i^2}$$
* *Singular values of $A$*. The set of non-zero diagonal entries of $\Sigma$, i.e. $$\{\sigma_i:i\in\{1,\dots,n\},\sigma_i \neq 0\}$$
* *Example*. Given $A \in \mathbb{R}^{4\times 3}$, then $$A=U\cdot \Sigma \cdot V^{-1}$$ where $$\Sigma=\begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0\\ 0 & 0 & \sigma_3 \\ 0 & 0 & 0 \end{bmatrix} \in \mathbb{R}^{4\times 3}$$

**Compact SVD**. Given a matrix $A=U\cdot \Sigma \cdot V^T$, where $\mathbf{u}_i$ and $\mathbf{v}_i$ are the $i$-th columns of $U$ and $V$, then we have

$$A=\sum_{i:\sigma_i\neq 0}\sigma_i \mathbf{u}_i \mathbf{v}^T_i$$

**Truncated SVD**. Keep only the $k$ largest singular values to obtain a rank-$k$ approximation of $A$

$$A\approx A_k=\sum_{i=1}^k\sigma_i \mathbf{u}_i \mathbf{v}^T_i$$
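As a sanity check on the definitions above, here is a minimal NumPy sketch of the full and truncated SVD; the matrix `A` and the rank `k` are illustrative choices, not values from the lecture.

```python
import numpy as np

# Illustrative 4x3 matrix (values are arbitrary, chosen only for the example)
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0],
              [4.0, 1.0, 0.0],
              [2.0, 2.0, 2.0]])

# Full SVD: U is 4x4 and orthogonal, s holds the singular values in
# decreasing order, Vt is the 3x3 matrix V^T
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the 4x3 Sigma with the singular values on its main diagonal
Sigma = np.zeros_like(A)
Sigma[:len(s), :len(s)] = np.diag(s)
assert np.allclose(A, U @ Sigma @ Vt)            # A = U . Sigma . V^T

# Truncated SVD: keep only the k largest singular values (rank-k approximation)
k = 2
A_k = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))
print("rank-%d approximation error:" % k, np.linalg.norm(A - A_k))
```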
# Principal component analysis (PCA)

**Covariance matrix**. Assume $X=(x_1,\dots,x_n)$ is a random vector

* *Variance*. $\text{Var}(X)=E[(X-\mu)^2]$ for a scalar random variable $X\in\mathbb{R}$, where $\mu=E(X)$
* *Covariance*. $\text{Cov}(X,Y)=E[(X-\mu_X)(Y-\mu_Y)]$ where $\mu_X=E(X)$ and $\mu_Y=E(Y)$
* *Covariance matrix*. $$M=\text{Cov}(X)=\begin{bmatrix} \text{Cov}(x_1,x_1) & \text{Cov}(x_1,x_2) & \cdots & \text{Cov}(x_1,x_n)\\ \vdots & \vdots & \ddots & \vdots\\ \text{Cov}(x_n,x_1) & \text{Cov}(x_n,x_2) & \cdots & \text{Cov}(x_n,x_n)\\ \end{bmatrix}$$

**PCA**. Let $M$ be the covariance matrix of a random vector $X$. Consider its SVD

$$M=U\cdot \Sigma \cdot V^T$$

Since $M$ is a symmetric matrix, we can take $U = V$, i.e.

$$M=U\cdot \Sigma \cdot U^T$$

Since $M$ is a square matrix and $U$ is orthogonal ($U^{-1}=U^T$), $M=U\cdot \Sigma\cdot U^T$ is exactly a diagonalization of $M$. In other words, $\sigma_1,\dots,\sigma_n$ are the eigenvalues of $M$.

Each column $\mathbf{u}_i$ of $U$ is an eigenvector of $M$ corresponding to the eigenvalue $\sigma_i$ of $M$.
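Putting the steps together, here is a minimal NumPy sketch of PCA via the SVD of the covariance matrix. The toy data `X`, the mixing matrix, and the projection onto the first `k` eigenvectors are illustrative assumptions; the projection step is the usual PCA reduction and goes slightly beyond the lecture, which stops at the eigendecomposition of $M$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples of a 3-dimensional random vector X
# (the samples and the mixing matrix are illustrative, not from the lecture)
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.0, 0.0],
                                          [0.5, 1.0, 0.0],
                                          [0.0, 0.2, 0.3]])

# Covariance matrix M: the columns of X play the role of the components x_1, ..., x_n
M = np.cov(X, rowvar=False)

# SVD of M; since M is symmetric positive semi-definite, U equals V (up to sign)
U, sigma, Vt = np.linalg.svd(M)
assert np.allclose(M, U @ np.diag(sigma) @ Vt)   # M = U . Sigma . U^T

# sigma holds the eigenvalues of M in decreasing order, and the columns of U are
# the corresponding eigenvectors; projecting the centered data onto the first k
# eigenvectors is the standard PCA dimensionality-reduction step.
k = 2
X_centered = X - X.mean(axis=0)
scores = X_centered @ U[:, :k]
print("leading eigenvalues:", sigma[:k])
print("reduced data shape:", scores.shape)
```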