---
title: SVD
tags: Trang's lectures
---
# Table of Contents
[toc]
# Singular value decomposition
## Eigenvectors, eigenvalues, and diagonalization of matrices
**Eigenvectors and eigenvalues**. Let $A \in \mathbb{R}^{n\times n}$ be a square matrix. A non-zero vector $v \in \mathbb{R}^n$ is an eigenvector of $A$ corresponding to eigenvalue $\lambda \in \mathbb{R}$ if
$$A v = \lambda v$$
* *Intuition*. By multiplying $v$ by $A$, we are scaling $v$ by a factor $\lambda$
* *Theorem*. If $v$ is an eigenvector of $A$ corresponding to eigenvalue $\lambda$, then $\gamma \cdot v$ is also an eigenvector of $A$ for every non-zero scalar $\gamma \in \mathbb{R}$
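A quick numerical check of this definition and theorem, as a minimal NumPy sketch (the matrix `A` below is just an illustrative choice, not from the lecture):
```python
import numpy as np

# A small symmetric matrix whose eigenpairs are easy to verify by hand
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns the eigenvalues and a matrix whose columns are eigenvectors
eigvals, eigvecs = np.linalg.eig(A)

for lam, v in zip(eigvals, eigvecs.T):
    # A v should equal lambda * v (up to floating-point error)
    print(np.allclose(A @ v, lam * v))  # True

# Scaling an eigenvector by a non-zero scalar gives another eigenvector
v = eigvecs[:, 0]
print(np.allclose(A @ (3.0 * v), eigvals[0] * (3.0 * v)))  # True
```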
**Diagonalization**. Relates the concepts of matrices, eigenvectors, and eigenvalues
* *Definition*. A diagonalizable square matrix $A \in \mathbb{R}^{n\times n}$, i.e. one with $n$ linearly independent eigenvectors, can be decomposed into a very special form
$$A=P\cdot D\cdot P^{-1}$$
where
$$D=\begin{bmatrix}
\lambda_1 & 0 & \cdots & 0\\
0 & \lambda_2 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & \lambda_n
\end{bmatrix} \in \mathbb{R}^{n\times n}$$
is a diagonal matrix whose diagonal entries $\lambda_1,\dots,\lambda_n$ are the eigenvalues of $A$, and
$$P=\begin{bmatrix}| & & | \\
\mathbf{p}_1 & \cdots & \mathbf{p}_n \\
| & & |\end{bmatrix} \in \mathbb{R}^{n\times n}$$
is a square matrix whose $i$-th column $\mathbf{p}_i$ is an eigenvector of $A$ corresponding to eigenvalue $\lambda_i$, i.e.
$$\forall i\in \{1,\dots,n\},A\cdot \mathbf{p}_i = \lambda_i \cdot \mathbf{p}_i$$
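As a sanity check, the diagonalization can be computed numerically. A minimal sketch with NumPy's `np.linalg.eig`; the symmetric matrix `A` is an assumed example (symmetric matrices are always diagonalizable):
```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

eigvals, P = np.linalg.eig(A)   # columns of P are the eigenvectors p_i
D = np.diag(eigvals)            # diagonal matrix of eigenvalues

# Reconstruct A = P D P^{-1}
A_reconstructed = P @ D @ np.linalg.inv(P)
print(np.allclose(A, A_reconstructed))  # True
```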
**Generalization of diagonalization**. Singular value decomposition
* *Definition*. Given a matrix $A\in\mathbb{R}^{m\times n}$, then $A$ can be decomposed into
$$A=U\cdot \Sigma \cdot V^{-1}$$
where
$$\Sigma=\begin{bmatrix}
\sigma_1 & 0 & \cdots & 0\\
0 & \sigma_2 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & \sigma_n\\
0 & 0 & \cdots & 0
\end{bmatrix} \in \mathbb{R}^{m\times n}$$
is a rectangular diagonal matrix (depicted here for $m > n$) with diagonal entries $\sigma_1,\dots,\sigma_n$ (the singular values of $A$) ordered so that
$$\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_n \geq 0$$
$U \in \mathbb{R}^{m\times m}$ and $V \in \mathbb{R}^{n\times n}$ are orthogonal matrices
* *Orthogonal matrix*. $U$ is an orthogonal matrix if its columns are orthonormal, i.e. they are unit vectors that are mutually orthogonal to each other
* *Orthogonal vectors*. Two vectors $x$ and $y$ are orthogonal if
$$\langle x,y\rangle=x^T y=\sum_{i=1}^n x_i y_i=0$$
* *Properties*.
* If $V$ is an orthogonal matrix, then $V^{-1} = V^T$
$\to$ The SVD of $A$ can be written as $A=U\cdot \Sigma\cdot V^T$
* If $V$ is an orthogonal matrix, then $V$ is a rotation matrix
* *Rotation matrix*. $V \in \mathbb{R}^{n\times n}$ is a rotation matrix if, for any vectors $x,y\in\mathbb{R}^n$
$$\langle x,y\rangle = \langle Vx,Vy\rangle$$
* *Proof*. If $V$ is an orthogonal matrix, then $V$ is a rotation matrix, because
$$\langle Vx,Vy\rangle=(Vx)^T (Vy)=x^T V^T V y=x^T y=\langle x,y\rangle$$
* *Lemma*. $\sqrt{\langle x,x\rangle} = \|x\|_2$, because
$$\sqrt{\langle x,x\rangle}=\sqrt{\sum_{i=1}^n x_i^2}=\|x\|_2$$
* *Singular values of $A$*. The set of non-zero diagonal entries of $\Sigma$, i.e.
$$\{\sigma_i:i\in\{1,\dots,n\},\sigma_i \neq 0\}$$
* *Example*. Given $A \in \mathbb{R}^{4\times 3}$, then
$$A=U\cdot \Sigma \cdot V^{-1}$$
where
$$\Sigma=\begin{bmatrix}
\sigma_1 & 0 & 0 \\
0 & \sigma_2 & 0\\
0 & 0 & \sigma_3 \\ 0 & 0 & 0
\end{bmatrix} \in \mathbb{R}^{4\times 3}$$
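The decomposition and the orthogonality of $U$ and $V$ can be verified numerically. A minimal NumPy sketch, using a random $4\times 3$ matrix as an assumed stand-in for the example above:
```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))          # A in R^{4x3}

# full_matrices=True gives the full SVD: U is 4x4, V is 3x3
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)        # (4, 4) (3,) (3, 3)

# Build Sigma as a 4x3 rectangular diagonal matrix
Sigma = np.zeros((4, 3))
Sigma[:3, :3] = np.diag(s)               # singular values, in decreasing order

# Check the decomposition and the orthogonality of U and V
print(np.allclose(A, U @ Sigma @ Vt))    # True
print(np.allclose(U.T @ U, np.eye(4)))   # True
print(np.allclose(Vt @ Vt.T, np.eye(3))) # True
```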
**Compact SVD**. Given a matrix $A=U\cdot \Sigma \cdot V^T$, with $\mathbf{u}_i$ and $\mathbf{v}_i$ denoting the $i$-th columns of $U$ and $V$, we have
$$A=\sum_{i:\sigma_i\neq 0}\sigma_i \mathbf{u}_i \mathbf{v}^T_i$$
**Truncated SVD**. Keep only the $k$ largest singular values to obtain a rank-$k$ approximation of $A$
$$A\approx A_k=\sum_{i=1}^k\sigma_i \mathbf{u}_i \mathbf{v}^T_i$$
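A minimal sketch of the truncated SVD with NumPy; the matrix sizes and the helper `truncated_svd` are assumptions made for illustration:
```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 50))

# full_matrices=False gives the compact form: U is 100x50, Vt is 50x50
U, s, Vt = np.linalg.svd(A, full_matrices=False)

def truncated_svd(U, s, Vt, k):
    """Rank-k approximation A_k = sum_{i=1}^{k} sigma_i u_i v_i^T."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A_10 = truncated_svd(U, s, Vt, k=10)
A_40 = truncated_svd(U, s, Vt, k=40)
# The approximation error decreases as k grows
print(np.linalg.norm(A - A_10), np.linalg.norm(A - A_40))
```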
# Principal component analysis (PCA)
**Covariance matrix**. Assume $X=(x_1,\dots,x_n)$ is a random vector
* *Variance*. $\text{Var}(X)=E[(X-\mu)^2]$ for a scalar random variable $X\in\mathbb{R}$, where $\mu=E(X)$
* *Covariance*. $\text{Cov}(X,Y)=E[(X-\mu_X)(Y-\mu_Y)]$ where $\mu_X=E(X)$ and $\mu_Y=E(Y)$
* *Covariance matrix*.
$$M=\text{Cov}(X)=\begin{bmatrix}
\text{Cov}(x_1,x_1) & \text{Cov}(x_1,x_2) & \cdots & \text{Cov}(x_1,x_n)\\
\vdots & \vdots & \ddots & \vdots\\
\text{Cov}(x_n,x_1) & \text{Cov}(x_n,x_2) & \cdots & \text{Cov}(x_n,x_n)\\
\end{bmatrix}$$
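As an illustration, the covariance matrix can be estimated from samples. A minimal NumPy sketch; the sample data and its true covariance are made up for the example:
```python
import numpy as np

rng = np.random.default_rng(2)
# 1000 samples of a 3-dimensional random vector X = (x_1, x_2, x_3)
samples = rng.multivariate_normal(mean=[0.0, 0.0, 0.0],
                                  cov=[[2.0, 0.5, 0.0],
                                       [0.5, 1.0, 0.3],
                                       [0.0, 0.3, 0.5]],
                                  size=1000)

# np.cov expects variables in rows, observations in columns
M = np.cov(samples.T)
print(M.shape)   # (3, 3) -- entry (i, j) estimates Cov(x_i, x_j)

# Matches the definition: average of (X - mu)(X - mu)^T over the samples
centered = samples - samples.mean(axis=0)
print(np.allclose(M, centered.T @ centered / (len(samples) - 1)))  # True
```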
**PCA**. Let $M$ be the covariance matrix of a random vector $X$. Consider its SVD
$$M=U\cdot \Sigma \cdot V^T$$
We have that $M$ is a symmetric, positive semi-definite matrix, therefore $U = V$, i.e.
$$M=U\cdot \Sigma \cdot U^T$$
Since $U$ is orthogonal, $U^T = U^{-1}$, so $M=U\cdot \Sigma\cdot U^T=U\cdot \Sigma\cdot U^{-1}$ is the diagonalization of $M$. In other words, $\sigma_1,\dots,\sigma_n$ are eigenvalues of $M$
Consider any eigenvector $\mathbf{u}_i$ of $M$, i.e. the $i$-th column of $U$; this eigenvector corresponds to eigenvalue $\sigma_i$ of $M$. The columns $\mathbf{u}_i$ with the largest eigenvalues $\sigma_i$ are the directions along which $X$ varies the most, and projecting the data onto them gives the principal components.
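Putting the pieces together, a minimal PCA sketch with NumPy: estimate the covariance matrix, take its SVD, and check that the columns of $U$ are eigenvectors of $M$. The 2-D data below is an assumed example:
```python
import numpy as np

rng = np.random.default_rng(3)
# Correlated 2-D data so that the principal directions are meaningful
samples = rng.multivariate_normal(mean=[0.0, 0.0],
                                  cov=[[3.0, 1.5],
                                       [1.5, 1.0]],
                                  size=500)

M = np.cov(samples.T)                  # covariance matrix of X

# SVD of the symmetric PSD matrix M: here U and V coincide
U, sigma, Vt = np.linalg.svd(M)
print(np.allclose(U, Vt.T))            # True for a symmetric PSD matrix

# Each column u_i of U is an eigenvector of M with eigenvalue sigma_i
for i in range(2):
    print(np.allclose(M @ U[:, i], sigma[i] * U[:, i]))  # True

# Projecting the centered data onto u_1 gives the first principal component;
# its variance is approximately sigma[0]
projected = (samples - samples.mean(axis=0)) @ U[:, :1]
print(projected.var(), sigma[0])
```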