---
tags: linear algebra
---
# Intuition Behind Singular Value Decomposition (SVD)
## Definition
The singular value decomposition of an $m \times n$ matrix $T$ is a factorization of the form $T = U \Sigma V^*$, where $U$ is an $m \times m$ unitary matrix, $\Sigma$ is an $m \times n$ rectangular diagonal matrix with non-negative real numbers on the diagonal, and $V$ is an $n\times n$ unitary matrix. That is,
$$
T =
\begin{bmatrix}
u_1 \ldots u_m
\end{bmatrix}
\begin{bmatrix}
\sigma_1 & 0 & 0 & \cdots & \cdots & 0 \\
0 & \sigma_2 & 0 & \cdots & \cdots & 0 \\
\vdots & \vdots & \ddots & & & \vdots\\
0 & 0 & \cdots & \sigma_m& \cdots & 0
\end{bmatrix}
\begin{bmatrix}
v_1^T \\
\vdots \\
v_n^T \\
\end{bmatrix}
$$
(For illustration, this expression assumes $n > m$.)
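As a concrete sketch (not part of the definition itself), the snippet below uses NumPy's `numpy.linalg.svd` to produce this factorization for a random real matrix with $n > m$, then checks the shapes and the reconstruction. The variable names and the choice of random matrix are just for this example.

```python
import numpy as np

# A random real 3x5 matrix (m = 3, n = 5, matching the n > m illustration above).
m, n = 3, 5
rng = np.random.default_rng(0)
T = rng.standard_normal((m, n))

# full_matrices=True returns U (m x m), the min(m, n) singular values, and V^* (n x n).
U, s, Vh = np.linalg.svd(T, full_matrices=True)

# Rebuild the m x n rectangular diagonal matrix Sigma from the singular values.
Sigma = np.zeros((m, n))
Sigma[:m, :m] = np.diag(s)

# Check T = U Sigma V^* and that U, V are orthogonal (unitary, since T is real).
assert np.allclose(T, U @ Sigma @ Vh)
assert np.allclose(U.T @ U, np.eye(m))
assert np.allclose(Vh @ Vh.T, np.eye(n))
```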
## Intuition
Suppose that $y = T(x)$. What the SVD does is decompose $y$ into components along an orthonormal basis of $\mathbb R^m$. To see this, we write $U = \begin{bmatrix} u_1 \cdots u_m \end{bmatrix}$ and use the fact that $U$ is unitary (i.e. the columns of $U$ constitute an orthonormal basis of $\mathbb R^m$), so $y$ can be expressed in this way:
$$
T(x) = y =
\begin{bmatrix}
u_1 \ldots u_m
\end{bmatrix}
\begin{bmatrix}
c_1 \\
\vdots \\
c_m \\
\end{bmatrix},
$$
where $c_i$ is the component of $y$ along $u_i$. That is, we hope to express every $y$ as a linear combination of a fixed orthonormal basis.
Moreover, from the definition of the SVD, we find that
$$
\begin{bmatrix} c_1 \\ \vdots \\ c_m \end{bmatrix}
= \Sigma V^*x =
\begin{bmatrix}
\sigma_1 & 0 & 0 & \cdots & \cdots & 0 \\
0 & \sigma_2 & 0 & \cdots & \cdots & 0 \\
\vdots & \vdots & \ddots & & & \vdots\\
0 & 0 & \cdots & \sigma_m& \cdots & 0
\end{bmatrix}
\begin{bmatrix}
v_1^T \\
\vdots \\
v_n^T \\
\end{bmatrix}x,
$$
where $\beta = \{v_1, \ldots, v_n \}$ is an orthonormal basis of $\mathbb R^n$. Because $\beta$ is orthonormal, multiplying by $V^*$ simply projects $x$ onto the basis vectors in $\beta$. Thus, we have
$$
V^* x =
\begin{bmatrix}
v_1^T \\
\vdots \\
v_n^T \\
\end{bmatrix}
\begin{bmatrix}
x_1 \\
\vdots \\
x_n
\end{bmatrix} =
\begin{bmatrix}
d_1 \\ \vdots \\ d_n
\end{bmatrix},
$$
where $d_i = \langle v_i, x \rangle$ is the component of $x$ along $v_i$. From the perspective of matrix multiplication, $c_i = \sigma_i d_i$ ***only*** depends on the value of $d_i$, which means $y$'s component along $u_i$ depends only on $x$'s component along $v_i$. In other words, if $i \neq j$, then $T(v_j) = \sigma_j u_j$ contributes nothing to $y$'s component along $u_i$; only $x$'s component along $v_i$ contributes to it. Based on this discussion and the fact that $\gamma = \{ u_1, \ldots ,u_m\}$ is orthonormal, we have
$$
\forall i \neq j, \langle T(v_i), T(v_j) \rangle = 0
$$
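The sketch below (a numerical check, repeating the NumPy setup from the earlier example) verifies both claims: $c_i = \sigma_i d_i$, and the images $T(v_i)$ are mutually orthogonal.

```python
import numpy as np

m, n = 3, 5
rng = np.random.default_rng(0)
T = rng.standard_normal((m, n))
U, s, Vh = np.linalg.svd(T, full_matrices=True)

x = rng.standard_normal(n)
y = T @ x

# d_i = <v_i, x>: coordinates of x in the basis {v_1, ..., v_n}.
d = Vh @ x
# c_i = <u_i, y>: coordinates of y in the basis {u_1, ..., u_m}.
c = U.T @ y

# c_i = sigma_i * d_i: the i-th output coordinate depends only on the i-th input coordinate.
assert np.allclose(c, s * d[:m])

# The images T(v_i) = sigma_i u_i are mutually orthogonal: (T V)^T (T V) is diagonal.
TV = T @ Vh.T
assert np.allclose(TV.T @ TV, np.diag(np.concatenate([s**2, np.zeros(n - m)])))
```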
The question is: how do we find such a basis $\beta$? Or, does $\beta$ even exist for an arbitrary matrix $T$?
## Justification
From the discussion in the previous section, we know that to find a singular value decomposition of $T$, we must find an *orthonormal* basis $\beta = \{v_1, \ldots, v_n\}$ such that the following equality holds for $i \neq j$:
$$
\langle T(v_i), T(v_j) \rangle = 0
$$
Now, we apply the defining property of the adjoint of $T$ (denoted by $T^*$):
$$
\langle T(v_i), T(v_j) \rangle = 0
\iff \langle v_i, T^* T(v_j) \rangle = 0
$$
(When $T$ is a real matrix, $T^* = T^t$, where $T^t$ denotes the transpose of $T$.) Observe that $T^*T$ is positive semi-definite (you can verify this from the definition: $\langle x, T^*T(x) \rangle = \langle T(x), T(x) \rangle \ge 0$). Since $T^*T$ is also self-adjoint, the spectral theorem gives an orthonormal basis $\{v'_1, \ldots, v'_n\}$ in which each $v'_i$ is an eigenvector of $T^*T$:
$$
T^*T(v'_i) = \lambda_i v'_i.
$$
Then we have for $i \neq j$,
$$
\langle T(v'_i), T(v'_j) \rangle = \langle v'_i, T^* T(v'_j) \rangle = \lambda_j \langle v'_i, v'_j \rangle= 0
$$
Great! $\{v'_1, \ldots, v'_n \}$ is exactly the basis $\{ v_1, \ldots , v_n\}$ we want! The $i$-th row of $V^*$ is just the (conjugate) transpose of $v_i$. The next step is to find the value of $\sigma_i$ in $\Sigma$:
$$
\begin{align}
& \langle \sigma_i u_i, \sigma_i u_i \rangle = \langle T(v_i), T(v_i) \rangle = \langle v_i, T^*T(v_i) \rangle = \langle v_i, \lambda_i v_i \rangle \\
\Rightarrow & \sigma_i^2 = \lambda_i \Rightarrow \sigma_i = \sqrt{\lambda_i}
\end{align}
$$
Since $T^*T$ is positive semi-definite, we have $\lambda_i \ge 0$. Thus, the value of $\sigma_i$ is non-negative.
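As a quick numerical sanity check (a sketch for a real matrix, so $T^* = T^t$), the eigenvalues of $T^tT$ returned by `numpy.linalg.eigvalsh` should square-root to the singular values returned by `numpy.linalg.svd`:

```python
import numpy as np

m, n = 3, 5
rng = np.random.default_rng(0)
T = rng.standard_normal((m, n))

# Eigenvalues of T^t T (T is real, so T^* = T^t); eigvalsh returns them in ascending order.
lam = np.linalg.eigvalsh(T.T @ T)
lam = np.clip(lam, 0.0, None)          # clean up tiny negative round-off values

# Singular values of T (returned in descending order).
s = np.linalg.svd(T, compute_uv=False)

# sigma_i = sqrt(lambda_i): the m largest eigenvalues square-root to the singular values.
assert np.allclose(np.sort(s), np.sqrt(lam[-m:]))
```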
The last step is to find $U$. This is simple because we know $T(v_i) = \sigma_i u_i \Rightarrow u_i = \frac{1}{\sigma_i} T(v_i) = \frac{1}{\sqrt{\lambda_i}} T(v_i)$ whenever $\sigma_i > 0$. Note that if the rank of $T$ (the dimension of $R(T)$), call it $r$, is less than $m$, then $m - r$ columns of $U$ remain to be determined. This can be done using the fact that $R(T) \perp N(T^*)$: the remaining $m-r$ columns can be taken to be an orthonormal basis of $N(T^*)$.
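Putting the whole construction together, here is a minimal sketch (for a real, deliberately rank-deficient matrix, so the $N(T^*)$ completion step is actually exercised) that builds $V$, $\Sigma$, and $U$ exactly as described above and verifies $T = U\Sigma V^t$. The rank tolerance is an arbitrary choice for this example.

```python
import numpy as np

# A deliberately rank-deficient real matrix: m = 3, n = 5, but rank r = 2
# (the third row is the sum of the first two).
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 5))
T = np.vstack([A, A[0] + A[1]])
m, n = T.shape

# Step 1: orthonormal eigenbasis of T^t T (T is real, so T^* = T^t), sorted by
# decreasing eigenvalue.  The eigenvectors are the columns v_1, ..., v_n of V.
lam, V = np.linalg.eigh(T.T @ T)
order = np.argsort(lam)[::-1]
lam, V = np.clip(lam[order], 0.0, None), V[:, order]

# Step 2: sigma_i = sqrt(lambda_i); the rank r is the number of nonzero sigma_i.
sigma = np.sqrt(lam)
r = int(np.sum(sigma > 1e-10 * sigma[0]))

# Step 3: the first r columns of U are u_i = T(v_i) / sigma_i.
U_r = T @ V[:, :r] / sigma[:r]

# Step 4: the remaining m - r columns are an orthonormal basis of N(T^*) = R(T)^perp,
# obtained here as eigenvectors of T T^t with zero eigenvalue.
mu, W = np.linalg.eigh(T @ T.T)
U = np.hstack([U_r, W[:, :m - r]])

# Assemble the m x n rectangular diagonal Sigma and check T = U Sigma V^t.
Sigma = np.zeros((m, n))
Sigma[:m, :m] = np.diag(sigma[:m])
assert np.allclose(U.T @ U, np.eye(m))
assert np.allclose(T, U @ Sigma @ V.T)
```

Using the eigenvectors of $TT^t$ with zero eigenvalue is just one convenient way to pick the missing columns; any orthonormal basis of $N(T^*)$ works, since those directions are multiplied by zero singular values anyway.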