# SVD, EVD and PCA
###### tags: `Machine Learning` `Principal Component Analysis` `PCA`
## What does a singular value mean?
- For an $n\times d$ matrix $A$, whose rows $\mathbf{x}_i$ are data points in $\mathbb{R}^d$
- The best-fit line through the origin minimizes the sum of squared perpendicular distances from the points to the line

By the Pythagorean theorem, for each point $\mathbf{x}_i$,
$$
x_{i1}^2+ x_{i2}^2+\cdots +x_{id}^2=(length~of~projection)^2+(distance~of~point~to~line)^2
$$
That is
$$
(distance~of~point~to~line)^2 = x_{i1}^2+ x_{i2}^2+\cdots +x_{id}^2-(length~of~projection)^2
$$
Summing over all points, $\sum_{i=1}^n \left( x_{i1}^2+ x_{i2}^2+\cdots +x_{id}^2 \right)$ is a constant independent of the line, so minimizing the total squared distance of the points to the line is equivalent to maximizing the total squared projection length. For a unit vector $\mathbf{v}$ along the line, the projection length of $\mathbf{x}_i$ is $|\mathbf{x}_i \cdot \mathbf{v}|$, so we want
$$
\max_{|\mathbf{v}|=1} \sum_{i=1}^n (\mathbf{x}_i \cdot \mathbf{v})^2 = \max_{|\mathbf{v}|=1}|A\mathbf{v}|^2
$$
Define the *first singular vector*, $\mathbf{v}_1$, of $A$ as
$$
\mathbf{v}_1 = \arg \max_{|\mathbf{v}|=1}|A\mathbf{v}|
$$
The value $\sigma _1(A)=|A\mathbf{v}_1|$ is called the *first singular value* of $A$
:::info
Here we can see that $\mathbf{v}_1$ is the direction that maximizes the total squared projection length of all the points onto it.
Note : $A=\begin{pmatrix} \mathbf{x}_1 \\\ \mathbf{x}_2 \\\ \vdots \\\ \mathbf{x}_n\end{pmatrix}$
then $|A\mathbf{v}_1|=\sigma_1$
:::
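As a quick numerical check of this definition, here is a minimal NumPy sketch (the data matrix and variable names are made up for illustration) comparing the best $|A\mathbf{v}|$ found over many random unit vectors with the first singular value returned by `np.linalg.svd`.
```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))           # n x d data matrix, one point per row

# Brute-force search: |Av| over many random unit vectors v
V = rng.normal(size=(3, 20000))
V /= np.linalg.norm(V, axis=0)          # normalize each candidate direction
best = np.linalg.norm(A @ V, axis=0).max()

# First singular value from the SVD
sigma1 = np.linalg.svd(A, compute_uv=False)[0]

print(best, sigma1)                     # best is close to (and never exceeds) sigma1
```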
The *second singular vector*, $\mathbf{v}_2$ , is defined by the best fit line perpendicular to $\mathbf{v}_1$
$$
\mathbf{v}_2 =\arg \max_{\mathbf{v} \perp \mathbf{v}_1, |\mathbf{v}|=1}{|A \mathbf{v} |}
$$
The value $\sigma _2(A)=|A\mathbf{v}_2|$ is called the *second singular value* of $A$ and so on.
The process stops when we have found
$$
\mathbf{v}_1, \mathbf{v}_2, \cdots , \mathbf{v}_r
$$
as singular vectors and
$$
\max_{\mathbf{v} \perp \mathbf{v}_1, \mathbf{v}_2, \cdots , \mathbf{v}_r, \\ |\mathbf{v}|=1}{|A \mathbf{v} |} = 0
$$
:::info
This means that $rank(\mathbf{A})=r$; any further vector $\mathbf{v}_{r+1}$ orthogonal to $\mathbf{v}_1, \cdots , \mathbf{v}_r$ lies in the null space of $A$.
:::
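A small NumPy sketch of this stopping condition, assuming a deliberately rank-deficient matrix: the third singular value is (numerically) zero and its right singular vector lies in the null space of $A$.
```python
import numpy as np

rng = np.random.default_rng(1)
# Build an n x 3 matrix whose third column is a combination of the first two,
# so rank(A) = 2 even though d = 3.
B = rng.normal(size=(50, 2))
A = np.column_stack([B, B[:, 0] + B[:, 1]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(s)                                  # third singular value is ~0
v3 = Vt[2]                                # right singular vector for sigma_3
print(np.linalg.norm(A @ v3))             # ~0: v3 is in the null space of A
```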
## Singular Value Decomposition
The vectors $\mathbf{v}_1, \mathbf{v}_2, \cdots , \mathbf{v}_r$ span the space of the rows of $A$. Thus, for each row $\mathbf{a}_j$ of $A$,
$$
\sum_{j=1}^n|\mathbf{a}_j|^2=\sum_{j=1}^n \sum_{i=1}^r (\mathbf{a}_j \mathbf{v}_i)^2 =\sum_{i=1}^r \sum_{j=1}^n (\mathbf{a}_j \mathbf{v}_i)^2 =\sum_{i=1}^r|A\mathbf{v}_i |^2=\sum_{i=1}^r \sigma_i ^2(A)
$$
Now, for any vector $\mathbf{v}$, $A\mathbf{v}$ is a linear combination of $A\mathbf{v}_1, A\mathbf{v}_2, \cdots , A\mathbf{v}_r$. So $A\mathbf{v}_1, A\mathbf{v}_2, \cdots , A\mathbf{v}_r$ form a fundamental set of vectors associated with $A$. We normalize them to length one by
$$
\mathbf{u}_i=\frac{1}{\sigma_i(A)}A\mathbf{v}_i
$$
The vectors $\mathbf{u}_1, \mathbf{u}_2, \cdots , \mathbf{u}_r$ are called the *left singular vectors* of $A$. The $\mathbf{v}_i$ are called the *right singular vectors*.
:::info
This can be read as $A\mathbf{v}_i=\sigma_i \mathbf{u}_i$. Recall that $\sigma _1(A)=|A\mathbf{v}_1|$:
the combined length of the projections of the rows $\mathbf{a}_j$ of $A$ onto $\mathbf{v}_i$ is $\sigma_i$, and the corresponding direction is $\mathbf{u}_i$.
:::
Therefore,
$$
A=\sum_{i=1}^r \sigma_i(A)\mathbf{u}_i\mathbf{v}_i^T
$$
:::success
:bulb: Recall that when computing the SVD, we form $A^TA$ to obtain the right singular vectors $\mathbf{v}_i$,
and $AA^T$ to obtain the left singular vectors $\mathbf{u}_i$. In practice we usually care only about $\mathbf{v}_i$ and $\sigma_i$, which represent the projection directions and their importance, respectively.
:::
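To make the decomposition concrete, the following NumPy sketch (with an arbitrary random matrix) rebuilds $A$ from $\sum_i \sigma_i\mathbf{u}_i\mathbf{v}_i^T$, checks that each $\mathbf{v}_i$ is an eigenvector of $A^TA$, and verifies the norm identity derived above.
```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-r reconstruction: A = sum_i sigma_i * u_i * v_i^T
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
print(np.allclose(A, A_rebuilt))                          # True

# Each right singular vector is an eigenvector of A^T A with eigenvalue sigma_i^2
print(np.allclose(A.T @ A @ Vt[0], s[0] ** 2 * Vt[0]))    # True

# Norm identity: sum of squared row norms equals sum of sigma_i^2
print(np.isclose((A ** 2).sum(), (s ** 2).sum()))         # True
```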
## SVD v.s. EVD
Eigenvalue decomposition can only be applied to a square matrix.
For a square matrix $A$, we can find vectors $\mathbf{v}_i$ such that
$$
A\mathbf{v}_i = \sigma_i \mathbf{v}_i
$$
Comparing with the SVD, this is the special case in which $\mathbf{v}_i = \mathbf{u}_i$.
Therefore,
$$
A=\sum_{i=1}^r \sigma_i(A)\mathbf{v}_i\mathbf{v}_i^T
$$
When the matrix $A$ is symmetric and positive semi-definite, the SVD coincides with the EVD.
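A short NumPy check of this statement, assuming a random symmetric positive semi-definite matrix: the singular values match the eigenvalues, and the left and right singular vectors coincide.
```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(4, 4))
A = M @ M.T                               # symmetric positive semi-definite

U, s, Vt = np.linalg.svd(A)
eigvals, eigvecs = np.linalg.eigh(A)      # eigh: for symmetric matrices, ascending order

print(np.allclose(s, eigvals[::-1]))      # singular values equal the (sorted) eigenvalues
print(np.allclose(np.abs(U), np.abs(Vt.T)))  # left and right singular vectors span the same directions
```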
## PCA v.s. SVD
PCA常用於特徵降維,其目的是找到最能夠代表所有特徵的方向並進行投影,同時提供使用者重要性來評估要保留多少特徵。
如何評估最能夠代表特徵的方向呢?PCA使用共變異矩陣(Covariance matrix)來判斷。

Intuitively, a candidate direction $\mathbf{v}$ that the data points are far from is not the best projection direction; the best projection direction $\mathbf{v}'$ is the one the points lie closest to.
From the above we can see that the idea is very similar to SVD. The main difference is that **PCA first computes the covariance matrix and then applies SVD**.
The covariance between two features $\mathbf{a}$ and $\mathbf{b}$ (two columns of the data matrix), each with $n$ samples, is
$$
cov(\mathbf{a},\mathbf{b})=\frac{1}{n-1}\sum_{i=1}^n(a_i-\mu_a)^*(b_i-\mu_b)
$$
Collecting these values for every pair of columns of $A$ gives the covariance matrix $cov(A)=\frac{1}{n-1}(A-\mu)^T(A-\mu)$.
:::info
In general, the measured data $A$ are recorded as
$$
A = \begin{pmatrix} \mathbf{y}_1 \\\ \mathbf{y}_2 \\\ \vdots \\\ \mathbf{y}_n \end{pmatrix},~where~\mathbf{y}_i~is~1\times d
$$
but the variance of the data is computed per measured feature, i.e. per column.
:::
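As a small illustration (assuming NumPy and a made-up data matrix stored one sample per row), the per-column covariance matrix can be computed directly and matches `np.cov` with `rowvar=False`.
```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(100, 3))             # n samples (rows) x d features (columns)

mu = A.mean(axis=0)                       # per-feature mean
C_manual = (A - mu).T @ (A - mu) / (A.shape[0] - 1)

C_numpy = np.cov(A, rowvar=False)         # rowvar=False: each column is a variable
print(np.allclose(C_manual, C_numpy))     # True
```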
Therefore, the PCA algorithm applies SVD to the covariance matrix,
$$
cov(A) = \sum_{i=1}^r\sigma_i \mathbf{u}_i\mathbf{v}_i^T
$$
Also, since $cov(A)$ is a symmetric, positive semi-definite matrix,
$$
\mathbf{v}_i=\mathbf{u}_i
$$
Here, $\sigma_i$ is the importance of the $i$-th principal component and $\mathbf{v}_i$ is its direction. That is,
$$
PCA(X)=SVD(X^TX)=EVD(X^TX),~where~each~row~of~X~is~a~centered~data~point.
$$
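This relation can be sanity-checked numerically. The sketch below (NumPy, with a hypothetical two-feature data matrix) compares the EVD and the SVD of $X^TX$ for centered data, and relates it to the covariance matrix.
```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [0.0, 0.5]])   # correlated features
Xc = X - X.mean(axis=0)                       # center the data first
G = Xc.T @ Xc                                 # X^T X (proportional to the covariance matrix)

# EVD of X^T X (eigh returns ascending eigenvalues)
eigvals, eigvecs = np.linalg.eigh(G)

# SVD of X^T X
U, s, Vt = np.linalg.svd(G)

print(np.allclose(s, eigvals[::-1]))                             # same spectrum
print(np.allclose(np.abs(U), np.abs(eigvecs[:, ::-1])))          # same directions (up to sign)

# The principal directions are these eigenvectors; dividing by (n - 1)
# recovers the covariance matrix used by PCA.
print(np.allclose(np.cov(Xc, rowvar=False), G / (len(Xc) - 1)))  # True
```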
## Discussion
:question: Why can we use $\mathbf{v}$ to project the original data? This $\mathbf{v}$ is obtained from the covariance matrix!
:bulb: Key concept : find the direction along which the projection has **maximum variance**
$$
variance = \sigma^2 =\frac{1}{n} \sum_{i=1}^n(\mathbf{v}^Tx_i-\mu)^2=\frac{1}{n} \sum_{i=1}^n(\mathbf{v}^Tx_i)^2,~if~\mu=0
\\ Therefore,~\sigma^2=\frac{1}{n} \sum_{i=1}^n(\mathbf{v}^Tx_i)(\mathbf{v}^Tx_i)^T=\frac{1}{n} \sum_{i=1}^n \mathbf{v}^Tx_ix_i^T\mathbf{v}=\mathbf{v}^T\Big(\frac{1}{n}\sum_{i=1}^n x_ix_i^T\Big)\mathbf{v}
\\ =\mathbf{v}^TC\mathbf{v},~where~C~is~the~covariance~matrix.
$$
So our objective function becomes
$$
\mathbf{v}=\arg\max_{||\mathbf{v}||=1}\mathbf{v}^TC\mathbf{v}
\\ f(\mathbf{v}, \lambda)=\mathbf{v}^TC\mathbf{v}-\lambda(||\mathbf{v}||^2-1)
\\ ~~~~~~~~~~~~=\mathbf{v}^TC\mathbf{v}-\lambda(\mathbf{v}^T\mathbf{v}-1)
$$
Setting the partial derivatives of $f$ to zero gives
$$
C\mathbf{v} = \lambda\mathbf{v}
\\ ||\mathbf{v}||=1
$$
:::info
Notice that this is the standard form of an eigenvalue problem. If we find the eigenvectors of the covariance matrix, we find the projection directions that give the projected data maximum variance.
:::
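As a sanity check on this result, the following sketch (NumPy, made-up data) compares $\mathbf{v}^TC\mathbf{v}$ for the top eigenvector of $C$ against many random unit vectors; no random direction exceeds it.
```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 3)) @ np.diag([3.0, 1.0, 0.3])   # features with different spreads
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(C)
v_top = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue

# Projected variance v^T C v for many random unit vectors
V = rng.normal(size=(3, 10000))
V /= np.linalg.norm(V, axis=0)
random_var = (V * (C @ V)).sum(axis=0).max()

print(v_top @ C @ v_top, random_var)        # the top eigenvector attains the maximum
```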
## Example
For a matrix $\mathbf{A}$
$$
A = \begin{pmatrix} 1&2 \\\ 3&4 \\\ 5&6 \\\ 7&8 \end{pmatrix}
$$
The PCA result is
$$
[coeff,score,latent]=pca(\mathbf{A}),\\\
coeff= \begin{pmatrix} 0.7071&0.7071 \\\ 0.7071&-0.7071\end{pmatrix},\\\
score= \begin{pmatrix} -4.243&4.484\times10^{-16} \\\ -1.414&-1.623\times10^{-16} \\\ 1.414&1.623\times10^{-16} \\\ 4.243&3.402\times10^{-16} \end{pmatrix},\\\
latent = \begin{pmatrix} 13.333 \\\ 1.232\times10^{-31}\end{pmatrix}
$$
The SVD of the covariance matrix $\mathbf{C}=\frac{1}{N-1}(\mathbf{A}-\mu)^T(\mathbf{A}-\mu)$ is
$$
[\mathbf{U},\mathbf{S},\mathbf{V}] = svd(\mathbf{C}),\\\
\mathbf{S}=\begin{pmatrix} 13.333&0 \\\ 0&7.626\times10^{-17}\end{pmatrix},\\\
\mathbf{V}=\begin{pmatrix} 0.7071&0.7071 \\\ 0.7071&-0.7071\end{pmatrix}=\mathbf{U}\\\
$$
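The example above uses MATLAB-style calls; a NumPy sketch that reproduces the same quantities (the names `coeff`, `score`, `latent` simply mirror the MATLAB outputs) might look like this.
```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6], [7, 8]], dtype=float)

# PCA on the centered data (MATLAB's pca centers the data by default)
Ac = A - A.mean(axis=0)
U, s, Vt = np.linalg.svd(Ac, full_matrices=False)

coeff = Vt.T                                # principal directions (columns), up to sign
score = Ac @ coeff                          # projected data
latent = s ** 2 / (A.shape[0] - 1)          # variance explained by each component

print(coeff)        # ~[[0.7071, 0.7071], [0.7071, -0.7071]] up to column signs
print(score[:, 0])  # ~[-4.243, -1.414, 1.414, 4.243] up to sign
print(latent)       # ~[13.333, 0]
```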