主成份分析 - HackMD

# 主成份分析 ![Creative Commons License](https://i.creativecommons.org/l/by/4.0/88x31.png) This work by Jephian Lin is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/). $\newcommand{\trans}{^\top} \newcommand{\adj}{^{\rm adj}} \newcommand{\cof}{^{\rm cof}} \newcommand{\inp}[2]{\left\langle#1,#2\right\rangle} \newcommand{\dunion}{\mathbin{\dot\cup}} \newcommand{\bzero}{\mathbf{0}} \newcommand{\bone}{\mathbf{1}} \newcommand{\ba}{\mathbf{a}} \newcommand{\bb}{\mathbf{b}} \newcommand{\bc}{\mathbf{c}} \newcommand{\bd}{\mathbf{d}} \newcommand{\be}{\mathbf{e}} \newcommand{\bh}{\mathbf{h}} \newcommand{\bp}{\mathbf{p}} \newcommand{\bq}{\mathbf{q}} \newcommand{\br}{\mathbf{r}} \newcommand{\bx}{\mathbf{x}} \newcommand{\by}{\mathbf{y}} \newcommand{\bz}{\mathbf{z}} \newcommand{\bu}{\mathbf{u}} \newcommand{\bv}{\mathbf{v}} \newcommand{\bw}{\mathbf{w}} \newcommand{\tr}{\operatorname{tr}} \newcommand{\nul}{\operatorname{null}} \newcommand{\rank}{\operatorname{rank}} %\newcommand{\ker}{\operatorname{ker}} \newcommand{\range}{\operatorname{range}} \newcommand{\Col}{\operatorname{Col}} \newcommand{\Row}{\operatorname{Row}} \newcommand{\spec}{\operatorname{spec}} \newcommand{\vspan}{\operatorname{span}} \newcommand{\Vol}{\operatorname{Vol}} \newcommand{\sgn}{\operatorname{sgn}} \newcommand{\idmap}{\operatorname{id}} \newcommand{\am}{\operatorname{am}} \newcommand{\gm}{\operatorname{gm}} \newcommand{\mult}{\operatorname{mult}} \newcommand{\iner}{\operatorname{iner}}$ ```python from lingeo import random_int_list ``` ## Main idea Let $X$ be an $N\times d$ data matrix whose the sample vectors are centered at $\bzero\in\mathbb{R}^d$. Let $C = \frac{1}{N-1}X\trans X$ be the covariance matrix. Let $\bv$ be a unit vector in $\mathbb{R}^d$. One may project all data points onto the direction of $\bv$. Indeed, the projected data is $\bu = X\bv$. Thus, the mean of $\bu$ is $\inp{\bu}{\bone} = \bone\trans X\bu = \bzero\trans\bu = 0$, and the variance of $\bu$ is $\frac{1}{N-1}\inp{\bu}{\bu} = \frac{1}{N-1}\bu\trans X\trans X\bu = \bv\trans C\bv$. It is natural to consider the optimization problem: maximize $\bv\trans C\bv$, subject to $\|\bv\| = 1$. Let $\lambda_1\geq \cdots \geq \lambda_d$ be the eigenvalues of $C$ and $\{\bv_1,\ldots,\bv_d\}$ the corresponding orthonormal eigenbasis. According to the Rayleigh quotient theorem, the maximum is achieved by $\lambda_1$ when $\bv = \bv_1$. Therefore, $\bv_1$ is called the first **principal component** of the data, and one may continue to take $\bv_2,\bv_3,\ldots$ as the next principal components. ##### Algorithm (principal component analysis) Input: a data represented by an $N\times d$ matrix $X$ and the desired number $k$ of principal components Output: the principal components represented by the column vectors of a matrix $P$ 1. Let $J$ be the $N\times N$ all-ones matrix. Define $X_0 = (I - \frac{1}{n}J)X$. 2. Calculate the covariance matrix $C= \frac{1}{N-1}X_0\trans X_0$. 3. Find a diagonal matrix $D$, whose diagonal entries are $\lambda_1 \geq \cdots \geq \lambda_d$, and an orthogonal matrix $Q$ such that $C = QDQ\trans$. 3. Let $P$ be the $d\times k$ matrix obtained by the first $k$ columns of $Q$. A few points to emphasize: - The $Q$ matrix is the same as the $V$ in the algorithm in 606. - The eigenvalues $\lambda_1 \geq \cdots \geq \lambda_d$ are the variances when the data are projected onto the columns of $Q$. ## Side stories - explained variance ratio - `scipy.linalg.eigh` ## Experiments ##### Exercise 1 執行以下程式碼。 ```python= ### code import numpy as np import matplotlib.pyplot as plt %matplotlib inline np.random.seed(0) print_ans = False mu = np.random.randint(-3,4, (2,)) cov = np.array([[1,1.9], [1.9,4]]) X = np.random.multivariate_normal(mu, cov, (20,)) mu = X.mean(axis=0) X0 = X - mu # u,s,vh = np.linalg.svd(X0) C = (1 / (19)) * X0.T.dot(X0) vals,vecs = np.linalg.eigh(C) vals = vals[::-1] vecs = vecs[:,::-1] P = vecs[:,:1] plt.axis('equal') plt.scatter(*X.T) plt.scatter(*mu, c="red") plt.arrow(*mu, *(vecs[:,0]), head_width=0.3, color="red") xs = X0.dot(P) ys = np.zeros_like(xs) plt.scatter(xs, ys) pretty_print(LatexExpr("Q =")) print(vecs) if print_ans: print("red point: center") print("red arrow: first principal component") print("orange points: projection of blue points onto the red arrow") print("red arrow = first column of Q") ``` ##### Exercise 1(a) 若藍點為給定的資料。改變不同的 `seed`，說明紅點、紅箭頭、橘點的意思。當 `seed = 1` 時可得到以下題目資訊。 ![](https://imgur.com/CLBbKeo.png) :::warning - [x] 紅箭頭上的點 --> 紅箭頭上的分量 ::: $Ans:$ 紅點為藍點們的中心點，\ 紅箭頭為藍點的分布趨勢，\ 橘點為藍點以紅點為中心紅箭頭為基軸投影在紅箭頭上的分量。 ##### Exercise 1(b) 令 $X_0$ 的列向量為將藍點資料置中過後的資料點、而 $C = \frac{1}{N-1}X_0\trans X_0$ 為其共變異數矩陣。若 $C = QDQ\trans$ 為 $C$ 的譜分解，其中 $D$ 的對角線項由大到小排列。說明紅箭頭與 $Q$ 的關係。 :::warning - [x] 中英數之間空格 ::: $Ans:$ $Q$ 的行為此資料的主成分，也就是紅箭頭的向量。 ## Exercises ##### Exercise 2 令 $X$ 是已置中的 $N\times d$ 資料矩陣、而 $C = \frac{1}{N-1}X\trans X$ 為其共變異數矩陣。令 $\bv$ 為一 $\mathbb{R}^d$ 中的單位向量。說明 $\bu = X\bv$ 為將 $X$ 的所有樣本點投影在 $\bv$ 方向的值、並說明 $\bu$ 的變異數為 $\bv\trans C\bv$。 :::warning - [x] 把最後一句關於置中的部份拉到第一句，是因為已經置中過了，才知道變異數可以這樣算 - [x] 中英數之間空格 ::: $Ans:$ 因為 $\bu = X\bv$ 這筆資料已經過置中， $\bu$ 的變異數為 $\frac{1}{N-1}\bv\trans X\trans X\bv$ ，而 $C = \frac{1}{N-1}X\trans X$ ，得知 $\bu$ 的變異數為 $\bv\trans C\bv$ 。 ##### Exercise 3 ```python import numpy as np X = np.random.randn(3,4) u,s,vh = np.linalg.svd(X) vals,vecs = np.linalg.eigh(X.T.dot(X)) vals = vals[::-1] vecs = vecs[:,::-1] print("V =") print(vh.T) print("Q =") print(vecs) print("singular values:", s) print("eigenvalues:", vals) ``` ##### Exercise 3(a) 說明為什麼印出來的兩個矩陣會一樣。（除了有的行會有正負號的差別。） ![](https://i.imgur.com/8VCfSLh.png) :::warning - [x] vecs --> `vecs`, vh --> `vh` - [x] $^T$ --> $\trans$ - [x] eigenvector --> 特徵向量 ::: $Ans:$ `vecs` 是由 $X\trans X$ 的特徵向量組成的，而 `vh` 是經 SVD 而來的，是將 $X\trans X$ 的特徵向量當作行向量且確定已是正交後轉置。故將 `vh` 轉置後便會與 `vecs` 一樣。 ##### Exercise 3(b) 說明印出來的奇異值和特徵值有什麼關係。 $Ans:$ 奇異值是特徵根開根號，所以兩者有平方關係。 $\sigma_i = \sqrt{\lambda_i}$ for $i = 1,\ldots, r$ 且 $\lambda_i$ 大於 $0$。 ##### Exercise 3(c) 解釋 ```python vals = vals[::-1] vecs = vecs[:,::-1] ``` 的用處。 ![](https://i.imgur.com/haEaotU.png) :::warning - [x] 它具有倒過來的效果 --> 它具有把矩陣左右倒過來的效果 ::: $Ans:$ 它具有把矩陣左右倒過來的效果。 ##### Exercise 4 有時候我們只須要最大的幾個特徵值。所以如果把所有的特徵值都算出來是不必要的浪費時間。查找資源如何利用 SciPy 中的 `linalg.eigh` 找最大的幾個特徵值。 :::warning 再研究一下 [`scipy.linalg.eigh`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.eigh.html) 中的 `subset_by_index` 參數 Note: 講義中用的是 NumPy，它的 `linalg.eigh` 和 SciPy 中的 `linalg.eigh` 不一樣；後者功能較多。 ::: **Ans:** `linalg.eigh`是用來找出對稱矩陣的特徵值和特徵向量，原本要先找出共變異數矩陣，再利用`linalg.eigh`找出矩陣的特徵值，特徵值排列會是由小到大，所以要倒過來看，才能找到最大的那幾個特徵值。而利用`linalg.eigh`中的`subset_by_index` 可以直接找出排在第幾位的特徵值，假設把`subset_by_index`設定成 `[n-3, n-1]`，那我們就可以找出最大的三個特徵值。 ##### Exercise 5 給定 $k$ 時， $$ p = \frac{\sum_{i=1}^k \lambda_i}{\sum_{i=1}^d \lambda_i} $$ 稱作**變異數解釋率（explained variance ratio）**。當解釋率夠高的時候，表示投影後的資料足以代表原資料的重要特徵。觀察下圖，其橫軸為 $k$，縱軸為解釋率。判斷 $k$ 應該要取多少，以及為什麼。 ```python import numpy as np import matplotlib.pyplot as plt X = np.genfromtxt('hidden_text.csv', delimiter=',') print("shape of X =", X.shape) X = X - X.mean(axis=0) u,s,vh = np.linalg.svd(X) vals = s**2 plt.plot(np.arange(1,101), vals.cumsum() / vals.sum()) plt.gca().set_xlim(0,10) ``` 當 `seed = 0`， ![](https://i.imgur.com/wbn0ZTY.png) **Ans:** $k$ 應該要取 $2$，因為當 $k=2$，變異數解釋率(縱軸)接近 $1$，代表投影後的資料接近 $100$% 原資料的變異量，所以我們可降低數據維度到 $2$ 個主成分，就已經足夠代表$X$。 :::info Good ::: ##### Exercise 6 找一個共變異數矩陣 `cov` 使得 `np.random.multivariate_normal` 產出來的資料大約是一個楕圓形，且： - 長軸指向 $45^\circ$ 角、短軸指向 $135^\circ$ 角； - 長軸是大約是短軸的 $2$ 倍。 ```python import numpy as np import matplotlib.pyplot as plt mu = np.array([0,0]) cov = np.array([ [4,0], [0,1] ]) X = np.random.multivariate_normal(mu, cov, (1000,)) plt.axis('equal') plt.scatter(*X.T) ``` :::warning - [x] eigenvector --> 特徵向量 - [x] eigval --> 特徵值 - [x] 標點 ::: $Ans:$ 因為長軸和短軸分別指向 $45^\circ$ 角和 $135^\circ$ 角，所以兩個特徵向量會是 $[1/\sqrt{2},1/\sqrt{2}] , [-1/\sqrt{2},1/\sqrt{2}]$，且因為長軸為短軸的兩倍，所以可以得知特徵值 $\lambda_1$ 和 $\lambda_2$的關係為 $\sqrt{\lambda_1} = 2\sqrt{\lambda_2}$，因此我們可以假設 $\lambda_1 , \lambda_2 = 4,1$， $$A \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix}， $$ 因此可得 $A = \begin{bmatrix} 5/2 & 3/2 \\ 3/2 & 5/2 \end{bmatrix}。$ ```python import numpy as np import matplotlib.pyplot as plt mu = np.array([0,0]) cov = np.array([ [5/2,3/2], [3/2,5/2] ]) X = np.random.multivariate_normal(mu, cov, (1000,)) plt.axis('equal') plt.scatter(*X.T) ``` ![](https://i.imgur.com/hfPH1zr.png) :::info 分數 = 7 :::

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.