# 體驗主成份分析
Understanding the principal component analysis

This work by Jephian Lin is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).
$\newcommand{\trans}{^\top}
\newcommand{\adj}{^{\rm adj}}
\newcommand{\cof}{^{\rm cof}}
\newcommand{\inp}[2]{\left\langle#1,#2\right\rangle}
\newcommand{\dunion}{\mathbin{\dot\cup}}
\newcommand{\bzero}{\mathbf{0}}
\newcommand{\bone}{\mathbf{1}}
\newcommand{\ba}{\mathbf{a}}
\newcommand{\bb}{\mathbf{b}}
\newcommand{\bc}{\mathbf{c}}
\newcommand{\bd}{\mathbf{d}}
\newcommand{\be}{\mathbf{e}}
\newcommand{\bh}{\mathbf{h}}
\newcommand{\bp}{\mathbf{p}}
\newcommand{\bq}{\mathbf{q}}
\newcommand{\br}{\mathbf{r}}
\newcommand{\bx}{\mathbf{x}}
\newcommand{\by}{\mathbf{y}}
\newcommand{\bz}{\mathbf{z}}
\newcommand{\bu}{\mathbf{u}}
\newcommand{\bv}{\mathbf{v}}
\newcommand{\bw}{\mathbf{w}}
\newcommand{\tr}{\operatorname{tr}}
\newcommand{\nul}{\operatorname{null}}
\newcommand{\rank}{\operatorname{rank}}
%\newcommand{\ker}{\operatorname{ker}}
\newcommand{\range}{\operatorname{range}}
\newcommand{\Col}{\operatorname{Col}}
\newcommand{\Row}{\operatorname{Row}}
\newcommand{\spec}{\operatorname{spec}}
\newcommand{\vspan}{\operatorname{span}}
\newcommand{\Vol}{\operatorname{Vol}}
\newcommand{\sgn}{\operatorname{sgn}}
\newcommand{\idmap}{\operatorname{id}}
\newcommand{\am}{\operatorname{am}}
\newcommand{\gm}{\operatorname{gm}}
\newcommand{\mult}{\operatorname{mult}}
\newcommand{\iner}{\operatorname{iner}}$
```python
from lingeo import random_int_list
```
## Main idea
In statistics, data is often stored as a matrix (a two-dimensional table) such that
each row represents a sample, while
each column represents a feature.
For example, the rows can be the students and the columns can be their scores in different subjects.
Let $X$ be an $N\times d$ matrix for some data and $\bx_1,\ldots,\bx_N\in\mathbb{R}^d$ its rows.
Then
$$
\mu = \frac{1}{N} \sum_{i = 1}^N \bx_i
$$
is the center of these data points.
One may **center** the data by creating a new matrix $X_0$ whose rows are $\bx_1 - \mu, \ldots, \bx_N - \mu$.
Let $J$ be the $N\times N$ all-ones matrix.
Then one way to describe this is $X_0 = (I - \frac{1}{N}J)X$.
The **principal component analysis** (or **PCA**) is
a method of finding the most important directions of the data.
These directions (vectors) are called the **principal components** of the data.
##### Algorithm (principal component analysis)
Input: data represented by an $N\times d$ matrix $X$, and the desired number $k$ of principal components
Output: the principal components represented by the column vectors of a matrix $P$
1. Let $J$ be the $N\times N$ all-ones matrix.
Define $X_0 = (I - \frac{1}{N}J)X$.
2. Let $U\Sigma V\trans$ be the singular value decomposition of $X_0$.
3. Let $P$ be the $d\times k$ matrix formed by the first $k$ columns of $V$.
Once the principal components are found,
one may project all data points onto the subspace spanned by the principal components.
This can be done by the $N\times k$ matrix $Y = X_0P$,
whose rows are the new data points containing the essential information of the original data.
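As an illustration, here is a minimal NumPy sketch of the algorithm; the data matrix and the choice $k = 1$ below are made up for demonstration.
```python
import numpy as np

np.random.seed(0)
N, d, k = 20, 3, 1
X = np.random.randn(N, d)          # made-up data: N samples with d features

# Step 1: center the data
J = np.ones((N, N))
X0 = (np.eye(N) - J / N).dot(X)    # equivalent to X - X.mean(axis=0)

# Step 2: singular value decomposition of the centered data
u, s, vh = np.linalg.svd(X0)

# Step 3: the first k columns of V are the principal components
P = vh.T[:, :k]

# project the centered data onto the principal components
Y = X0.dot(P)                      # an N x k matrix
print("P =\n", P)
print("shape of Y =", Y.shape)
```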
## Side stories
- hidden words
## Experiments
##### Exercise 1
執行以下程式碼。
<!-- eng start -->
Run the code below.
<!-- eng end -->
```python
### code
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
np.random.seed(0)
print_ans = False
mu = np.random.randint(-3,4, (2,))
cov = np.array([[1, 1.9],
                [1.9, 4]])
X = np.random.multivariate_normal(mu, cov, (20,))
mu = X.mean(axis=0)
X0 = X - mu
u,s,vh = np.linalg.svd(X0)
P = vh.T[:,:1]
plt.axis('equal')
plt.scatter(*X.T)
plt.scatter(*mu, c="red")
plt.arrow(*mu, *(P[:,0]), head_width=0.3, color="red")
xs = X0.dot(P)
ys = np.zeros_like(xs)
plt.scatter(xs, ys)
pretty_print(LatexExpr(r"V^\top ="))
print(vh)
if print_ans:
    print("red point: center")
    print("red arrow: first principal component")
    print("orange points: projection of blue points onto the red arrow")
    print("red arrow = first row of VT = first column of V")
```
##### Exercise 1(a)
若藍點為給定的資料。
改變不同的 `seed`,說明紅點、紅箭頭、橘點的意思。
<!-- eng start -->
Let the blue points be the given data. Try different values of `seed` and explain the meaning of the red point, the red arrow, and the orange points.
<!-- eng end -->
---
##### Exercise 1(a) - answer here
With `seed = 0`, the code above produces a picture in which:

- The red dot is the center of the blue dots.
- The red arrow indicates the direction along which the blue dots spread the most.
- The orange dots are the projections of the blue points onto the red arrow.
---
##### Exercise 1(b)
令 $X_0$ 的列向量為將藍點資料置中過後的資料點。
若 $X_0 = U\Sigma V\trans$ 為其奇異值分解。
說明紅箭頭與 $V\trans$ 的關係。
<!-- eng start -->
Let $X_0$ be the matrix whose rows are the blue points, centered at their mean. If $X_0 = U\Sigma V\trans$ is the singular value decomposition, describe the relation between the red arrow and $V\trans$.
<!-- eng end -->
---
##### Exercise 1(b) - answer here
The red arrow is the first row of $V\trans$, that is, the first column of $V$.
---
:::info
What do the experiments try to tell you? (open answer)
...
:::
## Exercises
##### Exercise 2
令 $X$ 為一 $N\times d$ 矩陣,而其列向量為一筆資料。
令 $J$ 為 $N\times N$ 的全一矩陣。
<!-- eng start -->
Let $X$ be an $N\times d$ matrix whose rows are the data points. Let $J$ be the $N\times N$ all-ones matrix.
<!-- eng end -->
##### Exercise 2(a)
說明 $\frac{1}{N}JX$ 每列的意義。
<!-- eng start -->
Describe the meaning of the rows of $\frac{1}{N}JX$.
<!-- eng end -->
---
##### Exercise 2(a) - answer here
It means that we project the center onto the approriate plane while $X$ is a data.
$\frac{1}{N}JX$ represents the result of performing matrix operations on the matrices $J$and $X$, and then scaling the result by $\frac{1}{N}$.
:::warning
Need revision: Each row of $\frac{1}{N}JX$ is the center of all data points in $X$.
:::
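A quick numerical check of this fact, with made-up data:
```python
import numpy as np

np.random.seed(0)
N, d = 5, 2
X = np.random.randint(-3, 4, (N, d))
J = np.ones((N, N))

# every row of (1/N) J X equals the center of the data points
print(J.dot(X) / N)
print("mu =", X.mean(axis=0))
```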
---
##### Exercise 2(b)
說明 $(I - \frac{1}{N}J)X$ 每列的意義。
<!-- eng start -->
Describe the meaning of the rows of $(I - \frac{1}{N}J)X$.
<!-- eng end -->
---
##### Exercise 2(b) - answer here
Let
$$
X = \begin{bmatrix}
 - & \br_1 & - \\
 ~ & \vdots & ~ \\
 - & \br_N & -
\end{bmatrix}.
$$
The $i$th row of $(I - \frac{1}{N}J)X$ is, taking $i=1$ as an example,
$$
\frac{N-1}{N}\br_1 - \frac{1}{N}(\br_2 + \br_3 + \cdots + \br_N) =
\br_1 - \frac{1}{N}(\br_1 + \br_2 + \cdots + \br_N).
$$
Thus, the $i$th row of $(I - \frac{1}{N}J)X$ is equal to the $i$th row of $X$ minus the average of all the rows of $X$.
In other words, multiplying by $I - \frac{1}{N}J$ centers the data: it translates the data so that its center is at the origin.
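Again, a quick numerical check with made-up data:
```python
import numpy as np

np.random.seed(0)
N, d = 5, 2
X = np.random.randint(-3, 4, (N, d))
J = np.ones((N, N))

# centering by I - (1/N) J agrees with subtracting the average row
X0 = (np.eye(N) - J / N).dot(X)
print(np.allclose(X0, X - X.mean(axis=0)))  # True
```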
---
##### Exercise 3
執行以下程式碼。
<!-- eng start -->
Run the code below.
<!-- eng end -->
```python
### code
import numpy as np
np.random.seed(0)
X = np.random.randint(-3, 4, (5,2))
mu = X.mean(axis=0)
X0 = X - mu
print("X =\n", X)
print("mu =\n", mu)
print("X0 =\n", X0)
```
##### Exercise 3(a)
解釋 `mu = X.mean(axis=0)` 的意義。
<!-- eng start -->
Describe the meaning of `mu = X.mean(axis=0)`.
<!-- eng end -->
##### Exercise 3(b)
解釋 `X0 = X - mu` 的意義。
<!-- eng start -->
Describe the meaning of `X0 = X - mu`.
<!-- eng end -->
##### Exercise 4
執行以下程式碼。
<!-- eng start -->
Run the code below.
<!-- eng end -->
```python
### code
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook
np.random.seed(0)
mu = np.random.randint(-3,4, (3,))
cov = np.array([[3, -0.9, -1.9],
                [-0.9, 11, -9.9],
                [-1.9, -9.9, 12]])
X = np.random.multivariate_normal(mu, cov, (100,))
mu = X.mean(axis=0)
X0 = X - mu
u,s,vh = np.linalg.svd(X0)
P = vh.T[:,:2]
ax = plt.axes(projection='3d')
ax.set_xlim(-5,5)
ax.set_ylim(-5,5)
ax.set_zlim(-5,5)
ax.scatter(*X.T)
ax.scatter(*mu, c="red")
ax.quiver(*(mu[:,np.newaxis] + np.zeros((2,))), *P, color="red")
```
```python
%matplotlib inline
Y = X0.dot(P)
plt.axis('equal')
plt.scatter(*Y.T)
```
##### Exercise 4(a)
說明兩段程式碼產出的圖的關係。
<!-- eng start -->
Describe the relation between the two graphs in the output.
<!-- eng end -->
##### Exercise 4(b)
使用 `X.shape` 來觀察一個陣列的形狀、
並逐句解釋下方程式碼所做的事情。
```python
mu = X.mean(axis=0)
X0 = X - mu
u,s,vh = np.linalg.svd(X0)
P = vh.T[:,:2]
```
<!-- eng start -->
You may use `X.shape` to check the shape of an array. Then explain the meaning of each line in the code.
```python
mu = X.mean(axis=0)
X0 = X - mu
u,s,vh = np.linalg.svd(X0)
P = vh.T[:,:2]
```
<!-- eng end -->
##### Exercise 5
**挑戰**
以下程式碼讀進一筆資料。
這筆資料實際上是貼在高維度中的一個二維平面,
且它在這個平面上排出一個一個英文字。
請找出這個英文字是什麼。
<!-- eng start -->
**Challenge**
The following code loads some data from a file. The data lies on a $2$-dimensional plane in a higher-dimensional space, and on this plane the data points spell out an English word.
Try to find out this word.
<!-- eng end -->
```python
import numpy as np

X = np.genfromtxt('hidden_text.csv', delimiter=',')
print("shape of X =", X.shape)
```
---
##### Exercise 5 - answer here
Center the data, compute its singular value decomposition, and project the centered data onto the first two principal components; since the data lies on a $2$-dimensional plane, this projection recovers the picture drawn on that plane.
The picture shows the word "Taiwan", upside down.
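A minimal sketch of how the picture can be obtained, following the algorithm in the Main idea (assuming `hidden_text.csv` is in the working directory):
```python
import numpy as np
import matplotlib.pyplot as plt

X = np.genfromtxt('hidden_text.csv', delimiter=',')

# center the data
mu = X.mean(axis=0)
X0 = X - mu

# project the centered data onto the first two principal components
u, s, vh = np.linalg.svd(X0)
P = vh.T[:, :2]
Y = X0.dot(P)

# the projected points draw the hidden word (possibly rotated or flipped)
plt.axis('equal')
plt.scatter(*Y.T)
```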
---
##### Exercise 6
令 $X_0$ 及 $P$ 為主成份分析演算法中的矩陣。
說明 $X_0P$ 與 $X_0PP\trans$ 的列向量所代表的意義。
<!-- eng start -->
Let $X_0$ and $P$ be the matrices in the principal component analysis. Explain the meanings of $X_0P$ and $X_0PP\trans$.
<!-- eng end -->
:::info
collaboration: 2
3 problems: 1
moderator: 1
qc: 1
:::