# 拉普拉斯嵌入法與譜分群
Spectral embedding and clustering

This work by Jephian Lin is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).
$\newcommand{\trans}{^\top}
\newcommand{\adj}{^{\rm adj}}
\newcommand{\cof}{^{\rm cof}}
\newcommand{\inp}[2]{\left\langle#1,#2\right\rangle}
\newcommand{\dunion}{\mathbin{\dot\cup}}
\newcommand{\bzero}{\mathbf{0}}
\newcommand{\bone}{\mathbf{1}}
\newcommand{\ba}{\mathbf{a}}
\newcommand{\bb}{\mathbf{b}}
\newcommand{\bc}{\mathbf{c}}
\newcommand{\bd}{\mathbf{d}}
\newcommand{\be}{\mathbf{e}}
\newcommand{\bh}{\mathbf{h}}
\newcommand{\bp}{\mathbf{p}}
\newcommand{\bq}{\mathbf{q}}
\newcommand{\br}{\mathbf{r}}
\newcommand{\bx}{\mathbf{x}}
\newcommand{\by}{\mathbf{y}}
\newcommand{\bz}{\mathbf{z}}
\newcommand{\bu}{\mathbf{u}}
\newcommand{\bv}{\mathbf{v}}
\newcommand{\bw}{\mathbf{w}}
\newcommand{\tr}{\operatorname{tr}}
\newcommand{\nul}{\operatorname{null}}
\newcommand{\rank}{\operatorname{rank}}
%\newcommand{\ker}{\operatorname{ker}}
\newcommand{\range}{\operatorname{range}}
\newcommand{\Col}{\operatorname{Col}}
\newcommand{\Row}{\operatorname{Row}}
\newcommand{\spec}{\operatorname{spec}}
\newcommand{\vspan}{\operatorname{span}}
\newcommand{\Vol}{\operatorname{Vol}}
\newcommand{\sgn}{\operatorname{sgn}}
\newcommand{\idmap}{\operatorname{id}}
\newcommand{\am}{\operatorname{am}}
\newcommand{\gm}{\operatorname{gm}}
\newcommand{\mult}{\operatorname{mult}}
\newcommand{\iner}{\operatorname{iner}}$
## Main idea
Sometimes a graph can have some "clusters" such that the vertices in each cluster are strongly related, while vertices from different clusters are weakly related.
The Laplacian matrix is good at finding these clusters.
Let $G$ be a graph on $n$ vertices and $L$ its Laplacian matrix.
Let $\lambda_1 \leq \cdots \leq \lambda_n$ be the eigenvalues.
We know that $\nul(L)$ is the largest $k$ such that $\lambda_k = 0$, and it is also the number of connected components of $G$.
Similarly, we may pick a threshold $\epsilon > 0$ and regard the eigenvalues with $\lambda < \epsilon$ as nearly zero.
Then the largest $k$ such that $\lambda_k < \epsilon$ can be interpreted as the number of clusters.
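For example, the following sketch (in NumPy/SciPy; the graph and the threshold `eps` are assumptions chosen for illustration) counts the nearly-zero eigenvalues of two triangles joined by an edge:
```python
### a minimal sketch: estimating the number of clusters from the spectrum
import numpy as np
from scipy.linalg import eigh

# two triangles {0,1,2} and {3,4,5} joined by the edge {2,3}
A = np.zeros((6,6))
for i,j in [(0,1), (0,2), (1,2), (3,4), (3,5), (4,5), (2,3)]:
    A[i,j] = A[j,i] = 1
L = np.diag(A.sum(axis=1)) - A

vals = eigh(L, eigvals_only=True)   # ascending order
eps = 0.5                           # an assumed threshold
print(vals)                         # approximately 0, 0.44, 3, 3, 3, 4.56
print(np.sum(vals < eps))           # 2, matching the two triangles
```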
Let $\{\bv_1,\ldots, \bv_n\}$ be the corresponding orthonormal eigenbasis.
We know that $\bv_1$ can be chosen as $\frac{1}{\sqrt{n}}\bone$.
The unit eigenvector $\bv_2$ corresponding to $\lambda_2$ is called the **Fiedler vector**.
One may partition a graph by its Fiedler vector as follows:
- Let $(\bv_2)_i$ be the $i$-th entry of the Fiedler vector $\bv_2$.
- Let $N_+ = \{ i : (\bv_2)_i > 0\}$.
- Let $N_- = \{ i : (\bv_2)_i < 0\}$.
- Let $N_0 = \{ i : (\bv_2)_i = 0\}$.
- Then the graph can be partitioned either by $(N_+\cup N_0, N_-)$ or $(N_+, N_-\cup N_0)$.
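A minimal sketch of this procedure in NumPy/SciPy, on an assumed example (the path graph on five vertices; Exercise 2 below carries out the same computation in Sage):
```python
### a minimal sketch of the Fiedler partition on the path 0 - 1 - 2 - 3 - 4
import numpy as np
from scipy.linalg import eigh

A = np.zeros((5,5))
for i in range(4):
    A[i,i+1] = A[i+1,i] = 1                   # edges of the path
L = np.diag(A.sum(axis=1)) - A                # Laplacian matrix

vals, vecs = eigh(L, subset_by_index=(1,1))   # second smallest eigenpair
fvec = vecs[:,0]                              # a Fiedler vector
eps = 1e-8                                    # numerical tolerance for "zero"
Np = [i for i in range(5) if fvec[i] > eps]
Nm = [i for i in range(5) if fvec[i] < -eps]
N0 = [i for i in range(5) if abs(fvec[i]) <= eps]
print(Np, Nm, N0)   # the two ends of the path, with the middle vertex near zero
```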
Let $G$ be a graph and $W$ a subset of vertices.
Then the **induced subgraph** of $G$ on $W$, denoted by $G[W]$, is the graph on the vertex set $W$ whose edges are the edges of $G$ with both endpoints in $W$.
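In Sage, for instance, $G[W]$ can be computed with the `subgraph` method; a minimal sketch:
```python
### a minimal sketch: an induced subgraph in Sage
g = graphs.PathGraph(5)
h = g.subgraph([0, 1, 2])   # G[W] for W = {0, 1, 2}
h.show(figsize=(3,3))
```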
##### Fiedler's partition theorem (simple version)
Let $G$ be a connected graph and $L$ its Laplacian matrix.
Assume the multiplicity of the second smallest eigenvalue $\lambda_2$ is $1$.
Let $\bv = (v_0,\ldots,v_{n-1})$ be an eigenvector of $L$ with respect to $\lambda_2$.
Let $N_+ = \{ i : v_i > 0\}$.
Let $N_- = \{ i : v_i < 0\}$.
Let $N_0 = \{ i : v_i = 0\}$.
Then $G[N_+]$ and $G[N_0\cup N_-]$ are *connected* subgraphs of $G$.
From this point of view, the Fiedler vector assigns a number to each vertex, so it can be viewed as a feature vector.
Indeed, one may use $\bv_3, \bv_4, \ldots$ as new feature vectors.
This leads to the idea of the **spectral embedding** of a graph.
##### Algorithm (spectral embedding)
Input: a graph $G$ on $n$ vertices, and the desired dimension $k$
Output: an $n\times k$ data matrix $Y$
1. Let $L = L(G)$.
2. Find a diagonal matrix $D$, whose diagonal entries are $\lambda_1 \leq \cdots \leq \lambda_n$, and an orthogonal matrix $Q$ such that $L = QDQ\trans$.
3. Let $Y$ be the $n\times k$ matrix composed of the $2$-nd to the $(k+1)$-th columns of $Q$.
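A minimal sketch of this algorithm in NumPy/SciPy (the function name `spectral_embedding` is a choice made here for illustration):
```python
### a minimal sketch of the spectral embedding (function name is illustrative)
import numpy as np
from scipy.linalg import eigh

def spectral_embedding(L, k):
    """Return the n x k matrix formed by columns 2, ..., k+1 of Q, where L = Q D Q^T."""
    vals, Q = eigh(L)     # eigenvalues in ascending order; columns of Q are orthonormal
    return Q[:, 1:k+1]    # skip the first column, which spans the constant vector
```
For a large graph, one would compute only the needed eigenpairs, e.g., with `subset_by_index=(1,k)`, as in the exercises below.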
The importance of the spectral embedding is that it provides a way to transform graph data into a data matrix.
Thus, one may use clustering algorithms, such as $k$-means, to cluster the vertices of the graph.
The combination of the spectral embedding and $k$-means is called **spectral clustering**.
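Putting the two steps together, a sketch of spectral clustering might look as follows; it reuses `spectral_embedding` from the sketch above and the `k_means` function defined at the end of this note:
```python
### a sketch of spectral clustering: embed the vertices, then run k-means
### assumes spectral_embedding (above) and k_means (end of this note) are loaded
import numpy as np

# two triangles {0,1,2} and {3,4,5} joined by the edge {2,3}, as before
A = np.zeros((6,6))
for i,j in [(0,1), (0,2), (1,2), (3,4), (3,5), (4,5), (2,3)]:
    A[i,j] = A[j,i] = 1
L = np.diag(A.sum(axis=1)) - A

Y = spectral_embedding(L, 2)
labels, centers = k_means(Y, 2)
print(labels)   # the two triangles should receive different labels
```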
Finally, all the theory in 614 and 615 also works for weighted graphs, where each edge $\{i,j\}$ is assigned a positive weight $w_{ij}$.
The **weighted Laplacian matrix** of a weighted graph has $-w_{ij}$ on its $i,j$-entry for each edge $\{i,j\}$ and
$$
\sum_{ \substack{j \neq i\\\{j,i\}\in E(G)} } w_{ij}
$$
on its $i,i$-entry.
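For example, given a symmetric weight matrix $W$ with zeros for non-edges, the weighted Laplacian can be assembled as in the sketch below (the weights are an assumed example):
```python
### a minimal sketch: weighted Laplacian from a symmetric weight matrix
import numpy as np

W = np.array([[0. , 2. , 0.5],
              [2. , 0. , 0. ],
              [0.5, 0. , 0. ]])    # assumed weights: w_01 = 2, w_02 = 0.5
L = np.diag(W.sum(axis=1)) - W     # weighted degrees on the diagonal, -w_ij elsewhere
print(L)
```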
## Side stories
- $k$-means
- image segmentation
## Experiments
##### Exercise 1
執行以下程式碼。
<!-- eng start -->
Run the code below.
<!-- eng end -->
```python
### code
set_random_seed(0)
print_ans = False
k = choice([2,3,4])
sizes = [choice([4,5,6]) for i in range(k)]
n = sum(sizes)
psum = 0
cuts = [0]
for s in sizes:
    psum += s
    cuts.append(psum)
g = graphs.PathGraph(n)
g.set_pos(None)
for a,b in zip(cuts[:-1], cuts[1:]):
    for i in range(a, b - 1):
        for j in range(i + 1, b):
            g.add_edge(i,j)
g.show(figsize=(5,5), title="$G$")
L = g.laplacian_matrix()
eigs = L.eigenvalues()
eigs.sort()
print("eigenvalues:", eigs)
if print_ans:
    print("I would guess G has %s clusters."%k)
```
##### Exercise 1(a)
觀察畫出來的圖,你認為 $G$ 有幾個群?
<!-- eng start -->
Observe the graph drawn above. Based on your intuition, how many clusters are there in $G$?
<!-- eng end -->
##### Exercise 1(b)
觀察 $L(G)$ 的特徵值,你認為 $G$ 有幾群?
<!-- eng start -->
Observe the eigenvalues of $L(G)$. Based on your intuition, how many clusters are there in $G$?
<!-- eng end -->
:::info
What do the experiments try to tell you? (open answer)
...
:::
## Exercises
##### Exercise 2
逐行解釋以下程式碼。
<!-- eng start -->
Explain the code below line by line.
<!-- eng end -->
```python
### code
import numpy as np
from scipy.linalg import eigh
g = graphs.PathGraph(11)
L = np.array(g.laplacian_matrix())
eigs,vecs = eigh(L, subset_by_index=(1,1))
fvec = vecs.T[0]
eps = 0.00001
Np = [i for i in g.vertices() if fvec[i] > eps]
Nm = [i for i in g.vertices() if fvec[i] < -eps]
Nz = [i for i in g.vertices() if abs(fvec[i]) < eps]
g.show(vertex_colors={"red": Np, "green": Nm, "white": Nz})
```
##### Exercise 3
逐行解釋以下程式碼。
<!-- eng start -->
Explain the code below line by line.
<!-- eng end -->
```python
### code
import numpy as np
from scipy.linalg import eigh
import matplotlib.pyplot as plt
%matplotlib inline
g = graphs.CycleGraph(10)
n = g.order()
L = np.array(g.laplacian_matrix())
vals,vecs = eigh(L, subset_by_index=(1,2))
x,y = vecs.T
plt.axis("equal")
### plot points
plt.scatter(x, y, s=50, zorder=3)
### add vertex labels
for i in range(n):
    plt.annotate(i, (x[i], y[i]), zorder=4)
### add lines
for i,j in g.edges(labels=False):
    plt.plot([x[i],x[j]], [y[i],y[j]], 'c')
```
##### Exercise 4
逐行解釋以下程式碼。
<!-- eng start -->
Explain the code below line by line.
<!-- eng end -->
```python
### code
import numpy as np
from scipy.linalg import eigh
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
%matplotlib notebook
g = graphs.CubeGraph(3)
g.relabel()
n = g.order()
L = np.array(g.laplacian_matrix())
vals,vecs = eigh(L, subset_by_index=(1,3))
x,y,z = vecs.T
ax = plt.axes(projection='3d')
ax.set_xlim(-1,1)
ax.set_ylim(-1,1)
ax.set_zlim(-1,1)
### plot points
ax.scatter(x, y, z, s=50, zorder=3)
### add vertex labels
for i in range(n):
    ax.text(x[i], y[i], z[i], i, zorder=4)
### add lines
for i,j in g.edges(labels=False):
    ax.plot([x[i],x[j]], [y[i],y[j]], [z[i],z[j]], 'c')
```
##### Exercise 5
令 $G$ 為一 $n$ 個點的圖、$L = L(G)$。
令 $Y$ 為一 $n\times k$ 矩陣,其各列為 $\bx_1, \ldots, \bx_n\in\mathbb{R}^k$。
<!-- eng start -->
Let $G$ be a graph on $n$ vertices and $L = L(G)$. Let $Y$ be an $n\times k$ matrix whose rows are $\bx_1, \ldots, \bx_n\in\mathbb{R}^k$.
<!-- eng end -->
##### Exercise 5(a)
說明
$$
\tr(Y\trans LY) = \sum_{\{i,j\}\in E(G)} \|\bx_i - \bx_j\|^2.
$$
<!-- eng start -->
Show that
$$
\tr(Y\trans LY) = \sum_{\{i,j\}\in E(G)} \|\bx_i - \bx_j\|^2.
$$
<!-- eng end -->
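Before proving the identity, one may sanity-check it numerically; the sketch below uses an assumed path graph and a random $Y$:
```python
### a numerical check of the identity (not a proof)
import numpy as np

edges = [(0,1), (1,2), (2,3)]       # the path 0 - 1 - 2 - 3
A = np.zeros((4,4))
for i,j in edges:
    A[i,j] = A[j,i] = 1
L = np.diag(A.sum(axis=1)) - A

Y = np.random.randn(4, 2)           # rows play the role of x_1, ..., x_n
lhs = np.trace(Y.T @ L @ Y)
rhs = sum(np.sum((Y[i] - Y[j])**2) for i,j in edges)
print(np.isclose(lhs, rhs))         # True
```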
##### Exercise 5(b)
說明 $\bone\trans Y = \bzero$ 表示 $\bx_1, \ldots, \bx_n$ 的中心點在原點。
<!-- eng start -->
Explain why $\bone\trans Y = \bzero$ means $\bx_1, \ldots, \bx_n$ are centered at the origin.
<!-- eng end -->
##### Exercise 5(c)
說明 $Y\trans Y = I$ 表示 $Y$ 的各特徵向量(行向量)
彼此互相垂直、而且各自的變異數均為 $1$。
<!-- eng start -->
Explain why $Y\trans Y = I$ means the feature vectors (columns) of $Y$ are mutually orthogonal and each has variance $1$.
<!-- eng end -->
##### Exercise 5(d)
解釋拉普拉斯嵌入法做出來的 $Y$,就是以下最佳化問題的解:
minimize $\tr(Y\trans LY) = \sum_{\{i,j\}\in E(G)} \|\bx_i - \bx_j\|^2$,
subject to $\bone\trans Y = \bzero$, $Y\trans Y = I$
<!-- eng start -->
Show that the $Y$ from the spectral embedding is the solution to the optimization problem:
minimize $\tr(Y\trans LY) = \sum_{\{i,j\}\in E(G)} \|\bx_i - \bx_j\|^2$,
subject to $\bone\trans Y = \bzero$, $Y\trans Y = I$
<!-- eng end -->
##### Exercise 6
先執行(讀入)筆記最下方的所有函數,並執行以下程式碼。
查找資源解釋什麼是 $k$-means 分群演算法。
<!-- eng start -->
Run the code at the end of this note to load the functions, then run the code below. Search online and explain what the $k$-means clustering algorithm does.
<!-- eng end -->
```python
### code
%matplotlib inline
X = make_blobs()
y_new, centers = k_means(X, 3)
plt.scatter(*X.T, c=y_new)
```
##### Exercise 7
各種分群演算法都可以拿來將圖片的像素分群,整體的效果等同在將圖像分割。
先執行(讀入)筆記最下方的所有函數,並執行以下程式碼。
逐行解釋以下程式碼。
<!-- eng start -->
Any clustering algorithm can be used to partition the pixels of an image, which in effect segments the image. Run the code at the end of this note to load the functions, then run the code below. Explain the code below line by line.
<!-- eng end -->
```python
### original image
import numpy as np
from PIL import Image
r = 10
img = Image.open('incrediville-side.jpg')
x,y = img.size
img = img.resize((x // r, y // r))
img
```
```python
### image segmentation by the k-means clustering algorithm
import matplotlib.pyplot as plt
k = 3
arr = np.array(img, dtype=float) / 256
m,n,r = arr.shape
N = m * n
arr = arr.reshape(N, r)
y_new, centers = k_means(arr, k)
fig,axs = plt.subplots(1, 3, figsize=np.array([3*n,m])/100)
for i in range(k):
    axs[i].axis('off')
    axs[i].imshow((y_new == i).reshape(m,n), cmap='binary')
```
## Functions needed for the exercises
```python
import numpy as np
import matplotlib.pyplot as plt
```
```python
def make_blobs(N=150, k=3, d=2, seed=None):
    """
    Input:
        N: an integer, number of samples
        k: an integer, number of blobs
        d: an integer, dimension of the space
        seed: an integer, the random seed (or None)
    Output:
        a dataset X of shape (N, d)
    """
    np.random.seed(seed)
    X = np.random.randn(N,d)
    blob_size = N // k
    centers = np.random.randn(k, d) * 3
    for i in range(k):
        left = blob_size * i
        right = blob_size * (i+1) if i != k-1 else N
        X[left:right] += centers[i]
    return X
```
```python
def dist_mtx(X, Y=None):
    """Return the distance matrix between rows of X and rows of Y
    Input:
        X: an array of shape (N,d)
        Y: an array of shape (M,d)
            if None, Y = X
    Output:
        the matrix [d_ij] where d_ij is the distance between
        the i-th row of X and the j-th row of Y
    """
    if isinstance(Y, np.ndarray):
        pass
    elif Y is None:
        Y = X.copy()
    else:
        raise TypeError("Y should be a NumPy array or None")
    X_col = X[:, np.newaxis, :]
    Y_row = Y[np.newaxis, :, :]
    diff = X_col - Y_row
    dist = np.sqrt(np.sum(diff**2, axis=-1))
    return dist
```
```python
def k_means(X, k, init="random"):
"""k-means clustering algorithm
Input:
X: an array of shape (N,d)
rows for samples and columns for features
k: number of clusters
init: "random" or an array of shape (k,d)
if "random", k points are chosen randomly from X as the initial cluster centers
if an array, the array is used as the initial cluster centers
Output:
(y_new, centers)
y_new: an array of shape (N,)
that records the labels in (0, ..., k-1) of each sample
centers: an array of shape (k,d)
that records the cluster centers
Example:
mu = np.array([3,3])
cov = np.eye(2)
X = np.vstack([np.random.multivariate_normal(mu, cov, 100),
np.random.multivariate_normal(-mu, cov, 100)])
y_new,centers = k_means(X, 2)
"""
N,d = X.shape
### initialize y and center
if isinstance(init, np.ndarray):
centers = init.copy()
elif init == "random":
inds = np.random.choice(np.arange(N), k, replace=False)
centers = X[inds, :]
else:
raise TypeError("init can only be a NumPy array or 'random'")
dist = dist_mtx(X, centers)
y_new = dist.argmin(axis=1)
while True:
### compute the new centers
for i in range(k):
mask = (y_new == i)
centers[i] = X[mask].mean(axis=0)
### generate the new y_new
dist = dist_mtx(X, centers)
y_last = y_new.copy()
y_new = dist.argmin(axis=1)
if np.all(y_last == y_new):
break
return y_new, centers
```