# 拉普拉斯嵌入法與譜分群
Spectral embedding and clustering

This work by Jephian Lin is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).
$\newcommand{\trans}{^\top}
\newcommand{\adj}{^{\rm adj}}
\newcommand{\cof}{^{\rm cof}}
\newcommand{\inp}[2]{\left\langle#1,#2\right\rangle}
\newcommand{\dunion}{\mathbin{\dot\cup}}
\newcommand{\bzero}{\mathbf{0}}
\newcommand{\bone}{\mathbf{1}}
\newcommand{\ba}{\mathbf{a}}
\newcommand{\bb}{\mathbf{b}}
\newcommand{\bc}{\mathbf{c}}
\newcommand{\bd}{\mathbf{d}}
\newcommand{\be}{\mathbf{e}}
\newcommand{\bh}{\mathbf{h}}
\newcommand{\bp}{\mathbf{p}}
\newcommand{\bq}{\mathbf{q}}
\newcommand{\br}{\mathbf{r}}
\newcommand{\bx}{\mathbf{x}}
\newcommand{\by}{\mathbf{y}}
\newcommand{\bz}{\mathbf{z}}
\newcommand{\bu}{\mathbf{u}}
\newcommand{\bv}{\mathbf{v}}
\newcommand{\bw}{\mathbf{w}}
\newcommand{\tr}{\operatorname{tr}}
\newcommand{\nul}{\operatorname{null}}
\newcommand{\rank}{\operatorname{rank}}
%\newcommand{\ker}{\operatorname{ker}}
\newcommand{\range}{\operatorname{range}}
\newcommand{\Col}{\operatorname{Col}}
\newcommand{\Row}{\operatorname{Row}}
\newcommand{\spec}{\operatorname{spec}}
\newcommand{\vspan}{\operatorname{span}}
\newcommand{\Vol}{\operatorname{Vol}}
\newcommand{\sgn}{\operatorname{sgn}}
\newcommand{\idmap}{\operatorname{id}}
\newcommand{\am}{\operatorname{am}}
\newcommand{\gm}{\operatorname{gm}}
\newcommand{\mult}{\operatorname{mult}}
\newcommand{\iner}{\operatorname{iner}}$
## Main idea
Sometimes a graph can have some "clusters" such that the vertices in each cluster are strongly related, while vertices from different clusters are weakly related.
The Laplacian matrix is good at finding these clusters.
Let $G$ be a graph on $n$ vertices and $L$ its Laplacian matrix.
Let $\lambda_1 \leq \cdots \leq \lambda_n$ be the eigenvalues.
We know that $\nul(L)$ is the largest $k$ such that $\lambda_k = 0$, and it is also the number of connected components of $G$.
Similarly, we may pick a threshold $\epsilon > 0$ and regard the eigenvalues with $\lambda < \epsilon$ as nearly zero.
Then the largest $k$ such that $\lambda_k < \epsilon$ can be interpreted as the number of clusters.
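For example, the following sketch (in NumPy/SciPy; the graph and the threshold `eps` are assumptions chosen for illustration) counts the nearly-zero eigenvalues of two triangles joined by an edge:
```python
### a minimal sketch: estimating the number of clusters from the spectrum
import numpy as np
from scipy.linalg import eigh

# two triangles {0,1,2} and {3,4,5} joined by the edge {2,3}
A = np.zeros((6,6))
for i,j in [(0,1), (0,2), (1,2), (3,4), (3,5), (4,5), (2,3)]:
    A[i,j] = A[j,i] = 1
L = np.diag(A.sum(axis=1)) - A

vals = eigh(L, eigvals_only=True)   # ascending order
eps = 0.5                           # an assumed threshold
print(vals)                         # approximately 0, 0.44, 3, 3, 3, 4.56
print(np.sum(vals < eps))           # 2, matching the two triangles
```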
Let $\{\bv_1,\ldots, \bv_n\}$ be the corresponding orthonormal eigenbasis.
We know that $\bv_1$ can be chosen as $\frac{1}{\sqrt{n}}\bone$.
The unit eigenvector $\bv_2$ corresponding to $\lambda_2$ is called the **Fiedler vector**.
One may partition a graph by its Fiedler vector as follows:
- Let $(\bv_2)_i$ be the $i$-th entry of the Fiedler vector $\bv_2$.
- Let $N_+ = \{ i : (\bv_2)_i > 0\}$.
- Let $N_- = \{ i : (\bv_2)_i < 0\}$.
- Let $N_0 = \{ i : (\bv_2)_i = 0\}$.
- Then the graph can be partitioned either by $(N_+\cup N_0, N_-)$ or $(N_+, N_-\cup N_0)$.
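A minimal sketch of this procedure in NumPy/SciPy, on an assumed example (the path graph on five vertices; Exercise 2 below carries out the same computation in Sage):
```python
### a minimal sketch of the Fiedler partition on the path 0 - 1 - 2 - 3 - 4
import numpy as np
from scipy.linalg import eigh

A = np.zeros((5,5))
for i in range(4):
    A[i,i+1] = A[i+1,i] = 1                   # edges of the path
L = np.diag(A.sum(axis=1)) - A                # Laplacian matrix

vals, vecs = eigh(L, subset_by_index=(1,1))   # second smallest eigenpair
fvec = vecs[:,0]                              # a Fiedler vector
eps = 1e-8                                    # numerical tolerance for "zero"
Np = [i for i in range(5) if fvec[i] > eps]
Nm = [i for i in range(5) if fvec[i] < -eps]
N0 = [i for i in range(5) if abs(fvec[i]) <= eps]
print(Np, Nm, N0)   # the two ends of the path, with the middle vertex near zero
```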
Let $G$ be a graph and $W$ a subset of vertices.
Then the **induced subgraph** of $G$ on $W$, denoted by $G[W]$, is the graph on the vertex set $W$ whose edges are the edges of $G$ with both endpoints in $W$.
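In Sage, for instance, $G[W]$ can be computed with the `subgraph` method; a minimal sketch:
```python
### a minimal sketch: an induced subgraph in Sage
g = graphs.PathGraph(5)
h = g.subgraph([0, 1, 2])   # G[W] for W = {0, 1, 2}
h.show(figsize=(3,3))
```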
##### Fiedler's partition theorem (simple version)
Let $G$ be a connected graph and $L$ its Laplacian matrix.
Assume the multiplicity of the second smallest eigenvalue $\lambda_2$ is $1$.
Let $\bv = (v_0,\ldots,v_{n-1})$ be an eigenvector of $L$ with respect to $\lambda_2$.
Let $N_+ = \{ i : v_i > 0\}$.
Let $N_- = \{ i : v_i < 0\}$.
Let $N_0 = \{ i : v_i = 0\}$.
Then $G[N_+]$ and $G[N_0\cup N_-]$ are *connected* subgraphs of $G$.
From this point of view, the Fiedler vector assigns a number to each vertex, so it can be viewed as a feature vector.
Indeed, one may use $\bv_3, \bv_4, \ldots$ as new feature vectors.
This leads to the idea of the **spectral embedding** of a graph.
##### Algorithm (spectral embedding)
Input: a graph $G$ on $n$ vertices, and the desired dimension $k$
Output: an $n\times k$ data matrix $Y$
1. Let $L = L(G)$.
2. Find a diagonal matrix $D$, whose diagonal entries are $\lambda_1 \leq \cdots \leq \lambda_n$, and an orthogonal matrix $Q$ such that $L = QDQ\trans$.
3. Let $Y$ be the $n\times k$ matrix composed of the $2$-nd to the $(k+1)$-th columns of $Q$.
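A minimal sketch of this algorithm in NumPy/SciPy (the function name `spectral_embedding` is a choice made here for illustration):
```python
### a minimal sketch of the spectral embedding (function name is illustrative)
import numpy as np
from scipy.linalg import eigh

def spectral_embedding(L, k):
    """Return the n x k matrix formed by columns 2, ..., k+1 of Q, where L = Q D Q^T."""
    vals, Q = eigh(L)     # eigenvalues in ascending order; columns of Q are orthonormal
    return Q[:, 1:k+1]    # skip the first column, which spans the constant vector
```
For a large graph, one would compute only the needed eigenpairs, e.g., with `subset_by_index=(1,k)`, as in the exercises below.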
The importance of the spectral embedding is that it provides a way to transform graph data into a data matrix.
Thus, one may use clustering algorithms, such as $k$-means, to cluster the vertices of the graph.
The combination of the spectral embedding and $k$-means is called **spectral clustering**.
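Putting the two steps together, a sketch of spectral clustering might look as follows; it reuses `spectral_embedding` from the sketch above and the `k_means` function defined at the end of this note:
```python
### a sketch of spectral clustering: embed the vertices, then run k-means
### assumes spectral_embedding (above) and k_means (end of this note) are loaded
import numpy as np

# two triangles {0,1,2} and {3,4,5} joined by the edge {2,3}, as before
A = np.zeros((6,6))
for i,j in [(0,1), (0,2), (1,2), (3,4), (3,5), (4,5), (2,3)]:
    A[i,j] = A[j,i] = 1
L = np.diag(A.sum(axis=1)) - A

Y = spectral_embedding(L, 2)
labels, centers = k_means(Y, 2)
print(labels)   # the two triangles should receive different labels
```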
Finally, all the theory in 614 and 615 also works for weighted graphs, where each edge $\{i,j\}$ is assigned a positive weight $w_{ij}$.
The **weighted Laplacian matrix** of a weighted graph has $-w_{ij}$ on its $i,j$-entry for each edge $\{i,j\}$ and
$$
\sum_{ \substack{j \neq i\\\{j,i\}\in E(G)} } w_{ij}
$$
on its $i,i$-entry.
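For example, given a symmetric weight matrix $W$ with zeros for non-edges, the weighted Laplacian can be assembled as in the sketch below (the weights are an assumed example):
```python
### a minimal sketch: weighted Laplacian from a symmetric weight matrix
import numpy as np

W = np.array([[0. , 2. , 0.5],
              [2. , 0. , 0. ],
              [0.5, 0. , 0. ]])    # assumed weights: w_01 = 2, w_02 = 0.5
L = np.diag(W.sum(axis=1)) - W     # weighted degrees on the diagonal, -w_ij elsewhere
print(L)
```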
## Side stories
- $k$-means
- image segmentation
## Experiments
##### Exercise 1
執行以下程式碼。
<!-- eng start -->
Run the code below.
<!-- eng end -->
```python
### code
set_random_seed(0)
print_ans = False
k = choice([2,3,4])
sizes = [choice([4,5,6]) for i in range(k)]
n = sum(sizes)
psum = 0
cuts = [0]
for s in sizes:
    psum += s
    cuts.append(psum)
g = graphs.PathGraph(n)
g.set_pos(None)
for a,b in zip(cuts[:-1], cuts[1:]):
    for i in range(a, b - 1):
        for j in range(i + 1, b):
            g.add_edge(i,j)
g.show(figsize=(5,5), title="$G$")
L = g.laplacian_matrix()
eigs = L.eigenvalues()
eigs.sort()
print("eigenvalues:", eigs)
if print_ans:
    print("I would guess G has %s clusters."%k)
```
##### Exercise 1(a)
觀察畫出來的圖,你認為 $G$ 有幾個群?
<!-- eng start -->
Observe the graph drawn above. Based on your intuition, how many clusters are there in $G$?
<!-- eng end -->
##### Exercise 1(b)
觀察 $L(G)$ 的特徵值,你認為 $G$ 有幾群?
<!-- eng start -->
Observe the eigenvalues of $L(G)$. Based on your intuition, how many clusters are there in $G$?
<!-- eng end -->
:::info
What do the experiments try to tell you? (open answer)
...
:::
## Exercises
##### Exercise 2
逐行解釋以下程式碼。
<!-- eng start -->
Explain the code below line by line.
<!-- eng end -->
```python
### code
import numpy as np
from scipy.linalg import eigh
g = graphs.PathGraph(11)
L = np.array(g.laplacian_matrix())
eigs,vecs = eigh(L, subset_by_index=(1,1))
fvec = vecs.T[0]
eps = 0.00001
Np = [i for i in g.vertices() if fvec[i] > eps]
Nm = [i for i in g.vertices() if fvec[i] < -eps]
Nz = [i for i in g.vertices() if abs(fvec[i]) < eps]
g.show(vertex_colors={"red": Np, "green": Nm, "white": Nz})
```
##### Exercise 3
逐行解釋以下程式碼。
<!-- eng start -->
Explain the code below line by line.
<!-- eng end -->
```python
### code
import numpy as np
from scipy.linalg import eigh
import matplotlib.pyplot as plt
%matplotlib inline
g = graphs.CycleGraph(10)
n = g.order()
L = np.array(g.laplacian_matrix())
vals,vecs = eigh(L, subset_by_index=(1,2))
x,y = vecs.T
plt.axis("equal")
### plot points
plt.scatter(x, y, s=50, zorder=3)
### add vertex labels
for i in range(n):
    plt.annotate(i, (x[i], y[i]), zorder=4)
### add lines
for i,j in g.edges(labels=False):
    plt.plot([x[i],x[j]], [y[i],y[j]], 'c')
```
##### Exercise 4
逐行解釋以下程式碼。
<!-- eng start -->
Explain the code below line by line.
<!-- eng end -->
```python
### code
import numpy as np
from scipy.linalg import eigh
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
%matplotlib notebook
g = graphs.CubeGraph(3)
g.relabel()
n = g.order()
L = np.array(g.laplacian_matrix())
vals,vecs = eigh(L, subset_by_index=(1,3))
x,y,z = vecs.T
ax = plt.axes(projection='3d')
ax.set_xlim(-1,1)
ax.set_ylim(-1,1)
ax.set_zlim(-1,1)
### plot points
ax.scatter(x, y, z, s=50, zorder=3)
### add vertex labels
for i in range(n):
    ax.text(x[i], y[i], z[i], i, zorder=4)
### add lines
for i,j in g.edges(labels=False):
    ax.plot([x[i],x[j]], [y[i],y[j]], [z[i],z[j]], 'c')
```
##### Exercise 5
令 $G$ 為一 $n$ 個點的圖、$L = L(G)$。
令 $Y$ 為一 $n\times k$ 矩陣,其各列為 $\bx_1, \ldots, \bx_n\in\mathbb{R}^k$。
<!-- eng start -->
Let $G$ be a graph on $n$ vertices and $L = L(G)$. Let $Y$ be an $n\times k$ matrix whose rows are $\bx_1, \ldots, \bx_n\in\mathbb{R}^k$.
<!-- eng end -->
##### Exercise 5(a)
說明
$$
\tr(Y\trans LY) = \sum_{\{i,j\}\in E(G)} \|\bx_i - \bx_j\|^2.
$$
<!-- eng start -->
Show that
$$
\tr(Y\trans LY) = \sum_{\{i,j\}\in E(G)} \|\bx_i - \bx_j\|^2.
$$
<!-- eng end -->
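Before proving the identity, one may sanity-check it numerically; the sketch below uses an assumed path graph and a random $Y$:
```python
### a numerical check of the identity (not a proof)
import numpy as np

edges = [(0,1), (1,2), (2,3)]       # the path 0 - 1 - 2 - 3
A = np.zeros((4,4))
for i,j in edges:
    A[i,j] = A[j,i] = 1
L = np.diag(A.sum(axis=1)) - A

Y = np.random.randn(4, 2)           # rows play the role of x_1, ..., x_n
lhs = np.trace(Y.T @ L @ Y)
rhs = sum(np.sum((Y[i] - Y[j])**2) for i,j in edges)
print(np.isclose(lhs, rhs))         # True
```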
##### Exercise 5(b)
說明 $\bone\trans Y = \bzero$ 表示 $\bx_1, \ldots, \bx_n$ 的中心點在原點。
<!-- eng start -->
Explain why $\bone\trans Y = \bzero$ means $\bx_1, \ldots, \bx_n$ are centered at the origin.
<!-- eng end -->
##### Exercise 5(c)
說明 $Y\trans Y = I$ 表示 $Y$ 的各特徵向量(行向量)
彼此互相垂直、而且各自的變異數均為 $1$。
<!-- eng start -->
Explain why $Y\trans Y = I$ means the feature vectors (columns) of $Y$ are mutually orthogonal and each has variance $1$.
<!-- eng end -->
##### Exercise 5(d)
解釋拉普拉斯嵌入法做出來的 $Y$,就是以下最佳化問題的解:
minimize $\tr(Y\trans LY) = \sum_{\{i,j\}\in E(G)} \|\bx_i - \bx_j\|^2$,
subject to $\bone\trans Y = \bzero$, $Y\trans Y = I$
<!-- eng start -->
Show that the $Y$ from the spectral embedding is the solution to the optimization problem:
minimize $\tr(Y\trans LY) = \sum_{\{i,j\}\in E(G)} \|\bx_i - \bx_j\|^2$,
subject to $\bone\trans Y = \bzero$, $Y\trans Y = I$
<!-- eng end -->
##### Exercise 6
先執行(讀入)筆記最下方的所有函數,並執行以下程式碼。
查找資源解釋什麼是 $k$-means 分群演算法。
<!-- eng start -->
Run the code at the end of this note to load the functions, then run the code below. Search online and explain what the $k$-means clustering algorithm does.
<!-- eng end -->
```python
### code
%matplotlib inline
X = make_blobs()
y_new, centers = k_means(X, 3)
plt.scatter(*X.T, c=y_new)
```
##### Exercise 7
各種分群演算法都可以拿來將圖片的像素分群,整體的效果等同在將圖像分割。
先執行(讀入)筆記最下方的所有函數,並執行以下程式碼。
逐行解釋以下程式碼。
<!-- eng start -->
Any clustering algorithm can be used to partition the pixels of an image, which in effect segments the image. Run the code at the end of this note to load the functions, then run the code below. Explain the code below line by line.
<!-- eng end -->
```python
### original image
import numpy as np
from PIL import Image
r = 10
img = Image.open('incrediville-side.jpg')
x,y = img.size
img = img.resize((x // r, y // r))
img
```
```python
### image segmentation by the k-means clustering algorithm
import matplotlib.pyplot as plt
k = 3
arr = np.array(img, dtype=float) / 256
m,n,r = arr.shape
N = m * n
arr = arr.reshape(N, r)
y_new, centers = k_means(arr, k)
fig,axs = plt.subplots(1, 3, figsize=np.array([3*n,m])/100)
for i in range(k):
    axs[i].axis('off')
    axs[i].imshow((y_new == i).reshape(m,n), cmap='binary')
```
## Functions needed for the exercises
```python
import numpy as np
import matplotlib.pyplot as plt
```
```python
def make_blobs(N=150, k=3, d=2, seed=None):
    """
    Input:
        N: an integer, number of samples
        k: an integer, number of blobs
        d: an integer, dimension of the space
        seed: an integer, the random seed (or None)
    Output:
        a dataset X of shape (N, d)
    """
    np.random.seed(seed)
    X = np.random.randn(N,d)
    blob_size = N // k
    centers = np.random.randn(k, d) * 3
    for i in range(k):
        left = blob_size * i
        right = blob_size * (i+1) if i != k-1 else N
        X[left:right] += centers[i]
    return X
```
```python
def dist_mtx(X, Y=None):
    """Return the distance matrix between rows of X and rows of Y
    Input:
        X: an array of shape (N,d)
        Y: an array of shape (M,d)
            if None, Y = X
    Output:
        the matrix [d_ij] where d_ij is the distance between
        the i-th row of X and the j-th row of Y
    """
    if isinstance(Y, np.ndarray):
        pass
    elif Y is None:
        Y = X.copy()
    else:
        raise TypeError("Y should be a NumPy array or None")
    X_col = X[:, np.newaxis, :]
    Y_row = Y[np.newaxis, :, :]
    diff = X_col - Y_row
    dist = np.sqrt(np.sum(diff**2, axis=-1))
    return dist
```
```python
def k_means(X, k, init="random"):
"""k-means clustering algorithm
Input:
X: an array of shape (N,d)
rows for samples and columns for features
k: number of clusters
init: "random" or an array of shape (k,d)
if "random", k points are chosen randomly from X as the initial cluster centers
if an array, the array is used as the initial cluster centers
Output:
(y_new, centers)
y_new: an array of shape (N,)
that records the labels in (0, ..., k-1) of each sample
centers: an array of shape (k,d)
that records the cluster centers
Example:
mu = np.array([3,3])
cov = np.eye(2)
X = np.vstack([np.random.multivariate_normal(mu, cov, 100),
np.random.multivariate_normal(-mu, cov, 100)])
y_new,centers = k_means(X, 2)
"""
N,d = X.shape
### initialize y and center
if isinstance(init, np.ndarray):
centers = init.copy()
elif init == "random":
inds = np.random.choice(np.arange(N), k, replace=False)
centers = X[inds, :]
else:
raise TypeError("init can only be a NumPy array or 'random'")
dist = dist_mtx(X, centers)
y_new = dist.argmin(axis=1)
while True:
### compute the new centers
for i in range(k):
mask = (y_new == i)
centers[i] = X[mask].mean(axis=0)
### generate the new y_new
dist = dist_mtx(X, centers)
y_last = y_new.copy()
y_new = dist.argmin(axis=1)
if np.all(y_last == y_new):
break
return y_new, centers
```