# Notes on "A tutorial on spectral clustering" by Ulrike von Luxburg
[toc]
---
Reference: Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and computing, 17, 395-416. [[PDF]](https://arxiv.org/pdf/0711.0189.pdf)
This note contains my understanding on the reference article.
## Objectives
---
## Implementation of Spectral Clustering
Here I use the numpy package to implement the spectral clustering algorithm for illustration purpose.
Import the packages:
```python=
import numpy as np
import plotly.express as px
from sklearn import datasets
from sklearn.cluster import KMeans
```
Define functions for calculating Gaussian similarities for a given data points.
```python=
def pairwise_distances(x: np.ndarray) -> np.ndarray:
return np.linalg.norm(x[:, np.newaxis, :] - x, ord=2, axis=-1)
def gaussian(x: np.ndarray, sigma: float) -> np.ndarray:
return np.exp(-x / (2 * sigma**2))
def gaussian_similarity(x: np.ndarray, sigma: float) -> np.ndarray:
return gaussian(pairwise_distances(x), sigma=sigma)
```
There are several similariy graphs that can be used in spectral clustering (see Section 2.2):
- The $\varepsilon$-neighborhood graph
- $k$-nearest neighbor graphs
- The fully connected graph
Generate the sample dataset
```python=
n_samples = 1000
x, y = datasets.make_moons(n_samples, noise=5e-2)
```
Illustrate the scatter plot for the dataset
```python=
px.scatter(x=x[..., 0], y=x[..., 1])
```
```python=
sigma = 0.1
sim_matrix = gaussian_similarity(x, sigma=sigma)
```
Calculate the `weighted_adjacency_matrix`, `degree_matrix`, and `graph_laplacian`
```python=
weighted_adjacency_matrix = np.where(sim_matrix > 1e-5, sim_matrix, 0)
degree_matrix = np.diag(weighted_adjacency_matrix.sum(axis=1))
graph_laplacian = np.eye(degree_matrix.shape[0]) - np.matmul(np.linalg.inv(degree_matrix), (weighted_adjacency_matrix))
```
```python=
n_clusters = 2
u, s, vh = np.linalg.svd(graph_laplacian)
truncated_u = u[:, -n_clusters:]
kmeans = KMeans(n_clusters=n_clusters, random_state=0)
kmeans.fit(truncated_u)
labels_ = kmeans.labels_
```
```python=
px.scatter(x=x[:, 0], y=x[:, 1], color=labels_.astype(str))
```
---
## Conclusion
---
<!--
## Implementation Code
I use the [NetworkX](https://github.com/networkx/networkx) library to demonstrate how the Spectral Clustering works.
First, I create a demo graph:
```python=
SEED = 1234
graph, pos = generate_graph(seed=SEED)
draw_graph(
graph=graph,
fname="example_graph.png",
pos=pos,
with_labels=True,
font_weight="bold"
)
```
`example_graph.png`:

Then, the Fiedler vector is computed using the `np.linalg.eigh` method:
```python=
L = nx.linalg.laplacian_matrix(graph).todense()
_, vecs = np.linalg.eigh(L)
fiedler_vec = np.array(vecs[:, 1]).reshape(-1,)
print(f"fiedler vector: {fiedler_vec}")
```
which will produce something like this:
```bash=
fiedler vector: [-0.21320072 -0.21320072 -0.21320072 -0.21320072 -0.21320072 -0.21320072
-0.21320072 -0.21320072 -0.21320072 -0.21320072 -0.21320072 0.21320072
0.21320072 0.21320072 0.21320072 0.21320072 0.21320072 0.21320072
0.21320072 0.21320072 0.21320072 0.21320072]
```
And finally, the cluster members are determined with the help of the Fiedler vector.
`graph_partitioning.png`:

The full code is provided here:
```python=
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
def generate_graph(seed: int):
g1 = nx.complete_graph(2)
g2 = nx.complete_graph(11)
graph = nx.cartesian_product(g1, g2)
graph = nx.convert_node_labels_to_integers(graph)
pos = nx.spring_layout(graph, seed=seed)
return graph, pos
def draw_graph(graph: nx.Graph, fname: str, **kwargs):
plt.clf()
nx.draw(graph, **kwargs)
plt.savefig(fname)
if __name__ == "__main__":
SEED = 1234
graph, pos = generate_graph(seed=SEED)
draw_graph(
graph=graph,
fname="example_graph.png",
pos=pos,
with_labels=True,
font_weight="bold"
)
L = nx.linalg.laplacian_matrix(graph).todense()
_, vecs = np.linalg.eigh(L)
fiedler_vec = np.array(vecs[:, 1]).reshape(-1,)
print(f"fiedler vector: {fiedler_vec}")
nodes = np.array(graph.nodes)
cluster_0 = nodes[fiedler_vec < 0]
cluster_1 = nodes[fiedler_vec > 0]
print(f"cluster 0: {cluster_0}")
print(f"cluster 1: {cluster_1}")
color_map = []
for node in graph:
if node in cluster_0:
color_map.append("red")
else:
color_map.append("green")
draw_graph(
graph=graph,
fname="graph_partitioning.png",
pos=pos,
node_color=color_map
)
``` -->