---
# System prepended metadata

title: Notes on "A tutorial on spectral clustering" by Ulrike von Luxburg
tags: [clustering, spectral-clustering, spectral-graph-theory]

---

# Notes on "A tutorial on spectral clustering" by Ulrike von Luxburg

[toc]

---

Reference: Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and computing, 17, 395-416. [[PDF]](https://arxiv.org/pdf/0711.0189.pdf)

This note contains my understanding on the reference article.

## Objectives

---

## Implementation of Spectral Clustering

Here I use the numpy package to implement the spectral clustering algorithm for illustration purpose.

Import the packages:

```python=
import numpy as np
import plotly.express as px
from sklearn import datasets
from sklearn.cluster import KMeans
```

Define functions for calculating Gaussian similarities for a given data points.

```python=
def pairwise_distances(x: np.ndarray) -> np.ndarray:
    return np.linalg.norm(x[:, np.newaxis, :] - x, ord=2, axis=-1)


def gaussian(x: np.ndarray, sigma: float) -> np.ndarray:
    return np.exp(-x / (2 * sigma**2))


def gaussian_similarity(x: np.ndarray, sigma: float) -> np.ndarray:
    return gaussian(pairwise_distances(x), sigma=sigma)
```

There are several similariy graphs that can be used in spectral clustering (see Section 2.2):
- The $\varepsilon$-neighborhood graph
- $k$-nearest neighbor graphs
- The fully connected graph

Generate the sample dataset

```python=
n_samples = 1000
x, y = datasets.make_moons(n_samples, noise=5e-2)
```

Illustrate the scatter plot for the dataset

```python=
px.scatter(x=x[..., 0], y=x[..., 1])
```

```python=
sigma = 0.1
sim_matrix = gaussian_similarity(x, sigma=sigma)
```

Calculate the `weighted_adjacency_matrix`, `degree_matrix`, and `graph_laplacian`

```python=
weighted_adjacency_matrix = np.where(sim_matrix > 1e-5, sim_matrix, 0)
degree_matrix = np.diag(weighted_adjacency_matrix.sum(axis=1))
graph_laplacian = np.eye(degree_matrix.shape[0]) - np.matmul(np.linalg.inv(degree_matrix), (weighted_adjacency_matrix))
```

```python=
n_clusters = 2
u, s, vh = np.linalg.svd(graph_laplacian)
truncated_u = u[:, -n_clusters:]

kmeans = KMeans(n_clusters=n_clusters, random_state=0)
kmeans.fit(truncated_u)

labels_ = kmeans.labels_
```

```python=
px.scatter(x=x[:, 0], y=x[:, 1], color=labels_.astype(str))
```

---

## Conclusion


---
<!-- 
## Implementation Code

I use the [NetworkX](https://github.com/networkx/networkx) library to demonstrate how the Spectral Clustering works.

First, I create a demo graph:

```python=
SEED = 1234
graph, pos = generate_graph(seed=SEED)
draw_graph(
  graph=graph,
  fname="example_graph.png",
  pos=pos,
  with_labels=True,
  font_weight="bold"
)
```

`example_graph.png`:

![](https://i.imgur.com/G0sengM.png)

Then, the Fiedler vector is computed using the `np.linalg.eigh` method:

```python=
L = nx.linalg.laplacian_matrix(graph).todense()
_, vecs = np.linalg.eigh(L)
fiedler_vec = np.array(vecs[:, 1]).reshape(-1,)
print(f"fiedler vector: {fiedler_vec}")
```

which will produce something like this:
```bash=
fiedler vector: [-0.21320072 -0.21320072 -0.21320072 -0.21320072 -0.21320072 -0.21320072
 -0.21320072 -0.21320072 -0.21320072 -0.21320072 -0.21320072  0.21320072
  0.21320072  0.21320072  0.21320072  0.21320072  0.21320072  0.21320072
  0.21320072  0.21320072  0.21320072  0.21320072]
```

And finally, the cluster members are determined with the help of the Fiedler vector.

`graph_partitioning.png`:

![](https://i.imgur.com/1EPtu3Z.png)

The full code is provided here:

```python=
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np

def generate_graph(seed: int):
  g1 = nx.complete_graph(2)
  g2 = nx.complete_graph(11)
  graph = nx.cartesian_product(g1, g2)
  graph = nx.convert_node_labels_to_integers(graph)
  pos = nx.spring_layout(graph, seed=seed)
  return graph, pos

def draw_graph(graph: nx.Graph, fname: str, **kwargs):
  plt.clf()
  nx.draw(graph, **kwargs)
  plt.savefig(fname)


if __name__ == "__main__":
  SEED = 1234
  graph, pos = generate_graph(seed=SEED)
  draw_graph(
    graph=graph,
    fname="example_graph.png",
    pos=pos,
    with_labels=True,
    font_weight="bold"
  )

  L = nx.linalg.laplacian_matrix(graph).todense()
  _, vecs = np.linalg.eigh(L)
  fiedler_vec = np.array(vecs[:, 1]).reshape(-1,)
  print(f"fiedler vector: {fiedler_vec}")

  nodes = np.array(graph.nodes)
  cluster_0 = nodes[fiedler_vec < 0]
  cluster_1 = nodes[fiedler_vec > 0]
  print(f"cluster 0: {cluster_0}")
  print(f"cluster 1: {cluster_1}")
  color_map = []

  for node in graph:
    if node in cluster_0:
      color_map.append("red")
    else:
      color_map.append("green")

  draw_graph(
    graph=graph,
    fname="graph_partitioning.png",
    pos=pos,
    node_color=color_map
  )
``` -->