# Notes on "A tutorial on spectral clustering" by Ulrike von Luxburg [toc] --- Reference: Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and computing, 17, 395-416. [[PDF]](https://arxiv.org/pdf/0711.0189.pdf) This note contains my understanding on the reference article. ## Objectives --- ## Implementation of Spectral Clustering Here I use the numpy package to implement the spectral clustering algorithm for illustration purpose. Import the packages: ```python= import numpy as np import plotly.express as px from sklearn import datasets from sklearn.cluster import KMeans ``` Define functions for calculating Gaussian similarities for a given data points. ```python= def pairwise_distances(x: np.ndarray) -> np.ndarray: return np.linalg.norm(x[:, np.newaxis, :] - x, ord=2, axis=-1) def gaussian(x: np.ndarray, sigma: float) -> np.ndarray: return np.exp(-x / (2 * sigma**2)) def gaussian_similarity(x: np.ndarray, sigma: float) -> np.ndarray: return gaussian(pairwise_distances(x), sigma=sigma) ``` There are several similariy graphs that can be used in spectral clustering (see Section 2.2): - The $\varepsilon$-neighborhood graph - $k$-nearest neighbor graphs - The fully connected graph Generate the sample dataset ```python= n_samples = 1000 x, y = datasets.make_moons(n_samples, noise=5e-2) ``` Illustrate the scatter plot for the dataset ```python= px.scatter(x=x[..., 0], y=x[..., 1]) ``` ```python= sigma = 0.1 sim_matrix = gaussian_similarity(x, sigma=sigma) ``` Calculate the `weighted_adjacency_matrix`, `degree_matrix`, and `graph_laplacian` ```python= weighted_adjacency_matrix = np.where(sim_matrix > 1e-5, sim_matrix, 0) degree_matrix = np.diag(weighted_adjacency_matrix.sum(axis=1)) graph_laplacian = np.eye(degree_matrix.shape[0]) - np.matmul(np.linalg.inv(degree_matrix), (weighted_adjacency_matrix)) ``` ```python= n_clusters = 2 u, s, vh = np.linalg.svd(graph_laplacian) truncated_u = u[:, -n_clusters:] kmeans = KMeans(n_clusters=n_clusters, random_state=0) kmeans.fit(truncated_u) labels_ = kmeans.labels_ ``` ```python= px.scatter(x=x[:, 0], y=x[:, 1], color=labels_.astype(str)) ``` --- ## Conclusion --- <!-- ## Implementation Code I use the [NetworkX](https://github.com/networkx/networkx) library to demonstrate how the Spectral Clustering works. First, I create a demo graph: ```python= SEED = 1234 graph, pos = generate_graph(seed=SEED) draw_graph( graph=graph, fname="example_graph.png", pos=pos, with_labels=True, font_weight="bold" ) ``` `example_graph.png`: ![](https://i.imgur.com/G0sengM.png) Then, the Fiedler vector is computed using the `np.linalg.eigh` method: ```python= L = nx.linalg.laplacian_matrix(graph).todense() _, vecs = np.linalg.eigh(L) fiedler_vec = np.array(vecs[:, 1]).reshape(-1,) print(f"fiedler vector: {fiedler_vec}") ``` which will produce something like this: ```bash= fiedler vector: [-0.21320072 -0.21320072 -0.21320072 -0.21320072 -0.21320072 -0.21320072 -0.21320072 -0.21320072 -0.21320072 -0.21320072 -0.21320072 0.21320072 0.21320072 0.21320072 0.21320072 0.21320072 0.21320072 0.21320072 0.21320072 0.21320072 0.21320072 0.21320072] ``` And finally, the cluster members are determined with the help of the Fiedler vector. `graph_partitioning.png`: ![](https://i.imgur.com/1EPtu3Z.png) The full code is provided here: ```python= import matplotlib.pyplot as plt import networkx as nx import numpy as np def generate_graph(seed: int): g1 = nx.complete_graph(2) g2 = nx.complete_graph(11) graph = nx.cartesian_product(g1, g2) graph = nx.convert_node_labels_to_integers(graph) pos = nx.spring_layout(graph, seed=seed) return graph, pos def draw_graph(graph: nx.Graph, fname: str, **kwargs): plt.clf() nx.draw(graph, **kwargs) plt.savefig(fname) if __name__ == "__main__": SEED = 1234 graph, pos = generate_graph(seed=SEED) draw_graph( graph=graph, fname="example_graph.png", pos=pos, with_labels=True, font_weight="bold" ) L = nx.linalg.laplacian_matrix(graph).todense() _, vecs = np.linalg.eigh(L) fiedler_vec = np.array(vecs[:, 1]).reshape(-1,) print(f"fiedler vector: {fiedler_vec}") nodes = np.array(graph.nodes) cluster_0 = nodes[fiedler_vec < 0] cluster_1 = nodes[fiedler_vec > 0] print(f"cluster 0: {cluster_0}") print(f"cluster 1: {cluster_1}") color_map = [] for node in graph: if node in cluster_0: color_map.append("red") else: color_map.append("green") draw_graph( graph=graph, fname="graph_partitioning.png", pos=pos, node_color=color_map ) ``` -->