Notes on "A tutorial on spectral clustering" by Ulrike von Luxburg

# Notes on "A tutorial on spectral clustering" by Ulrike von Luxburg [toc] --- Reference: Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and computing, 17, 395-416. [[PDF]](https://arxiv.org/pdf/0711.0189.pdf) This note contains my understanding on the reference article. ## Objectives --- ## Implementation of Spectral Clustering Here I use the numpy package to implement the spectral clustering algorithm for illustration purpose. Import the packages: ```python= import numpy as np import plotly.express as px from sklearn import datasets from sklearn.cluster import KMeans ``` Define functions for calculating Gaussian similarities for a given data points. ```python= def pairwise_distances(x: np.ndarray) -> np.ndarray: return np.linalg.norm(x[:, np.newaxis, :] - x, ord=2, axis=-1) def gaussian(x: np.ndarray, sigma: float) -> np.ndarray: return np.exp(-x / (2 * sigma**2)) def gaussian_similarity(x: np.ndarray, sigma: float) -> np.ndarray: return gaussian(pairwise_distances(x), sigma=sigma) ``` There are several similariy graphs that can be used in spectral clustering (see Section 2.2): - The $\varepsilon$-neighborhood graph - $k$-nearest neighbor graphs - The fully connected graph Generate the sample dataset ```python= n_samples = 1000 x, y = datasets.make_moons(n_samples, noise=5e-2) ``` Illustrate the scatter plot for the dataset ```python= px.scatter(x=x[..., 0], y=x[..., 1]) ``` ```python= sigma = 0.1 sim_matrix = gaussian_similarity(x, sigma=sigma) ``` Calculate the `weighted_adjacency_matrix`, `degree_matrix`, and `graph_laplacian` ```python= weighted_adjacency_matrix = np.where(sim_matrix > 1e-5, sim_matrix, 0) degree_matrix = np.diag(weighted_adjacency_matrix.sum(axis=1)) graph_laplacian = np.eye(degree_matrix.shape[0]) - np.matmul(np.linalg.inv(degree_matrix), (weighted_adjacency_matrix)) ``` ```python= n_clusters = 2 u, s, vh = np.linalg.svd(graph_laplacian) truncated_u = u[:, -n_clusters:] kmeans = KMeans(n_clusters=n_clusters, random_state=0) kmeans.fit(truncated_u) labels_ = kmeans.labels_ ``` ```python= px.scatter(x=x[:, 0], y=x[:, 1], color=labels_.astype(str)) ``` --- ## Conclusion ---