Spectral Clustering: How Eigenvectors Reveal Complex Cluster Structures

🔵 K-means fails with curved clusters. Spectral Clustering solves them using eigenvectors of the similarity graph.

The problem with K-means:

On moon-shaped data (or any non-linear structure), K-means incorrectly groups points because it assumes spherical clusters.

The Spectral Clustering workflow:

Dataset → Similarity Graph → Laplacian → Eigenvectors → Clusters

The 7 steps:

Get data
Similarity matrix (W) → measures similarity between each pair of points (Gaussian kernel with gamma parameter)
Degree matrix (D) → diagonal: sum of similarities of each point with all others
Laplacian (L = D − W) → captures the structure of the graph
Eigendecomposition of L → eigenvalues, eigenvectors = np.linalg.eigh(L)
Select the k smallest eigenvectors → correspond to k clusters (eigengap heuristic: pick where the largest jump occurs between successive eigenvalues)
Apply K-means on the spectral embedding space → now clusters are linearly separable

Why it works:

The eigenvectors of the Laplacian transform data into a new space where strongly-connected points end up close together. K-means works there, even when it fails in the original space.

💡 Explanation in a nutshell
#

Spectral Clustering doesn’t group by Euclidean distance but by connectivity in a similarity graph. The elegant trick: the Laplacian’s eigenvectors act as new features where complex structures become linearly separable. The combination of linear algebra + K-means allows detecting clusters that no distance-based method could find.

Spectral Clustering Explained: How Eigenvectors Reveal Complex Cluster Structures | Towards Data Science

Understanding why spectral clustering outperforms K-means

towardsdatascience.com ↗

Also published on LinkedIn.

Author

Juan Pedro Bretti Mandarano

💡 Explanation in a nutshell#

Spectral Clustering Explained: How Eigenvectors Reveal Complex Cluster Structures | Towards Data Science

💡 Explanation in a nutshell
#