Spectral Clustering
Here we study the important class of spectral methods for understanding networks on a global level. By “spectral” we mean the spectrum, or eigenvalues, of matrices derived from graphs, which will give us insight into the structure of the graphs themselves. In particular, we will explore spectral clustering algorithms, which take advantage of these tools for clustering nodes in graphs.
The spectral clustering algorithms we will explore generally consist of three basic stages.
- Preprocessing: construct a matrix representation of a graph, such as the adjacency matrix (but we will explore other options)
- Decomposition: compute the eigenvectors and eigenvalues of the matrix, and use these to create a low-dimensional representation space
- Grouping: assign points to clusters based on their representation in this space
Graph Partitioning
Let’s formalize the task we would like to solve. We start out with an undirected graph . Our goal is to partition into two disjoint groups (so and ) in a way that maximizes the number of connections internal to the groups and minimizes the number of connections between the two groups.
To further formalize the objective, let’s introduce some terminology:
- Cut: how much connection there is between two disjoint sets of nodes. where is the weight of the edge between nodes and .
- Minimum cut:
Since we want to minimize the number of connections between and , we might decide to make the minimum cut our objective. However, we find that we end up with very unintuitive clusters this way – we can often simply set to be a single node with very few outgoing connections, and to be the rest of the network, to get a very small cut. What we need is a measure that also considers internal cluster connectivity.
Enter the conductance, which balances between-group and within-group connectivity concerns. We define where , the total (weighted) degree of the nodes in . We can roughly think of conductance as analogous to a surface area to volume ratio: the numerator is the area of the shared surface between and , and the denominator measures volume while trying to ensure and have similar volumes. Because of this nuanced measure, picking and to minimize the conductance results in more balanced partitions than minimizing the cut. The challenge then becomes to efficiently find a good partition, since minimizing conductance is NP-hard.
Spectral Graph Partitioning
Enter spectral graph partitioning, a method that will allow us to pin down the conductance using eigenvectors. We’ll start by introducing some basic techniques in spectral graph theory.
The goal of spectral graph theory is to analyze the “spectrum” of matrices representing graphs. By spectrum we mean the set of eigenvalues of a matrix representing a graph, in order of their magnitudes, along with their corresponding eigenvalues. For example, the largest eigenvector/eigenvalue pair for the adjacency matrix of a d-regular graph is the all-ones vector , with eigenvalue . Exercise: what are some eigenvectors for a disconnected graph with two components, each component d-regular? Note that by the spectral theorem, the adjacency matrix (which is real and symmetric) has a complete spectrum of orthogonal eigenvectors.
What kinds of matrices can we analyze using spectral graph theory?
- The adjacency matrix: this matrix is a good starting point due to its direct relation to graph structure. It also has the important property of being symmetric, which means that it has a complete spectrum of real-valued, orthogonal eigenvectors.
- Laplacian matrix : it’s defined by where is a diagonal matrix such that equals the degree of node and is the adjacency matrix of the graph. The Laplacian takes us farther from the direct structure of a graph, but has some interesting properties which will take us towards deeper structural aspects of our graph. We note that the all-ones vector is an eigenvector of the Laplacian with eigenvalue 0. In addition, since is symmetric, it has a complete spectrum of real-valued, orthogonal eigenvectors. Finally, is positive-semidefinite, which means it has three equivalent properties: its eigenvalues are all non-negative, for some matrix and for every vector . This property allows us to use new linear algebra tools to understand and thus our original graph.
In particular, , the second smallest eigenvalue of , is already fascinating and studying it will let us make big strides in understanding graph clustering. By the theory of Rayleigh quotients, we have that where is the eigenvector corresponding to eigenvalue ; in other words, we minimize the objective in the subspace of vectors orthogonal to the first eigenvector in order to find the second eigenvector (remember that is symmetric and thus has an orthogonal basis of eigenvalues). On a high level, Rayleigh quotients frame the eigenvector search as an optimization problem, letting us bring optimization techniques to bear. Note that the objective value does not depend on the magnitude of , so we can constrain its magnitude to be 1. Note additionally that we know that the first eigenvector of is the all-ones vector with eigenvalue 0, so saying that is orthogonal to this vector is equivalent to saying that .
Using these properties and the definition of , we can write out a more concrete formula for : , subject to the constraint . If we additionally constrain to have unit length, the objective turns into simply .
How does relate to our original objective of finding a best partition of our graph? Let’s express our partition as a vector defined by if and if . Instead of using the conductance here, let’s first try to minimize the cut while taking care of the problem of balancing partition sizes by enforcing that (balance size of partitions), which amounts to constraining . Given this size constraint, let’s minimize the cut of the partition, i.e. find that minimizes . Note that the entries of must be or , which has the consequence that the length of is fixed. This optimization problem looks a lot like the definition of ! Indeed, by our findings above we have that this objective is minimized by of our Laplacian, and the optimal clustering is given by its corresponding eigenvector, known as the Fiedler vector.
Now that we have a link between an eigenvalue of and graph partitioning, let’s push the connection further and see if we can get rid of the hard constraint – maybe there is a link between the more flexible conductance measure and . Let’s rephrase conductance here in the following way: if a graph is partitioned into and where , then the conductance of the cut is defined as . A result called the Cheeger inequality links to : in particular, where is the maximum node degree in the graph. The upper bound on is most useful to us for graph partitioning, since we are trying to minimize the conductance; it says that gives us a good estimate of the conductance – we never overestimate it more than by a factor of 2! The corresponding eigenvector is defined by if and if ; the signs of the entries of give us the partition assignments of each node.
Spectral Partitioning Algorithm
Let’s put all our findings together to state the spectral partitioning algorithm.
- Preprocessing: build the Laplacian matrix of the graph
- Decomposition: map vertices to their corresponding entries in the second eigenvector
- Grouping: sort these entries and split the list in two to arrive at a graph partition
Some practical considerations emerge.
- How do we choose a splitting point in step 3? There’s flexibility here – we can use simple approaches like splitting at zero or the median value, or more expensive approaches like minimizing the normalized cut in one dimension.
- How do we partition a graph into more than two clusters? We could divide the graph into two clusters, then further subdivide those clusters, etc (Hagen et al ‘92)…but that can be inefficient and unstable. Instead, we can cluster using multiple eigenvectors, letting each node be represented by its component in these eigenvectors, then cluster these representations, e.g. through k-means (Shi-Malik ‘00), which is commonly used in recent papers. This method is also more principled in the sense that it approximates the optimal k-way normalized cut, emphasizes cohesive clusters and maps points to a well-separated embedded space. Furthermore, using an eigenvector basis ensures that less information is lost, since we can choose to keep the (more informative) components corresponding to bigger eigenvalues.
- How do we select the number of clusters? We can try to pick the number of clusters to maximize the eigengap, the absolute difference between two consecutive eigenvalues (ordered by descending magnitude).
Motif-Based Spectral Clustering
What if we want to cluster by higher-level patterns than raw edges? We can instead cluster graph motifs into “modules”. We can do everything in an analogous way. Let’s start by proposing analogous definitions for cut, volume and conductance:
- is the number of motifs for which some nodes in the motif are in one side of the cut and the rest of the nodes are in the other cut
- is the number of motif endpoints in for the motif
- We define
How do we find clusters of motifs? Given a motif and graph , we’d like to find a set of nodes that minimizes . This problem is NP-hard, so we will again make use of spectral methods, namely motif spectral clustering:
- Preprocessing: create a matrix defined by equals the number of times edge participates in .
- Decomposition: use standard spectral clustering on .
- Grouping: same as standard spectral clustering
Again, we can prove a motif version of the Cheeger inequality to show that the motif conductance found by our algorithm is bounded above by , where is the optimal conductance.
We can apply this method to cluster the food web (which has motifs dictated by biology) and gene regulatory networks (in which directed, signed triads play an important role).