Abstract: |
Graph clustering is the task of identifying groups of tightly connected nodes in a graph. This widely studied unsupervised learning problem has applications across computer science, statistics, biology, economics, and the social sciences.
One of the most popular methods of graph clustering is spectral clustering (von Luxburg, 2007). When the graph is to be split into two communities, the simplest version of the algorithm partitions the nodes by the signs of the coordinates of the eigenvector associated with the second smallest eigenvalue of the graph's Laplacian matrix, the so-called Fiedler vector (Fiedler, 1975). Although this method is efficient both theoretically and practically for many graphs (for instance, in the Stochastic Block Model (Abbe, 2017)), it can be heavily handicapped by geometric structure in a network. One such example is the Geometric Block Model (GBM) recently introduced in (Galhotra et al., 2018).
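To fix ideas, here is a minimal NumPy sketch of this standard two-way spectral clustering (an illustration of the classical method, not code from the paper); the barbell graph of two triangles is a hypothetical toy example.

```python
import numpy as np

def fiedler_partition(adjacency):
    """Two-way spectral clustering: split the nodes by the signs of the
    eigenvector of the second smallest Laplacian eigenvalue (Fiedler vector)."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    _, eigenvectors = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler_vector = eigenvectors[:, 1]          # second smallest eigenvalue
    return fiedler_vector >= 0                   # boolean community labels

# Toy example: two triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
labels = fiedler_partition(A)  # separates the two triangles
```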
We present an effective generalization of standard spectral clustering, which we call higher-order spectral clustering. It is close to the classical spectral clustering method but partitions the nodes by the signs of an eigenvector of the adjacency matrix associated with an eigenvalue of higher order. This generalization provides weak consistency for a new model of geometric graphs, called the Soft Geometric Block Model (SGBM). This result is established in three steps. First, we characterize the limiting spectrum of the adjacency matrix, extending a result of (Bordenave, 2008). Second, we take the eigenvalue closest to a certain 'optimal' value and show that, under certain conditions, this eigenvalue is separated in the limiting spectrum. Finally, using the Kahan-Parlett-Jiang theorem, we prove that the eigenvector corresponding to this eigenvalue cannot be far from the vector that recovers the true community structure by the signs of its coordinates.
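A minimal sketch of this higher-order variant (illustrative only: in the paper the 'optimal' target value is derived from the SGBM parameters, whereas here it is a free input, and the toy graph of two bridged 4-cliques is hypothetical):

```python
import numpy as np

def higher_order_partition(adjacency, target):
    """Split the nodes by the signs of the adjacency eigenvector whose
    eigenvalue is closest to `target` (in the paper, an 'optimal' value
    computed from the model parameters; here it is passed in directly)."""
    eigenvalues, eigenvectors = np.linalg.eigh(adjacency)
    idx = np.argmin(np.abs(eigenvalues - target))
    return eigenvectors[:, idx] >= 0

# Toy example: two 4-cliques {0..3} and {4..7} joined by the edge (3, 4).
A = np.zeros((8, 8))
for i in range(4):
    for j in range(i + 1, 4):
        A[i, j] = A[j, i] = A[i + 4, j + 4] = A[j + 4, i + 4] = 1.0
A[3, 4] = A[4, 3] = 1.0
evals = np.linalg.eigvalsh(A)
# Targeting the second largest adjacency eigenvalue recovers the two cliques.
labels = higher_order_partition(A, target=evals[-2])
```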
A small adjustment of the algorithm provides strong consistency. This adjustment, called the local improvement step, consists in counting, for each node, its neighbours in the supposed communities and reassigning the node accordingly.
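Such a local improvement pass can be sketched as follows (a minimal illustration of the neighbour-counting idea, not the paper's exact procedure):

```python
import numpy as np

def local_improvement(adjacency, labels):
    """One local improvement pass: reassign every node to the community
    in which it has more neighbours (ties keep the current label)."""
    labels = np.asarray(labels, dtype=bool)
    new_labels = labels.copy()
    for node in range(len(labels)):
        neighbours = adjacency[node] > 0
        in_first = np.count_nonzero(neighbours & labels)
        in_second = np.count_nonzero(neighbours & ~labels)
        if in_first > in_second:
            new_labels[node] = True
        elif in_second > in_first:
            new_labels[node] = False
    return new_labels

# Toy example: two 4-cliques joined by one edge; node 0 starts mislabelled.
A = np.zeros((8, 8))
for i in range(4):
    for j in range(i + 1, 4):
        A[i, j] = A[j, i] = A[i + 4, j + 4] = A[j + 4, i + 4] = 1.0
A[3, 4] = A[4, 3] = 1.0
noisy = [False, True, True, True, False, False, False, False]
fixed = local_improvement(A, noisy)  # node 0 is moved back to its clique
```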
We also show in numerical experiments that higher-order spectral clustering is effective even for graphs of modest size. In particular, on dense GBM graphs it numerically outperforms the algorithm of (Galhotra et al., 2018).
For more details and proofs, we refer to our preprint (Avrachenkov et al., 2020).
Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.
Miroslav Fiedler. A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory. Czechoslovak Mathematical Journal, 25(4):619–633, 1975.
Emmanuel Abbe. Community detection and stochastic block models: recent developments. The Journal of Machine Learning Research, 18(1):6446–6531, 2017.
Sainyam Galhotra, Arya Mazumdar, Soumyabrata Pal, and Barna Saha. The geometric block model. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
Charles Bordenave. Eigenvalues of Euclidean random matrices. Random Structures & Algorithms, 33(4):515–532, 2008.
Konstantin Avrachenkov, Andrei Bobu, and Maximilien Dreveton. Higher-order spectral clustering for geometric graphs. arXiv preprint, 2020. https://arxiv.org/pdf/2009.11353.pdf