Graph clustering based on modularity variation estimations
The Bulletin of Irkutsk State University. Series Mathematics, Tome 25 (2018), pp. 63-78
Cet article a éte moissonné depuis la source Math-Net.Ru

Voir la notice de l'article

Graph clustering is one of the constantly actual data analysis problems. There are various statements of this problem. In this paper we consider the statement of search for a partition of a graph vertices set into disjoint subsets. In such a way, that the density of connections between the vertices of one subset was higher than that between the vertices of different subsets. There is a lot of various approaches, and many of them use such as an a posteriori estimate of clustering quality, as modularity. The modularity functional, taking the value from $[-1, 1]$, allows us to formally evaluate the quality of partitioning into subsets. This paper deals with an approach, instead of calculating the modularity, using a less computationally complex estimate of modularity change in the operation of clusters union. Four theorems for different graph types are formulated, presenting the possibility of application of suggested estimate, instead of direct modularity calculations. A greedy algorithmic scheme and also AMVE (Algorithm based on Modularity Variation Estimation) — simple greedy algorithm based on the scheme are proposed. An attempt of comparative analysis on the test problemet of AMVE with heuristic clustering algorithms implemented in actual data analysis software is described. It is shown the comparative advantage of AMVE, both in terms of speed and quality of clustering. Also, the cases on the use of developed algorithm and its implementation for data analysis in sociology and history and criticism of literature are mentioned. In these cases, investigated graphs based on social networks data (the ratio of "friendship" in the social network between users used as the graph edges). It is demonstrated a slight superiority of AMVE in clustering quality compared to the known algorithms Louvain and Walktrap.
Keywords: graph clustering, modularity, community detection, social network analysis.
@article{IIGUM_2018_25_a4,
     author = {N. N. Martynov and O. V. Khandarova and F. V. Khandarov},
     title = {Graph clustering based on modularity variation estimations},
     journal = {The Bulletin of Irkutsk State University. Series Mathematics},
     pages = {63--78},
     year = {2018},
     volume = {25},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/IIGUM_2018_25_a4/}
}
TY  - JOUR
AU  - N. N. Martynov
AU  - O. V. Khandarova
AU  - F. V. Khandarov
TI  - Graph clustering based on modularity variation estimations
JO  - The Bulletin of Irkutsk State University. Series Mathematics
PY  - 2018
SP  - 63
EP  - 78
VL  - 25
UR  - http://geodesic.mathdoc.fr/item/IIGUM_2018_25_a4/
LA  - ru
ID  - IIGUM_2018_25_a4
ER  - 
%0 Journal Article
%A N. N. Martynov
%A O. V. Khandarova
%A F. V. Khandarov
%T Graph clustering based on modularity variation estimations
%J The Bulletin of Irkutsk State University. Series Mathematics
%D 2018
%P 63-78
%V 25
%U http://geodesic.mathdoc.fr/item/IIGUM_2018_25_a4/
%G ru
%F IIGUM_2018_25_a4
N. N. Martynov; O. V. Khandarova; F. V. Khandarov. Graph clustering based on modularity variation estimations. The Bulletin of Irkutsk State University. Series Mathematics, Tome 25 (2018), pp. 63-78. http://geodesic.mathdoc.fr/item/IIGUM_2018_25_a4/

[1] Badmatsyrenov T. B., Skvortsov M. V., Khandarov F. V., “Buddhist digital practices of transcendence: VK-community “Hambo Lama Dashi-Dorzho Itigilov””, Monitoring of Public Opinion: Economic and Social Changes, 2018, no. 2, 309–332 (in Russian) | DOI

[2] Blondel V. D. et al., “Fast unfolding of communities in large networks”, Journal of statistical mechanics: theory and experiment, 2008:10 (2008), 10008 | DOI

[3] V. D. Blondel et al., “Fast unfolding of communities in large networks”, Journal of statistical mechanics: theory and experiment, 2008:10 (2008), 10008 | DOI

[4] Csardi G., Nepusz T., “The igraph software package for complex network research”, InterJournal, Complex Systems, 1695:5 (2006), 1–9

[5] Flake G. W., Lawrence S., Giles C. L., “Efficient identification of web communities”, Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2000, 150–160 | DOI

[6] Fortunato S., “Community detection in graphs”, Physics reports, 486:3–5 (2010), 75–174 | DOI | MR

[7] Le Martelot E., Hankin C., “Fast multi-scale detection of relevant communities in large-scale networks”, The Computer Journal, 56:9 (2013), 1136–1150 | DOI

[8] J. Leskovec, K. J. Lang, A. Dasgupta, M. W. Mahoney, “Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters”, Internet Mathematics, 6:1 (2009), 29–123 | DOI | MR | Zbl

[9] Leskovec J., Lang K. J., Mahoney M., “Empirical comparison of algorithms for network community detection”, Proceedings of the 19th international conference on World wide web, ACM, 2010, 631–640 | DOI

[10] Maulik U., Bandyopadhyay S., “Genetic algorithm-based clustering technique”, Pattern recognition, 33:9 (2000), 1455–1465 | DOI

[11] Newman M. E. J., “Detecting community structure in networks”, The European Physical Journal B, 38:2 (2004), 321–330 | DOI | MR

[12] Pizzuti C., “A multiobjective genetic algorithm to find communities in complex networks”, IEEE Transactions on Evolutionary Computation, 16:3 (2012), 418–430 | DOI

[13] Pons P., Latapy M., “Computing communities in large networks using random walks”, Journal of Graph Algorithms and Applications, 10:2 (2006), 191–218 | DOI | MR | Zbl

[14] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, D. Parisi, “Defining and identifying communities in networks”, Proceedings of the National Academy of Sciences, 101:9 (2004), 2658–2663 | DOI

[15] Raghavan U. N., Albert R., Kumara S., “Near linear time algorithm to detect community structures in large-scale networks”, Physical review E, 76:3 (2007), 036106 | DOI

[16] Reichardt J., Bornholdt S., “Statistical mechanics of community detection”, Physical Review E, 74:1 (2006), 016110 | DOI | MR