Solution of clusterization problem by graph optimization methods
Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki, Uchenye Zapiski Kazanskogo Universiteta. Seriya Fiziko-Matematicheskie Nauki, Tome 161 (2019) no. 3, pp. 423-437 Cet article a éte moissonné depuis la source Math-Net.Ru

Voir la notice du chapitre de livre

The rapid growth in the volume of processed information that takes place nowadays determines the urgency of the development of methods for reducing the dimension of computational problems. One of the approaches to reducing the dimensionality of data is their clustering, i.e., uniting into maximally homogeneous groups. At the same time, it is desirable that representatives of different clusters should be as much as possible unlike each other. Along with the dimension reduction, clustering procedures have an independent value. For example, we know the market segmentation problem in economics, the feature typologization problem in sociology, faces diagnostics in geology, etc. Despite the large number of known clusterization methods, the development and study of new ones remain relevant. The reason is that there is no algorithm that would surpass all the rest by all criteria (speed, insensitivity to clusters' size and shape, number of input parameters, etc.). In this paper, we propose a clustering algorithm based on the notions of the graph theory (namely, the maximum flow (the minimum cut) theorem) and compare the results obtained by it and by four other algorithms that belong to various classes of clusterization techniques.
Keywords: clustering, maximal flow, minimal cut, Ford–Fulkerson theorem, labeling method, $k$-means, hierarchical clusterization, Ward's procedure, DBSCAN method, MaxFlow algorithm.
@article{UZKU_2019_161_3_a7,
     author = {I. V. Konnov and O. A. Kashina and E. I. Gilmanova},
     title = {Solution of clusterization problem by graph optimization methods},
     journal = {U\v{c}\"enye zapiski Kazanskogo universiteta. Seri\^a Fiziko-matemati\v{c}eskie nauki},
     pages = {423--437},
     year = {2019},
     volume = {161},
     number = {3},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/UZKU_2019_161_3_a7/}
}
TY  - JOUR
AU  - I. V. Konnov
AU  - O. A. Kashina
AU  - E. I. Gilmanova
TI  - Solution of clusterization problem by graph optimization methods
JO  - Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki
PY  - 2019
SP  - 423
EP  - 437
VL  - 161
IS  - 3
UR  - http://geodesic.mathdoc.fr/item/UZKU_2019_161_3_a7/
LA  - ru
ID  - UZKU_2019_161_3_a7
ER  - 
%0 Journal Article
%A I. V. Konnov
%A O. A. Kashina
%A E. I. Gilmanova
%T Solution of clusterization problem by graph optimization methods
%J Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki
%D 2019
%P 423-437
%V 161
%N 3
%U http://geodesic.mathdoc.fr/item/UZKU_2019_161_3_a7/
%G ru
%F UZKU_2019_161_3_a7
I. V. Konnov; O. A. Kashina; E. I. Gilmanova. Solution of clusterization problem by graph optimization methods. Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki, Uchenye Zapiski Kazanskogo Universiteta. Seriya Fiziko-Matematicheskie Nauki, Tome 161 (2019) no. 3, pp. 423-437. http://geodesic.mathdoc.fr/item/UZKU_2019_161_3_a7/

[1] D. Xu, Tian Y., “A comprehensive survey of clustering algorithms”, Ann. Data Sci., 2:2 (2015), 165–193 | DOI

[2] R. Sharan, Shamir R., “CLICK: A clustering algorithm with applications to gene expression analysis”, Proc. Int. Conf. Intell. Syst. Mol. Biol., AAAI Press, 2000, 307–316 | MR

[3] L. R. Ford, D. R. Fulkerson, Flows in Networks, Princeton Univ. Press, 1962, xii+194 pp. | MR | Zbl

[4] L. R. Ford Jr., D. R. Fulkerson, “Maximal flow through a network”, Can. J. Math., 8 (1956), 399–404 | DOI | MR | Zbl

[5] Y. Dinitz, “Dinitz' algorithm: The original version and Even's version”, Theoretical Computer Science, Lecture Notes in Computer Science, 3895, eds. Goldreich O., Rosenberg A.L., Selman A.L., Springer, Berlin–Heidelberg, 2006, 218–240 | DOI | MR

[6] J. Edmonds, R. M. Karp, “Theoretical improvements in algorithmic efficiency for network flow problems”, J. Assoc. Comput. Mach., 19:2 (1972), 248–264 | DOI | MR | Zbl

[7] G. M. Adel'son-Vel'sky, E. A. Dinitz, A. V. Karzanov, Flow Algorithms, Nauka, M., 1975, 119 pp. (In Russian) | MR

[8] E. Sivogolovko, “Methods of evaluating the quality of clear clustering”, Komp'yut. Instrum. Obraz., 2011, no. 4, 14–31 (In Russian)

[9] H. Steinhaus, “Sur la division des corps materiels en parties”, Bull. Acad. Polon. Sci., Cl. III, IV:12 (1956), 801–804 (In French) | MR

[10] S. Lloyd, “Least square quantization in PCM”, Trans. Inf. Theory, IT-28:2 (1982), 129–137 | DOI | MR | Zbl

[11] J. MacQueen, “Some methods for classification and analysis of multivariate observations”, Proc. 5th Berkeley Symp. on Mathematical Statistics and Probability, 1967, 281–297 | MR | Zbl

[12] S. Johnson, “Hierarchical clustering schemes”, Psychometrika, 32:3 (1967), 241–254 | DOI | Zbl

[13] J. H. Ward, “Hierarchical grouping to optimize an objective function”, J. Am. Stat. Assoc., 58:301 (1963), 236–244 | DOI | MR

[14] M. Ester, H. P. Kriegel, J. Sander, X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise”, Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, KDD-96, AAAI Press, 1996, 226–231

[15] P. Franti, S. Sieranoja, Clustering basic benchmark, http://cs.joensuu.fi/sipu/datasets/