Ptas for problems of vector choice and clustering with different centers
Diskretnyj analiz i issledovanie operacij, Tome 30 (2023) no. 3, pp. 96-110.

Voir la notice de l'article provenant de la source Math-Net.Ru

Two close in statements problems are considered. The first one is clustering, i. e. partitioning the set of $d$-dimensional Euclidean vectors into the given number of clusters with different types of centers so that the total dispersion would be minimum. By dispersion here we mean the sum of squared distances between the elements of the clusters and their centers. There are three types of centers: an arbitrary point (clearly, the centroid is the best choice), a point of the initial set (so-called medoid) or a fixed point of the space given in advance. The sizes of the clusters are also given as a part of the input. The second problem is the vector subset choice problem, which is finding a subset of vectors of fixed cardinality having the minimum sum of squared distances between its elements and the centroid. For each of these problems a PTAS is constructed. Bibliogr. 23.
Keywords: clustering, approximation, PTAS, MSSC.
Mots-clés : centroid, medoid
@article{DA_2023_30_3_a4,
     author = {A. V. Pyatkin},
     title = {Ptas for problems of vector choice and clustering with different centers},
     journal = {Diskretnyj analiz i issledovanie operacij},
     pages = {96--110},
     publisher = {mathdoc},
     volume = {30},
     number = {3},
     year = {2023},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/DA_2023_30_3_a4/}
}
TY  - JOUR
AU  - A. V. Pyatkin
TI  - Ptas for problems of vector choice and clustering with different centers
JO  - Diskretnyj analiz i issledovanie operacij
PY  - 2023
SP  - 96
EP  - 110
VL  - 30
IS  - 3
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/DA_2023_30_3_a4/
LA  - ru
ID  - DA_2023_30_3_a4
ER  - 
%0 Journal Article
%A A. V. Pyatkin
%T Ptas for problems of vector choice and clustering with different centers
%J Diskretnyj analiz i issledovanie operacij
%D 2023
%P 96-110
%V 30
%N 3
%I mathdoc
%U http://geodesic.mathdoc.fr/item/DA_2023_30_3_a4/
%G ru
%F DA_2023_30_3_a4
A. V. Pyatkin. Ptas for problems of vector choice and clustering with different centers. Diskretnyj analiz i issledovanie operacij, Tome 30 (2023) no. 3, pp. 96-110. http://geodesic.mathdoc.fr/item/DA_2023_30_3_a4/

[1] Berkhin P., “A survey of clustering data mining techniques”, Grouping multidimensional data: Recent advances in clustering, Springer, Heidelberg, 2006, 25–71 | DOI

[2] Jain A. K., Dubes R. C., Algorithms for clustering data, Prentice Hall, Englewood Cliffs, NJ, 1988 | MR | Zbl

[3] Ghoreyshi S., Hosseinkhani J., “Developing a clustering model based on \text{$K$-means} algorithm in order to creating different policies for policyholders”, Int. J. Adv. Comput. Sci. Inf. Tech., 4:2 (2015), 46–53

[4] Fisher W. D., “On grouping for maximum homogeneity”, J. Am. Stat. Assoc., 53:284 (1958), 789–798 | DOI | MR | Zbl

[5] MacQueen J., “Some methods for classification and analysis of multivariate observations”, Proc. 5th Berkeley Symp. Mathematics, Statistics and Probability (Berkeley, USA, June 21 — July 18, 1965; Dec. 27, 1965 — Jan. 7, 1966), v. 1, Univ. California Press, Berkeley, 1967, 281–297 | MR | Zbl

[6] Kaufman L., Rousseeuw P. J., “Clustering by means of medoids”, Statistical data analysis based on the $L_1$-norm, North-Holland, Amsterdam, 1987, 405–416 | MR

[7] Aloise D., Deshpande A., Hansen P., Popat P., “NP-hardness of Euclidean sum-of-squares clustering”, Mach. Learn., 75:2 (2009), 245–248 | DOI | Zbl

[8] Inaba M., Katoh N., Imai H., “Applications of weighted Voronoi diagrams and randomization to variance-based $k$-clustering”, Proc. 10th Annu. Symp. Computational Geometry (Stony Brook, NY, USA, June 6–8, 1994), ACM Press, New York, 1994, 332–339

[9] Kumar A., Sabharwal Y., Sen S., “A simple linear time $(1+\varepsilon)$-approximation algorithm for geometric $k$-means clustering in any dimensions”, Proc. 45th Annu. IEEE Symp. Foundations of Computer Science (Rome, Italy, Oct. 17–19, 2004), IEEE Comp. Soc., Los Alamitos, CA, 2004, 454–462 | DOI

[10] A. E. Baburin, É. Kh. Gimadi, N. I. Glebov, and A. V. Pyatkin, “The problem of finding a subset of vectors with the maximum total weight”, J. Appl. Ind. Math., 2:1 (2008), 32–38 | DOI | MR | Zbl

[11] É. Kh. Gimadi, A. V. Kel'manov, M. A. Kel'manova, and S. A. Khamidullin, “A posteriori detection of a quasiperiodic fragment with a given number of repetitions in a numerical sequence”, Sib. Zh. Ind. Mat., 9:1 (2006), 55–74 | MR | Zbl

[12] A. V. Kel'manov and A. V. Pyatkin, “On the complexity of a search for a subset of «similar» vectors”, Dokl. Math., 78:1 (2008), 574–575 | DOI | MR | Zbl

[13] A. V. Kel'manov and A. V. Pyatkin, “On a version of the problem of choosing a vector subset”, J. Appl. Ind. Math., 3:4 (2009), 447–455 | DOI | MR | Zbl

[14] A. V. Dolgushev and A. V. Kel'manov, “An approximation algorithm for solving a problem of cluster analysis”, J. Appl. Ind. Math., 5:4 (2011), 551–558 | DOI | MR | MR | Zbl

[15] A. V. Kel'manov and V. I. Khandeev, “A $2$-approximation polynomial algorithm for a clustering problem”, J. Appl. Ind. Math., 7:4 (2013), 515–521 | DOI | MR | Zbl

[16] A. V. Dolgushev, A. V. Kel'manov, and V. V. Shenmaier, “A polynomial-time approximation scheme for a problem of partitioning a finite set into two clusters”, Proc. Steklov Inst. Math., 295, Suppl. 1 (2016), 47–56 | DOI | MR

[17] A. V. Kel'manov, A. V. Pyatkin, and V. I. Khandeev, “NP-hardness of quadratic Euclidean $1$-mean and $1$-median $2$-clustering problem with constraints on the cluster sizes”, Dokl. Math., 100:3 (2019), 545–548 | DOI | DOI | MR | Zbl

[18] Pyatkin A. V., “1-Mean and 1-medoid 2-clustering problem with arbitrary cluster sizes: Complexity and approximation”, Yugoslav J. Oper. Res., 33:1 (2023), 59–69 | DOI | MR

[19] V. V. Shenmaier, “An approximation scheme for a problem of search for a vector subset”, J. Appl. Ind. Math., 6:3 (2012), 381–386 | DOI | MR | Zbl

[20] A. E. Galashov and A. V. Kel'manov, “A $2$-approximate algorithm to solve one problem of a family of disjoint vector subsets”, Autom. Remote Control, 75:4 (2014), 595–606 | DOI | MR | Zbl

[21] Edmonds J., Karp R. M., “Theoretical improvements in algorithmic efficiency for network flow problems”, J. ACM, 19:2 (1972), 248–264 | DOI | MR | Zbl

[22] Gabow H. N., Tarjan R. E., “Faster scaling algorithms for network problems”, SIAM J. Comput., 18:5 (1989), 1013–1036 | DOI | MR | Zbl

[23] Wirth H., Algorithms + data structures = programs, Prentice Hall, Englewood Cliffs, NJ, 1976 | MR