The probabilistic method of finding the local-optimum of clustering
Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ, no. 1 (2016), pp. 28-37. This article was harvested from the Math-Net.Ru source.

Clustering stability is a commonly used criterion in cluster analysis for determining the “true” number of groups. A clustering is considered acceptable if the grouping of the data sample is robust to random perturbations of the data under study. In this paper, we propose an algorithm for determining the number of clusters that works with an expanded dataset, obtained by adding a perturbed version of the initial dataset to the initial dataset itself. Refs 30. Figs 2.
Keywords: clustering, cluster stability, optimal cluster number.
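The abstract gives no algorithmic details, so the following is only a minimal sketch of the general idea of perturbation-based stability, not the authors' algorithm. It assumes Gaussian noise as the perturbation and plain k-means as the base clustering method; the function names (`kmeans`, `stability`) and all parameters are hypothetical. The dataset is expanded with a perturbed copy of itself, the expanded set is clustered, and the score is the fraction of points that fall in the same cluster as their perturbed copy.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (an assumed base clusterer); returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def stability(points, k, noise=0.01, trials=5, seed=0):
    """Average fraction of points clustered together with their perturbed copy."""
    rng = random.Random(seed)
    n = len(points)
    total = 0.0
    for _ in range(trials):
        # Assumption: the perturbation is additive Gaussian noise.
        perturbed = [tuple(x + rng.gauss(0.0, noise) for x in p) for p in points]
        # Cluster the expanded dataset: original points followed by their copies.
        labels = kmeans(points + perturbed, k, seed=rng.randrange(10**6))
        same = sum(labels[i] == labels[n + i] for i in range(n))
        total += same / n
    return total / trials

# Toy one-dimensional data with two well-separated groups.
data = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
scores = {k: stability(data, k) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)  # ties resolve to the smallest k tried
print(scores, best_k)
```

Taking the candidate with the highest score, with ties going to the smallest k, is just one possible selection rule. A score of this kind is trivially perfect for k = 1, so the scan starts at k = 2.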
@article{VSPUI_2016_1_a2,
     author = {A. Lozkins and V. M. Bure},
     title = {The probabilistic method of finding the local-optimum of clustering},
     journal = {Vestnik Sankt-Peterburgskogo universiteta. Prikladna\^a matematika, informatika, processy upravleni\^a},
     pages = {28--37},
     year = {2016},
     number = {1},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/VSPUI_2016_1_a2/}
}
A. Lozkins; V. M. Bure. The probabilistic method of finding the local-optimum of clustering. Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ, no. 1 (2016), pp. 28-37. http://geodesic.mathdoc.fr/item/VSPUI_2016_1_a2/

[1] Quackenbush J., “Computational analysis of microarray data”, Nature reviews genetics, 2:6 (2001), 418–427

[2] Shamir R., Sharan R., “Algorithmic approaches to clustering gene expression data”, Current Topics in Computational Biology, 2001 (accessed: 15.09.2015) http://citeseerx.ist.psu.edu/

[3] Gordon A. D., Classification, Chapman & Hall/CRC Monographs on Statistics & Applied Probability, Prentice-Hall, 1999 (accessed: 20.08.2015) http://www.citeulike.org/

[4] Jain A. K., Dubes R. C., Algorithms for clustering data, Prentice-Hall, Inc., 1988 (accessed: 14.06.2015) http://dl.acm.org/

[5] Buhmann J., “Data clustering and learning”, The Handbook of Brain Theory and Neural Networks, 1995, 278–281

[6] Chakravarthy S. V., Ghosh J., “Scale-based clustering using the radial basis function network”, Neural Networks. IEEE Transactions, 7:5 (1996), 1250–1261

[7] Gordon A. D., “Identifying genuine clusters in a classification”, Computational Statistics & Data Analysis, 18:5 (1994), 561–581

[8] Hartigan J. A., “Statistical theory in clustering”, Journal of classification, 2:1 (1985), 63–76

[9] Milligan G. W., Cooper M. C., “An examination of procedures for determining the number of clusters in a data set”, Psychometrika, 50:2 (1985), 159–179

[10] Sugar C. A., James G. M., “Finding the number of clusters in a dataset”, Journal of the American Statistical Association, 98:463 (2003), 750–763

[11] Tibshirani R., Walther G., “Cluster validation by prediction strength”, Journal of Computational and Graphical Statistics, 14:3 (2005), 511–528

[12] Hubert L., Schultz J., “Quadratic assignment as a general data analysis strategy”, British journal of mathematical and statistical psychology, 29:2 (1976), 190–241

[13] Tibshirani R., Walther G., Hastie T., “Estimating the number of clusters in a data set via the gap statistic”, Journal of the Royal Statistical Society. Series B (Statistical Methodology), 63:2 (2001), 411–423

[14] Wishart D., “Mode analysis: A generalization of nearest neighbor which reduces chaining effects”, Numerical taxonomy, 76:17 (1969), 282–311

[15] Hartigan J. A., Clustering algorithms, John Wiley & Sons, Inc., New York, 1975 (accessed: 28.08.2015) http://dl.acm.org/

[16] Hartigan J. A., “Consistency of single linkage for high-density clusters”, Journal of the American Statistical Association, 76:374 (1981), 388–394

[17] Cuevas A., Febrero M., Fraiman R., “Estimating the number of clusters”, Canadian Journal of Statistics, 28:2 (2000), 367–382

[18] Cuevas A., Febrero M., Fraiman R., “Cluster analysis: a further approach based on density estimation”, Computational Statistics & Data Analysis, 36:4 (2001), 441–459

[19] Stuetzle W., “Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample”, Journal of classification, 20:1 (2003), 25–47

[20] Pelleg D., Moore A. W., “$X$-means: Extending $K$-means with Efficient Estimation of the Number of Clusters”, ICML, 2000, 727–734

[21] Volkovich Z., Barzily Z., Toledano-Kitai D., Avros R., “The Hotelling's metric as a cluster stability measure”, Computer modelling and new technologies, 14 (2010), 65–72

[22] Barzily Z., Volkovich Z., Akteke-Ozturk B., “On a minimal spanning tree approach in the cluster validation problem”, Informatica. Lith. Acad. Sci., 20:2 (2009), 187–202

[23] Feng Y., Hamerly G., “PG-means: learning the number of clusters in data”, Advances in neural information processing systems, 19 (2007), 393–400

[24] Breckenridge J. N., “Replicating cluster analysis: Method, consistency, and validity”, Multivariate Behavioral Research, 24:2 (1989), 147–161

[25] Dudoit S., Fridlyand J., “A prediction-based resampling method for estimating the number of clusters in a dataset”, Genome biology, 3:7 (2002), research0036 (accessed: 14.06.2015) http://www.genomebiology.com/

[26] Lange T., Roth V., Braun M. L., Buhmann J. M., “Stability-based validation of clustering solutions”, Neural computation, 16:6 (2004), 1299–1323

[27] Milligan G. W., Cheng R., “Measuring the influence of individual data points in a cluster analysis”, Journal of classification, 13:2 (1996), 315–335

[28] Ben-Hur A., Elisseeff A., Guyon I., “A stability based method for discovering structure in clustered data”, Pacific symposium on biocomputing, 7:6 (2002), 6–17

[29] Levine E., Domany E., “Resampling method for unsupervised estimation of cluster validity”, Neural computation, 13:11 (2001), 2573–2593

[30] Fowlkes E. B., Mallows C. L., “A method for comparing two hierarchical clusterings”, Journal of the American statistical association, 78:383 (1983), 553–569