A probabilistic approach to comparing the distances between partitions of a set
Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ, Volume 14 (2018) no. 1, pp. 14-19. This article was harvested from the Math-Net.Ru source.

This article describes and compares several classical measures for comparing partitions of a given set, including the Rand index and the Larsen and Aone coefficient. We develop a probabilistic framework for comparing these measures and a unified representation of the distances that uses a common set of parameters. To do this, we take all possible values of a similarity measurement between pairs of partitions and grade them using quantiles of its distribution function. Let ${\lambda }_{\alpha }$ be the quantile of level $\alpha $ of the distribution function $F_{\rho }\left(t\right)=P\left(\rho <t\right)$. If the proximity measurement $\rho $ is not less than ${\lambda }_{\alpha }$, we can conclude that $\alpha \cdot 100\%$ of randomly chosen pairs of partitions have a proximity measurement less than $\rho $. This means that these partitions cannot be considered close or similar. The paper treats the general case of distribution functions of similarity measurements, with a special focus on uniform distributions. The comparison results, obtained by computer simulation over the selected set of similarity measures, are presented as tables of quantiles of the probability distributions. Refs 9. Table 1.
Keywords: distance between partitions of a set, probabilistic approach, comparing the distances.
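The quantile-based grading described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' procedure: partitions are drawn by giving each element an independent uniform label (a hypothetical sampling model that is not uniform over all set partitions), the Rand index stands in for the proximity measurement $\rho$, and the quantile is a conventional empirical one. The function names and default parameters are illustrative. Since the Rand index is a similarity, one minus its value could be sampled instead when a distance is wanted.

```python
import math
import random
from itertools import combinations


def random_partition(n, k, rng):
    # Assumption: each of the n elements gets a uniform label in 0..k-1.
    # This is an illustrative model, not a uniform draw over set partitions.
    return [rng.randrange(k) for _ in range(n)]


def rand_index(a, b):
    # Share of element pairs on which the two partitions agree:
    # both put the pair in one block, or both separate it. Requires n >= 2.
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def empirical_quantile(sample, alpha):
    # Smallest sample value t such that at least a share alpha
    # of the sample is less than or equal to t.
    ordered = sorted(sample)
    idx = max(0, math.ceil(alpha * len(ordered)) - 1)
    return ordered[idx]


def simulate_quantiles(n=12, k=3, trials=2000,
                       alphas=(0.05, 0.5, 0.95), seed=0):
    # Sample the measure on independent random pairs of partitions
    # and report its empirical quantiles at the requested levels.
    rng = random.Random(seed)
    sample = [
        rand_index(random_partition(n, k, rng), random_partition(n, k, rng))
        for _ in range(trials)
    ]
    return {alpha: empirical_quantile(sample, alpha) for alpha in alphas}


if __name__ == "__main__":
    for alpha, value in simulate_quantiles().items():
        print(f"alpha={alpha}: quantile={value:.3f}")
```

An observed value for a particular pair of partitions can then be compared against these simulated quantiles to judge how unusual it is relative to random pairs.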
@article{VSPUI_2018_14_1_a1,
     author = {A. A. Rogov and A. G. Varfolomeyev and A. O. Timonin and K. A. Proen\c{c}a},
     title = {A probabilistic approach to comparing the distances between partitions of a set},
     journal = {Vestnik Sankt-Peterburgskogo universiteta. Prikladna\^a matematika, informatika, processy upravleni\^a},
     pages = {14--19},
     year = {2018},
     volume = {14},
     number = {1},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/VSPUI_2018_14_1_a1/}
}
A. A. Rogov; A. G. Varfolomeyev; A. O. Timonin; K. A. Proença. A probabilistic approach to comparing the distances between partitions of a set. Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ, Volume 14 (2018) no. 1, pp. 14-19. http://geodesic.mathdoc.fr/item/VSPUI_2018_14_1_a1/

[1] Meilă M., “Comparing clusterings by the variation of information”, Learning Theory and Kernel Machines, Lecture Notes in Computer Science, 2777, Springer, 2003, 173–187

[2] Rand W. M., “Objective criteria for the evaluation of clustering methods”, Journal of the American Statistical Association, 66 (1971), 846–850

[3] Fowlkes E. B., Mallows C. L., “A method for comparing two hierarchical clusterings”, Journal of the American Statistical Association, 78 (1983), 553–569

[4] Meilă M., Heckerman D., “An experimental comparison of model-based clustering methods”, Machine Learning, 42 (2001), 9–29

[5] Larsen B., Aone C., “Fast and effective text mining using linear-time document clustering”, Proceedings of the Conference on Knowledge Discovery and Data Mining, 1999, 16–22

[6] Steel M. A., Penny D., “Distributions of tree comparison metrics — Some new results”, Systematic Biology, 42 (1993), 126–141

[7] Sidorov Y. V., Kirikov P. V., Rogov A. A., “Dendrograms comparison with an equal vertices number”, Scientific notes of Petrozavodsk State University. Series Natural and Technical Sciences, 8, Petrozavodsk State University Publ., Petrozavodsk, 2011, 108–110 (In Russian)

[8] Warrens M. J., “On Robinsonian dissimilarities, the consecutive ones property and latent variable models”, Advances in Data Analysis and Classification, 3 (2009), 169–184

[9] Varfolomeyev A. A., Kirikov P. V., Rogov A. A., “Probabilistic approach to distances comparison between subsets of a finite set”, Scientific notes of Petrozavodsk State University, 8, Petrozavodsk State University Publ., Petrozavodsk, 2010, 83–88 (In Russian)