Practical SAHN Clustering for Very Large Data Sets and Expensive Distance Metrics
Journal of Graph Algorithms and Applications, Special Issue on Selected Papers from the Eighth International Workshop on Algorithms and Computation, WALCOM 2014 , Tome 18 (2014) no. 4, pp. 577-602.

Voir la notice de l'article provenant de la source Journal of Graph Algorythms and Applications website

Sequential agglomerative hierarchical non-overlapping (SAHN) clustering techniques belong to the classical clustering methods applied heavily in many application domains, e.g., in cheminformatics. Asymptotically optimal SAHN clustering algorithms are known for arbitrary dissimilarity measures, but their quadratic time and space complexity even in the best case still limits the applicability to small data sets. We present a new pivot based heuristic SAHN clustering algorithm exploiting the properties of metric distance measures in order to obtain a bestcase runtime of O(nlogn) for the input size n. Our approach requires only linear space and supports median and centroid linkage. It is especially suitable for expensive distance measures, as it needs only a linear number of exact distance computations. This aspect is demonstrated in our extensive experimental evaluation, where we apply our algorithm to large graph databases in combination with computationally demanding graph distance metrics. We compare our approach to exact state-of-the-art SAHN algorithms in terms of quality and runtime on real-world and synthetic instances including vector and graph data. The evaluations show a subquadratic runtime in practice and a very low memory footprint. Our approach yields high-quality clusterings and is able to rediscover planted cluster structures in synthetic data sets.
@article{JGAA_2014_18_4_a5,
     author = {Nils Kriege and Petra Mutzel and Till Sch\"afer},
     title = {Practical {SAHN} {Clustering} for {Very} {Large} {Data} {Sets} and {Expensive} {Distance} {Metrics}},
     journal = {Journal of Graph Algorithms and Applications},
     pages = {577--602},
     publisher = {mathdoc},
     volume = {18},
     number = {4},
     year = {2014},
     doi = {10.7155/jgaa.00338},
     language = {en},
     url = {http://geodesic.mathdoc.fr/articles/10.7155/jgaa.00338/}
}
TY  - JOUR
AU  - Nils Kriege
AU  - Petra Mutzel
AU  - Till Schäfer
TI  - Practical SAHN Clustering for Very Large Data Sets and Expensive Distance Metrics
JO  - Journal of Graph Algorithms and Applications
PY  - 2014
SP  - 577
EP  - 602
VL  - 18
IS  - 4
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/articles/10.7155/jgaa.00338/
DO  - 10.7155/jgaa.00338
LA  - en
ID  - JGAA_2014_18_4_a5
ER  - 
%0 Journal Article
%A Nils Kriege
%A Petra Mutzel
%A Till Schäfer
%T Practical SAHN Clustering for Very Large Data Sets and Expensive Distance Metrics
%J Journal of Graph Algorithms and Applications
%D 2014
%P 577-602
%V 18
%N 4
%I mathdoc
%U http://geodesic.mathdoc.fr/articles/10.7155/jgaa.00338/
%R 10.7155/jgaa.00338
%G en
%F JGAA_2014_18_4_a5
Nils Kriege; Petra Mutzel; Till Schäfer. Practical SAHN Clustering for Very Large Data Sets and Expensive Distance Metrics. Journal of Graph Algorithms and Applications, 
							Special Issue on Selected Papers from the Eighth International Workshop on Algorithms and Computation, WALCOM 2014
					, Tome 18 (2014) no. 4, pp. 577-602. doi : 10.7155/jgaa.00338. http://geodesic.mathdoc.fr/articles/10.7155/jgaa.00338/

Cité par Sources :