The Complexity of the Simultaneous Cluster Problem
Journal of Graph Algorithms and Applications, Tome 18 (2014) no. 1, pp. 1-34.

Voir la notice de l'article provenant de la source Journal of Graph Algorythms and Applications website

We study clustering over multiple graphs - each encoding a distinct set of similarity relationships (edges) over the same set of objects (nodes) - where the aim is to identify clusters that are supported across the collection of graphs. This problem of simultaneous clustering is readily motivated by the recent deluge of datasets in several domains (including the biological sciences, social sciences, and marketing), where the same objects are repeatedly measured in different conditions, populations or time points. Whilst there has been a vast amount of heuristic work on practical simultaneous clustering problems, little is known on the theoretical side - we present theoretical results that help explain why such heuristics typically come without quantitative guarantees. We give algorithmic and complexity results for simultaneous clustering using two standard measures on clustering quality: density and connectivity. Specifically, we focus on the basic problem of finding a single cluster (rather than an entire clustering) that is simultaneously of high quality in every graph. When the quality of a cluster is its minimum density over all graphs, we show the problem is not approximable within a factor of 2log1−εn, unless NP ⊆ DTIME(npolylog n). Furthermore, this problem appears very difficult even when there are just two graphs; the resulting problem is approximately as hard as the problem of finding a dense subgraph on at most k vertices. When cluster quality is a fixed connectivity requirement between terminals within the cluster, there are two natural optimization problems: a maximization version (find a good quality cluster with as many terminals as possible) and a minimization version (find a good quality cluster that is as small as possible). We show that the maximization problem is tractable in polynomial time for any fixed connectivity requirement k. On the other hand the minimization problem is hard to approximate within a factor of 2log1−εn, unless NP ⊆ DTIME(npolylog n). The number of graphs in our reduction depends on n. If instead the number of graphs is fixed, we show there is an ε > 0 for which the minimization problem is not approximable within g1/2−ε for any fixed number g of graphs unless NP = ZPP. These hardness results for the minimization problem hold even in the simple cases where the connectivity requirement is one and there are either just two terminal nodes or every node is a terminal node. We remark that our results extend to case where more robust variants of the quality measure are used.
DOI : 10.7155/jgaa.00313
Keywords: graph theory, clustering, multiple graphs, inapproximability, density, connectivity, label cover
@article{JGAA_2014_18_1_a0,
     author = {Zhentao Li and Manikandan Narayanan and Adrian Vetta},
     title = {The {Complexity} of the {Simultaneous} {Cluster} {Problem}},
     journal = {Journal of Graph Algorithms and Applications},
     pages = {1--34},
     publisher = {mathdoc},
     volume = {18},
     number = {1},
     year = {2014},
     doi = {10.7155/jgaa.00313},
     language = {en},
     url = {http://geodesic.mathdoc.fr/articles/10.7155/jgaa.00313/}
}
TY  - JOUR
AU  - Zhentao Li
AU  - Manikandan Narayanan
AU  - Adrian Vetta
TI  - The Complexity of the Simultaneous Cluster Problem
JO  - Journal of Graph Algorithms and Applications
PY  - 2014
SP  - 1
EP  - 34
VL  - 18
IS  - 1
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/articles/10.7155/jgaa.00313/
DO  - 10.7155/jgaa.00313
LA  - en
ID  - JGAA_2014_18_1_a0
ER  - 
%0 Journal Article
%A Zhentao Li
%A Manikandan Narayanan
%A Adrian Vetta
%T The Complexity of the Simultaneous Cluster Problem
%J Journal of Graph Algorithms and Applications
%D 2014
%P 1-34
%V 18
%N 1
%I mathdoc
%U http://geodesic.mathdoc.fr/articles/10.7155/jgaa.00313/
%R 10.7155/jgaa.00313
%G en
%F JGAA_2014_18_1_a0
Zhentao Li; Manikandan Narayanan; Adrian Vetta. The Complexity of the Simultaneous Cluster Problem. Journal of Graph Algorithms and Applications, Tome 18 (2014) no. 1, pp. 1-34. doi : 10.7155/jgaa.00313. http://geodesic.mathdoc.fr/articles/10.7155/jgaa.00313/

Cité par Sources :