Automatic determination of the numbers of components in the EM algorithm for the restoration of a mixture of normal distributions
Žurnal vyčislitelʹnoj matematiki i matematičeskoj fiziki, Tome 50 (2010) no. 4, pp. 770-783 Cet article a éte moissonné depuis la source Math-Net.Ru

Voir la notice de l'article

The classical EM algorithm for the restoration of the mixture of normal probability distributions cannot determine the number of components in the mixture. An algorithm called ARD EM for the automatic determination of the number of components is proposed, which is based on the relevance vector machine. The idea behind this algorithm is to use a redundant number of mixture components at the first stage and then determine the relevant components by maximizing the evidence. Experiments with model problems show that the number of clusters thus determined either coincides with the actual number or slightly exceeds it. In addition, clusterization using ARD EM turns out to be closer to the actual clusterization than that obtained by the analogs based on cross validation and the minimum description length principle.
@article{ZVMMF_2010_50_4_a13,
     author = {D. P. Vetrov and D. A. Kropotov and A. A. Osokin},
     title = {Automatic determination of the numbers of components in the {EM} algorithm for the restoration of a mixture of normal distributions},
     journal = {\v{Z}urnal vy\v{c}islitelʹnoj matematiki i matemati\v{c}eskoj fiziki},
     pages = {770--783},
     year = {2010},
     volume = {50},
     number = {4},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/ZVMMF_2010_50_4_a13/}
}
TY  - JOUR
AU  - D. P. Vetrov
AU  - D. A. Kropotov
AU  - A. A. Osokin
TI  - Automatic determination of the numbers of components in the EM algorithm for the restoration of a mixture of normal distributions
JO  - Žurnal vyčislitelʹnoj matematiki i matematičeskoj fiziki
PY  - 2010
SP  - 770
EP  - 783
VL  - 50
IS  - 4
UR  - http://geodesic.mathdoc.fr/item/ZVMMF_2010_50_4_a13/
LA  - ru
ID  - ZVMMF_2010_50_4_a13
ER  - 
%0 Journal Article
%A D. P. Vetrov
%A D. A. Kropotov
%A A. A. Osokin
%T Automatic determination of the numbers of components in the EM algorithm for the restoration of a mixture of normal distributions
%J Žurnal vyčislitelʹnoj matematiki i matematičeskoj fiziki
%D 2010
%P 770-783
%V 50
%N 4
%U http://geodesic.mathdoc.fr/item/ZVMMF_2010_50_4_a13/
%G ru
%F ZVMMF_2010_50_4_a13
D. P. Vetrov; D. A. Kropotov; A. A. Osokin. Automatic determination of the numbers of components in the EM algorithm for the restoration of a mixture of normal distributions. Žurnal vyčislitelʹnoj matematiki i matematičeskoj fiziki, Tome 50 (2010) no. 4, pp. 770-783. http://geodesic.mathdoc.fr/item/ZVMMF_2010_50_4_a13/

[1] Dempster A. P., Laird N. M., Rubin D. B., “Maximum likelihood from incomplete data via the EM algorithm”, J. Roy. Stat. Soc. B, 39 (1977), 1–38 | MR | Zbl

[2] Bishop C. M., Pattern recognition and machine learning, Springer, New York, 2006 | MR

[3] Tipping M. E., “Sparse Bayesian learning and the relevance vector machine”, J. Mach. Learn. Res., 1 (2001), 211–244 | DOI | MR | Zbl

[4] MacKay D. J., “Bayesian interpolation”, Neural Comp., 4:3 (1992), 415–447 | DOI

[5] Kyrgyzov I. O., Kyrgyzov O. O., Maitre H., Campedel M., “Kernel MDL to determine the number of clusters”, Proc. Intern. Conf. Mach. Learn., Data Mining, Leipzig, 2007

[6] Xu L., Jordan M. I., “On convergence properties of the EM algorithm for Gaussian mixtures”, Neural Comp., 8 (1996), 129–151 | DOI

[7] Rissanen J., “Modeling by shortest data description”, Automatica, 14 (1978), 465–471 | DOI | Zbl

[8] Freund Y., Schapire R. E., “A decision-theoretic generalization of on-line learning and an application to boosting”, J. Comp. Syst. Sci., 55:1 (1997), 119–139 | DOI | MR | Zbl

[9] Akaike H. A., “A new look at the statistical model identification”, IEEE Trans. Autom. Contr., 19:6 (1974), 716–723 | DOI | MR | Zbl

[10] Hubert L., Arabie P., “Comparing partitions”, J. Clas., 2 (1985), 193–218 | DOI

[11] Vlassis N., Likas A., “A greedy EM algorithm for Gaussian mixture learning”, Neural Proc. Letters, 2000, 77–87

[12] Verbeek J. J., Vlassis N., Krose B., “Efficient greedy learning of Gaussian mixture models”, Neural Comp., 2003

[13] Kuncheva L. I., Vetrov D. P., “Evaluation of stability of $k$-means cluster ensembles with respect to random initialization”, IEEE Trans. Pattern Anal. Mach. Intell., 28:11 (2005), 1798–1808 | DOI

[14] Ryazanov V. V., “O sinteze klassifitsiruyuschikh algoritmov na konechnykh mnozhestvakh algoritmov klassifikatsii (taksonomii)”, Zh. vychisl. matem i matem. fiz., 22:2 (1982), 429–440 | MR | Zbl