Mixture of experts architectures for neural networks as a special case of conditional expectation formula
Kybernetika, Volume 34 (1998) no. 4, pp. 417-422
This article was harvested from the Czech Digital Mathematics Library.


Recently, a new and interesting neural network architecture called “mixture of experts” has been proposed as a tool for real multivariate approximation or prediction. We show that the underlying problem is closely related to approximating the joint probability density of the involved variables by a finite mixture. In particular, assuming normal mixtures, we can explicitly write the conditional expectation formula, which can be interpreted as a mixture-of-experts network. In this way, the related optimization problem reduces to the standard estimation of normal mixtures by means of the EM algorithm. The resulting prediction is optimal in the sense of minimum dispersion, provided the assumed mixture model is true. It is shown that some recently published results can be obtained by specifying the normal components of the mixtures in a special form.
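To make the construction concrete, here is a sketch of the conditional expectation formula the abstract refers to, in notation of my own choosing (the paper's own notation may differ). If the joint density of input $x$ and output $y$ is approximated by a normal mixture

$$ p(x,y)=\sum_{m=1}^{M} w_m\,\mathcal{N}\!\left((x,y);\,\mu_m,\Sigma_m\right),\qquad \mu_m=(\mu_m^{x},\mu_m^{y}),\qquad \Sigma_m=\begin{pmatrix}\Sigma_m^{xx}&\Sigma_m^{xy}\\ \Sigma_m^{yx}&\Sigma_m^{yy}\end{pmatrix}, $$

then the conditional expectation can be written explicitly as

$$ \mathbb{E}[y\mid x]=\sum_{m=1}^{M} g_m(x)\left(\mu_m^{y}+\Sigma_m^{yx}\bigl(\Sigma_m^{xx}\bigr)^{-1}(x-\mu_m^{x})\right),\qquad g_m(x)=\frac{w_m\,\mathcal{N}(x;\mu_m^{x},\Sigma_m^{xx})}{\sum_{j=1}^{M} w_j\,\mathcal{N}(x;\mu_j^{x},\Sigma_j^{xx})}. $$

Read as a mixture-of-experts network, each component supplies a linear "expert" $\mu_m^{y}+\Sigma_m^{yx}(\Sigma_m^{xx})^{-1}(x-\mu_m^{x})$, and the posterior component weights $g_m(x)$ play the role of the gating network; all parameters come from a single EM fit of the joint mixture.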
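A minimal numerical sketch of the same idea, assuming scikit-learn's GaussianMixture for the EM step and SciPy for the marginal densities (the data, variable names, and two-component choice below are illustrative, not from the paper):

import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy data: y follows two different linear regimes in x.
x = rng.uniform(-3.0, 3.0, size=(500, 1))
y = np.where(x[:, 0] < 0.0, 2.0 * x[:, 0] + 1.0, -x[:, 0] + 2.0)[:, None]
y = y + 0.1 * rng.standard_normal(y.shape)

# Step 1: estimate the joint density p(x, y) by a normal mixture via EM.
gm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gm.fit(np.hstack([x, y]))

d = x.shape[1]  # dimension of the input block

def predict(x_new):
    # E[y|x] = sum_m g_m(x) * (mu_m^y + S_m^yx (S_m^xx)^{-1} (x - mu_m^x))
    gates = np.empty(gm.n_components)
    experts = np.empty((gm.n_components, y.shape[1]))
    for m in range(gm.n_components):
        mu, S = gm.means_[m], gm.covariances_[m]
        mu_x, mu_y = mu[:d], mu[d:]
        S_xx, S_yx = S[:d, :d], S[d:, :d]
        # Gating weight: posterior of component m given x alone.
        gates[m] = gm.weights_[m] * multivariate_normal.pdf(x_new, mean=mu_x, cov=S_xx)
        # Expert output: the component's linear regression of y on x.
        experts[m] = mu_y + S_yx @ np.linalg.solve(S_xx, x_new - mu_x)
    return (gates / gates.sum()) @ experts

print(predict(np.array([-1.0])))  # roughly 2*(-1)+1 = -1
print(predict(np.array([1.0])))   # roughly -(1)+2 = 1

If the assumed two-component model matches the data, the two predictions track the two linear regimes, with the gating weights switching smoothly between the experts as x crosses zero.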
Classification: 68T05, 92B20
Keywords: neural networks; mixtures; multivariate approximation; prediction
@article{KYB_1998_34_4_a10,
     author = {Grim, Ji\v{r}{\'\i}},
     title = {Mixture of experts architectures for neural networks as a special case of conditional expectation formula},
     journal = {Kybernetika},
     pages = {417--422},
     year = {1998},
     volume = {34},
     number = {4},
     zbl = {1274.68314},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/KYB_1998_34_4_a10/}
}
TY  - JOUR
AU  - Grim, Jiří
TI  - Mixture of experts architectures for neural networks as a special case of conditional expectation formula
JO  - Kybernetika
PY  - 1998
SP  - 417
EP  - 422
VL  - 34
IS  - 4
UR  - http://geodesic.mathdoc.fr/item/KYB_1998_34_4_a10/
LA  - en
ID  - KYB_1998_34_4_a10
ER  - 
%0 Journal Article
%A Grim, Jiří
%T Mixture of experts architectures for neural networks as a special case of conditional expectation formula
%J Kybernetika
%D 1998
%P 417-422
%V 34
%N 4
%U http://geodesic.mathdoc.fr/item/KYB_1998_34_4_a10/
%G en
%F KYB_1998_34_4_a10
Grim, Jiří. Mixture of experts architectures for neural networks as a special case of conditional expectation formula. Kybernetika, Volume 34 (1998) no. 4, pp. 417-422. http://geodesic.mathdoc.fr/item/KYB_1998_34_4_a10/

[1] Dempster A. P., Laird N. M., Rubin D. B.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39 (1977), 1–38

[2] Grim J.: On numerical evaluation of maximum-likelihood estimates for finite mixtures of distributions. Kybernetika 18 (1982), 3, 173–190

[3] Grim J.: Maximum likelihood design of layered neural networks. In: Proceedings of the 13th International Conference on Pattern Recognition, IEEE Press, 1996, pp. 85–89

[4] Grim J.: Design of multilayer neural networks by information preserving transforms. In: Proc. 3rd Systems Science European Congress (E. Pessa, M. B. Penna and A. Montesanto, eds.), Edizioni Kappa, Roma 1996, pp. 977–982

[5] Jacobs R. A., Jordan M. I., Nowlan S. J., Hinton G. E.: Adaptive mixtures of local experts. Neural Comp. 3 (1991), 79–87

[6] Jordan M. I., Jacobs R. A.: Hierarchical mixtures of experts and the EM algorithm. Neural Comp. 6 (1994), 181–214

[7] Chen K., Xie D., Chi H.: A modified HME architecture for text-dependent speaker identification. IEEE Trans. Neural Networks 7 (1996), 1309–1313

[8] Ramamurti V., Ghosh J.: Structural adaptation in mixtures of experts. In: Proceedings of the 13th International Conference on Pattern Recognition, IEEE Press, 1996, pp. 704–708

[9] Titterington D. M., Smith A. F. M., Makov U. E.: Statistical Analysis of Finite Mixture Distributions. John Wiley & Sons, Chichester – Singapore – New York 1985

[10] Vajda I.: Theory of Statistical Inference and Information. Kluwer, Boston 1992

[11] Wu C. F. J.: On the convergence properties of the EM algorithm. Ann. Statist. 11 (1983), 95–103

[12] Xu L., Jordan M. I.: On convergence properties of the EM algorithm for Gaussian mixtures. Neural Comp. 8 (1996), 129–151

[13] Xu L., Jordan M. I., Hinton G. E.: A modified gating network for the mixtures of experts architecture. In: Proc. WCNN’94, San Diego 1994, Vol. 2, pp. 405–410