About the maximum information and maximum likelihood principles
Kybernetika, Volume 34 (1998) no. 4, pp. 485-494. This article was harvested from the Czech Digital Mathematics Library.

Neural networks with radial basis functions are considered, together with the Shannon information in their output concerning the input. The role of information-preserving input transformations is discussed when the network is specified by the maximum information principle and by the maximum likelihood principle. A transformation is found which simplifies the input structure in the sense that it minimizes the entropy in the class of all information-preserving transformations. Such a transformation need not be unique: under some assumptions it may be any minimal sufficient statistic.
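To make the abstract's entropy-minimization statement precise, one standard formalization reads as follows; this is a sketch in common information-theoretic notation rather than notation quoted from the paper ($X$ is the network input, $Y$ the network output, $T$ an input transformation, $H$ the Shannon entropy, $I$ the mutual information). A transformation $T$ is information-preserving if it loses no information about the output,

% assumed notation, not quoted from the paper: X input, Y output, T input transformation
$$ I\bigl(T(X);Y\bigr) \;=\; I(X;Y), $$

and the simplifying transformation of the abstract is then a minimizer of the input entropy within this class,

$$ T^{*} \;\in\; \arg\min_{\,T\,:\;I(T(X);Y)\,=\,I(X;Y)} H\bigl(T(X)\bigr). $$

A minimal sufficient statistic retains all of $I(X;Y)$ while discarding the remaining input structure, so under suitable assumptions each one attains this minimum; this is why $T^{*}$ need not be unique.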
Classification : 62B10, 62M45, 68T05, 92B20
Keywords: neural networks; radial basis functions; entropy minimization
@article{KYB_1998_34_4_a21,
     author = {Vajda, Igor and Grim, Ji\v{r}{\'\i}},
     title = {About the maximum information and maximum likelihood principles},
     journal = {Kybernetika},
     pages = {485--494},
     year = {1998},
     volume = {34},
     number = {4},
     zbl = {1274.62644},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/KYB_1998_34_4_a21/}
}
Vajda, Igor; Grim, Jiří. About the maximum information and maximum likelihood principles. Kybernetika, Volume 34 (1998) no. 4, pp. 485-494. http://geodesic.mathdoc.fr/item/KYB_1998_34_4_a21/
