Factorized mutual information maximization
Kybernetika, Volume 56 (2020) no. 5, pp. 948-978
This article was harvested from the Czech Digital Mathematics Library


We investigate the sets of joint probability distributions that maximize the average multi-information over a collection of margins. These functionals serve as proxies for maximizing the multi-information of a set of variables or the mutual information of two subsets of variables, at a lower computation and estimation complexity. We describe the maximizers and their relations to the maximizers of the multi-information and the mutual information.
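For orientation, the quantities named in the abstract can be written out under standard definitions (multi-information in the sense of Watanabe [47]); the collection of margins $\mathcal{A}$ below is illustrative notation, not taken from the paper:

% Multi-information (total correlation) of X_1, ..., X_n: the KL divergence
% from the joint distribution p to the product of its one-variable margins.
\[
I(X_1;\dots;X_n) \;=\; \sum_{i=1}^{n} H(X_i) - H(X_1,\dots,X_n)
\;=\; D\bigl(p \,\big\|\, p_{X_1}\otimes\cdots\otimes p_{X_n}\bigr).
\]
% Factorized proxy: the average multi-information over a collection
% \mathcal{A} of margins, to be maximized over joint distributions p.
\[
\max_{p}\;\frac{1}{|\mathcal{A}|}\sum_{A\in\mathcal{A}} I\bigl((X_i)_{i\in A}\bigr).
\]

Each summand depends on p only through a lower-dimensional margin, which is where the reduced computation and estimation complexity mentioned in the abstract comes from.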
DOI: 10.14736/kyb-2020-5-0948
Classification: 62B10, 94A17
Keywords: multi-information; mutual information; divergence maximization; marginal specification problem; transportation polytope
@article{10_14736_kyb_2020_5_0948,
     author = {Merkh, Thomas and Mont\'ufar, Guido},
     title = {Factorized mutual information maximization},
     journal = {Kybernetika},
     pages = {948--978},
     year = {2020},
     volume = {56},
     number = {5},
     doi = {10.14736/kyb-2020-5-0948},
     mrnumber = {4187782},
     language = {en},
     url = {http://geodesic.mathdoc.fr/articles/10.14736/kyb-2020-5-0948/}
}
TY  - JOUR
AU  - Merkh, Thomas
AU  - Montúfar, Guido
TI  - Factorized mutual information maximization
JO  - Kybernetika
PY  - 2020
SP  - 948
EP  - 978
VL  - 56
IS  - 5
UR  - http://geodesic.mathdoc.fr/articles/10.14736/kyb-2020-5-0948/
DO  - 10.14736/kyb-2020-5-0948
LA  - en
ID  - 10_14736_kyb_2020_5_0948
ER  - 
%0 Journal Article
%A Merkh, Thomas
%A Montúfar, Guido
%T Factorized mutual information maximization
%J Kybernetika
%D 2020
%P 948-978
%V 56
%N 5
%U http://geodesic.mathdoc.fr/articles/10.14736/kyb-2020-5-0948/
%R 10.14736/kyb-2020-5-0948
%G en
%F 10_14736_kyb_2020_5_0948
Merkh, Thomas; Montúfar, Guido. Factorized mutual information maximization. Kybernetika, Volume 56 (2020) no. 5, pp. 948-978. doi: 10.14736/kyb-2020-5-0948

[1] Alemi, A., Fischer, I., Dillon, J., Murphy, K.: Deep variational information bottleneck. In: ICLR, 2017.

[2] Ay, N.: An information-geometric approach to a theory of pragmatic structuring. Ann. Probab. 30 (2002), 1, 416-436. | DOI | MR | Zbl

[3] Ay, N.: Locality of global stochastic interaction in directed acyclic networks. Neural Comput. 14 (2002), 12, 2959-2980. | DOI | Zbl

[4] Ay, N., Bertschinger, N., Der, R., Güttler, F., Olbrich, E.: Predictive information and explorative behavior of autonomous robots. Europ. Phys. J. B 63 (2008), 3, 329-339. | DOI | MR

[5] Ay, N., Knauf, A.: Maximizing multi-information. Kybernetika 42 (2006), 5, 517-538. | MR | Zbl

[6] Baldassarre, G., Mirolli, M.: Intrinsically motivated learning systems: an overview. In: Intrinsically motivated learning in natural and artificial systems, Springer 2013, pp. 1-14. | DOI

[7] Baudot, P., Tapia, M., Bennequin, D., Goaillard, J.-M.: Topological information data analysis. Entropy 21 (2019), 9, 869. | DOI | MR

[8] Bekkerman, R., Sahami, M., Learned-Miller, E.: Combinatorial Markov random fields. In: European Conference on Machine Learning, Springer 2006, pp. 30-41. | DOI | MR

[9] Belghazi, M. I., Baratin, A., Rajeshwar, S., Ozair, S., Bengio, Y., Courville, A., Hjelm, D.: Mutual information neural estimation. In: Proc. 35th International Conference on Machine Learning (J. Dy and A. Krause, eds.), Vol. 80 of Proceedings of Machine Learning Research, pp. 531-540, Stockholm 2018. PMLR.

[10] Bertschinger, N., Rauh, J., Olbrich, E., Jost, J., Ay, N.: Quantifying unique information. Entropy 16 (2014), 4, 2161-2183. | DOI | MR

[11] Bialek, W., Nemenman, I., Tishby, N.: Predictability, complexity, and learning. Neural Comput. 13 (2001), 11, 2409-2463. | DOI

[12] Burda, Y., Edwards, H., Pathak, D., Storkey, A., Darrell, T., Efros, A. A.: Large-scale study of curiosity-driven learning. In: ICLR, 2019.

[13] Buzzi, J., Zambotti, L.: Approximate maximizers of intricacy functionals. Probab. Theory Related Fields 153 (2012), 3-4, 421-440. | DOI | MR

[14] Chentanez, N., Barto, A. G., Singh, S. P.: Intrinsically motivated reinforcement learning. In: Adv. Neural Inform. Process. Systems 2005, pp. 1281-1288. | DOI

[15] Crutchfield, J. P., Feldman, D. P.: Synchronizing to the environment: Information-theoretic constraints on agent learning. Adv. Complex Systems 4 (2001), 2-3, 251-264. | DOI | MR

[16] Loera, J. de: Transportation polytopes. | DOI

[17] Friedman, N., Mosenzon, O., Slonim, N., Tishby, N.: Multivariate information bottleneck. In: Proc. Seventeenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc., 2001, pp. 152-161.

[18] Gabrié, M., Manoel, A., Luneau, C., Barbier, J., Macris, N., Krzakala, F., Zdeborová, L.: Entropy and mutual information in models of deep neural networks. In: Advances in Neural Information Processing Systems 31 (S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, eds.), Curran Associates, Inc. 2018, pp. 1821-1831. | MR

[19] Gao, S., Ver Steeg, G., Galstyan, A.: Efficient estimation of mutual information for strongly dependent variables. In: Artificial Intelligence and Statistics 2015, pp. 277-286.

[20] Hjelm, R. D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. In: ICLR, 2019.

[21] Hosten, S., Sullivant, S.: Gröbner bases and polyhedral geometry of reducible and cyclic models. J. Comb. Theory Ser. A 100 (2002), 2, 277-301. | DOI | MR

[22] Jakulin, A., Bratko, I.: Quantifying and visualizing attribute interactions: An approach based on entropy. 2003.

[23] Klyubin, A. S., Polani, D., Nehaniv, C. L.: Empowerment: A universal agent-centric measure of control. In: 2005 IEEE Congress on Evolutionary Computation, Vol. 1, IEEE 2005, pp. 128-135.

[24] Kraskov, A., Stögbauer, H., Grassberger, P.: Estimating mutual information. Phys. Rev. E 69 (2004), 6, 066138. | DOI | MR

[25] Matúš, F.: Maximization of information divergences from binary i.i.d. sequences. In: Proc. IPMU 2004, Vol. 2, pp. 1303-1306.

[26] Matúš, F.: Divergence from factorizable distributions and matroid representations by partitions. IEEE Trans. Inform. Theory 55 (2009), 12, 5375-5381. | DOI | MR

[27] Matúš, F., Ay, N.: On maximization of the information divergence from an exponential family. In: Proc. 6th Workshop on Uncertainty Processing: Oeconomica 2003, Hejnice 2003, pp. 199-204.

[28] Matúš, F., Rauh, J.: Maximization of the information divergence from an exponential family and criticality. In: 2011 IEEE International Symposium on Information Theory Proceedings 2011, pp. 903-907. | DOI | MR

[29] McGill, W.: Multivariate information transmission. Trans. IRE Profess. Group Inform. Theory 4 (1954), 4, 93-111. | DOI | MR

[30] Mohamed, S., Rezende, D. J.: Variational information maximisation for intrinsically motivated reinforcement learning. In: Advances in Neural Information Processing Systems 2015, pp. 2125-2133.

[31] Montúfar, G.: Universal approximation depth and errors of narrow belief networks with discrete units. Neural Comput. 26 (2014), 7, 1386-1407. | DOI | MR

[32] Montúfar, G., Ghazi-Zahedi, K., Ay, N.: A theory of cheap control in embodied systems. PLOS Comput. Biology 11 (2015), 9, 1-22. | DOI

[33] Montúfar, G., Ghazi-Zahedi, K., Ay, N.: Information theoretically aided reinforcement learning for embodied agents. arXiv preprint arXiv:1605.09735, 2016.

[34] Montúfar, G., Rauh, J., Ay, N.: Expressive power and approximation errors of restricted Boltzmann machines. In: Advances in Neural Information Processing Systems 2011, pp. 415-423.

[35] Montúfar, G., Rauh, J., Ay, N.: Maximal information divergence from statistical models defined by neural networks. In: Geometric Science of Information GSI 2013 (F. Nielsen and F. Barbaresco, eds.), Lecture Notes in Computer Science 8085, Springer 2013, pp. 759-766. | DOI | MR

[36] Rauh, J.: Finding the maximizers of the information divergence from an exponential family. IEEE Trans. Inform. Theory 57 (2011), 6, 3236-3247. | DOI | MR

[37] Rauh, J.: Finding the Maximizers of the Information Divergence from an Exponential Family. PhD. Thesis, Universität Leipzig 2011. | MR

[38] Ince, R. A. A., Panzeri, S., Schultz, S. R.: Summary of information theoretic quantities. Springer, New York 2013, pp. 1-6.

[39] Roulston, M. S.: Estimating the errors on measured entropy and mutual information. Physica D: Nonlinear Phenomena 125 (1999), 3-4, 285-294. | DOI

[40] Schossau, J., Adami, C., Hintze, A.: Information-theoretic neuro-correlates boost evolution of cognitive systems. Entropy 18 (2015), 1, 6. | DOI

[41] Slonim, N., Atwal, G. S., Tkacik, G., Bialek, W.: Estimating mutual information and multi-information in large networks. arXiv preprint cs/0502017, 2005.

[42] Slonim, N., Friedman, N., Tishby, N.: Multivariate information bottleneck. Neural Comput. 18 (2006), 8, 1739-1789. | DOI | MR

[43] Still, S., Precup, D.: An information-theoretic approach to curiosity-driven reinforcement learning. Theory Biosci. 131 (2012), 3, 139-148. | DOI

[44] The Sage Developers: SageMath, the Sage Mathematics Software System (Version 8.7), 2019. https://www.sagemath.org

[45] Tishby, N., Pereira, F. C., Bialek, W.: The information bottleneck method. In: Proc. 37th Annual Allerton Conference on Communication, Control and Computing 1999, pp. 368-377.

[46] Vergara, J. R., Estévez, P. A.: A review of feature selection methods based on mutual information. Neural Comput. Appl. 24 (2014), 1, 175-186. | DOI

[47] Watanabe, S.: Information theoretical analysis of multivariate correlation. IBM J. Res. Develop. 4 (1960), 1, 66-82. | DOI | MR

[48] Witsenhausen, H. S., Wyner, A. D.: A conditional entropy bound for a pair of discrete random variables. IEEE Trans. Inform. Theory 21 (1975), 5, 493-501. | DOI | MR

[49] Yemelichev, V., Kovalev, M., Kravtsov, M.: Polytopes, Graphs and Optimisation. Cambridge University Press, 1984. | MR

[50] Zahedi, K., Ay, N., Der, R.: Higher coordination with less control: A result of information maximization in the sensorimotor loop. Adaptive Behavior 18 (2010), 3-4, 338-355. | DOI

[51] Zahedi, K., Martius, G., Ay, N.: Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis. Front. Psychol. (2013), 4, 801. | DOI
