Generalized minimizers of convex integral functionals, Bregman distance, Pythagorean identities
Kybernetika, Volume 48 (2012) no. 4, pp. 637-689. This article was harvested from the Czech Digital Mathematics Library.


Integral functionals based on convex normal integrands are minimized subject to finitely many moment constraints. The integrands are finite on the positive and infinite on the negative numbers, strictly convex but not necessarily differentiable. The minimization is viewed as a primal problem and studied together with a dual one in the framework of convex duality. The effective domain of the value function is described by a conic core, a modification of the earlier concept of convex core. Minimizers and generalized minimizers are explicitly constructed from solutions of modified dual problems, not assuming the primal constraint qualification. A generalized Pythagorean identity is presented using Bregman distance and a correction term for lack of essential smoothness in integrands. Results are applied to minimization of Bregman distances. Existence of a generalized dual solution is established whenever the dual value is finite, assuming the dual constraint qualification. Examples of ‘irregular’ situations are included, pointing to the limitations of generality of certain key results.
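As orientation for the abstract, a conventional formulation of the objects it names is sketched below; the symbols $\beta$, $\varphi$, $a$, $\mu$, $X$ are illustrative only and are not taken from the paper, whose setting and hypotheses may differ. The primal problem minimizes a convex integral functional over nonnegative functions subject to finitely many moment constraints,
\[
\text{minimize}\quad \int_X \beta\bigl(g(x)\bigr)\,\mu(\mathrm{d}x)
\qquad\text{subject to}\qquad
\int_X \varphi(x)\,g(x)\,\mu(\mathrm{d}x) = a,\quad g \ge 0,
\]
where $\beta$ is a normal integrand, strictly convex and finite on $(0,+\infty)$ and equal to $+\infty$ on $(-\infty,0)$, $\varphi$ is an $\mathbb{R}^k$-valued moment mapping, and $a \in \mathbb{R}^k$. When $\beta$ is differentiable, the Bregman distance it generates is
\[
B_\beta(s,t) = \beta(s) - \beta(t) - \beta'(t)\,(s-t), \qquad s, t > 0,
\]
extended to nonnegative functions by $B_\beta(g,h) = \int_X B_\beta\bigl(g(x),h(x)\bigr)\,\mu(\mathrm{d}x)$; for nondifferentiable $\beta$ a chosen subgradient of $\beta$ at $t$ replaces $\beta'(t)$.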
Classification : 49J53, 49K30, 62B10, 65K10, 90C46, 94A17
Keywords: maximum entropy; moment constraint; generalized primal/dual solutions; normal integrand; minimizing sequence; convex duality; Bregman projection; conic core; generalized exponential family; inference principles
@article{KYB_2012_48_4_a3,
     author = {Csisz\'ar, Imre and Mat\'u\v{s}, Franti\v{s}ek},
     title = {Generalized minimizers of convex integral functionals, {Bregman} distance, {Pythagorean} identities},
     journal = {Kybernetika},
     pages = {637--689},
     year = {2012},
     volume = {48},
     number = {4},
     mrnumber = {3013394},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/KYB_2012_48_4_a3/}
}
TY  - JOUR
AU  - Csiszár, Imre
AU  - Matúš, František
TI  - Generalized minimizers of convex integral functionals, Bregman distance, Pythagorean identities
JO  - Kybernetika
PY  - 2012
SP  - 637
EP  - 689
VL  - 48
IS  - 4
UR  - http://geodesic.mathdoc.fr/item/KYB_2012_48_4_a3/
LA  - en
ID  - KYB_2012_48_4_a3
ER  - 
%0 Journal Article
%A Csiszár, Imre
%A Matúš, František
%T Generalized minimizers of convex integral functionals, Bregman distance, Pythagorean identities
%J Kybernetika
%D 2012
%P 637-689
%V 48
%N 4
%U http://geodesic.mathdoc.fr/item/KYB_2012_48_4_a3/
%G en
%F KYB_2012_48_4_a3
Csiszár, Imre; Matúš, František. Generalized minimizers of convex integral functionals, Bregman distance, Pythagorean identities. Kybernetika, Volume 48 (2012) no. 4, pp. 637-689. http://geodesic.mathdoc.fr/item/KYB_2012_48_4_a3/

[1] S. M. Ali, S. D. Silvey: A general class of coefficients of divergence of one distribution from another. J. Roy. Statist. Soc. Ser. B 28 (1966), 131-142. | MR | Zbl

[2] S. Amari, H. Nagaoka: Methods of Information Geometry. Transl. Math. Monographs 191, Oxford Univ. Press, 2000. | MR | Zbl

[3] S. Amari, A. Cichocki: Information geometry of divergence functions. Bull. Polish Acad. Sci. 58 (2010), 183-194.

[4] O. Barndorff-Nielsen: Information and Exponential Families in Statistical Theory. Wiley, 1978. | MR | Zbl

[5] H. H. Bauschke, J. M. Borwein: Legendre functions and the method of random Bregman projections. J. Convex Anal. 4 (1997), 27-67. | MR | Zbl

[6] H. H. Bauschke, J. M. Borwein, P. L. Combettes: Essential smoothness, essential strict convexity, and Legendre functions in Banach spaces. Comm. Contemp. Math. 3 (2001), 615-647. | DOI | MR | Zbl

[7] A. Ben-Tal, A. Charnes: A dual optimization framework for some problems of information theory and statistics. Problems Control Inform. Theory 8 (1979), 387-401. | MR | Zbl

[8] J. M. Borwein, A. S. Lewis: Duality relationships for entropy-like minimization problems. SIAM J. Control Optim. 29 (1991), 325-338. | DOI | MR | Zbl

[9] J. M. Borwein, A. S. Lewis: Convergence of best entropy estimates. SIAM J. Optim. 1 (1991), 191-205. | DOI | MR | Zbl

[10] J. M. Borwein, A. S. Lewis: Partially-finite programming in $L_1$ and the existence of maximum entropy estimates. SIAM J. Optim. 3 (1993), 248-267. | DOI | MR

[11] J. M. Borwein, A. S. Lewis, D. Noll: Maximum entropy spectral analysis using derivative information. Part I: Fisher information and convex duality. Math. Oper. Res. 21 (1996), 442-468. | DOI | MR

[12] L. M. Bregman: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. and Math. Phys. 7 (1967), 200-217. | DOI | MR | Zbl

[13] M. Broniatowski, A. Keziou: Minimization of $\phi$-divergences on sets of signed measures. Studia Sci. Math. Hungar. 43 (2006), 403-442. | MR | Zbl

[14] J. P. Burg: Maximum entropy spectral analysis. Paper presented at 37th Meeting of Soc. Explor. Geophysicists, Oklahoma City 1967.

[15] J. P. Burg: Maximum entropy spectral analysis. Ph.D. Thesis, Dept. Geophysics, Stanford Univ., Stanford 1975.

[16] Y. Censor, S. A. Zenios: Parallel Optimization. Oxford University Press, New York 1997. | MR | Zbl

[17] N. N. Chentsov: Statistical Decision Rules and Optimal Inference. Transl. Math. Monographs 53, American Math. Soc., Providence 1982. Russian original: Nauka, Moscow 1972. | MR | Zbl

[18] I. Csiszár: Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Publ. Math. Inst. Hungar. Acad. Sci. 8 (1963), 85-108. | MR | Zbl

[19] I. Csiszár: Information-type measures of difference of probability distributions and indirect observations. Studia Sci. Math. Hungar. 2 (1967), 299-318. | MR | Zbl

[20] I. Csiszár: $I$-divergence geometry of probability distributions and minimization problems. Ann. Probab. 3 (1975), 146-158. | DOI | MR

[21] I. Csiszár: Sanov property, generalized $I$-projection and a conditional limit theorem. Ann. Probab. 12 (1984), 768-793. | DOI | MR | Zbl

[22] I. Csiszár: Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems. Ann. Statist. 19 (1991), 2031-2066. | DOI | MR | Zbl

[23] I. Csiszár: Generalized projections for non-negative functions. Acta Math. Hungar. 68 (1995), 1-2, 161-185. | DOI | MR | Zbl

[24] I. Csiszár, F. Gamboa, E. Gassiat: MEM pixel correlated solutions for generalized moment and interpolation problems. IEEE Trans. Inform. Theory 45 (1999), 2253-2270. | DOI | MR | Zbl

[25] I. Csiszár, F. Matúš: Convex cores of measures on $\mathbb{R}^d$. Studia Sci. Math. Hungar. 38 (2001), 177-190. | MR

[26] I. Csiszár, F. Matúš: Information projections revisited. IEEE Trans. Inform. Theory 49 (2003), 1474-1490. | DOI | MR | Zbl

[27] I. Csiszár, F. Matúš: Generalized maximum likelihood estimates for infinite dimensional exponential families. In: Proc. Prague Stochastics'06, Prague 2006, pp. 288-297.

[28] I. Csiszár, F. Matúš: Generalized maximum likelihood estimates for exponential families. Probab. Theory Related Fields 141 (2008), 213-246. | DOI | MR | Zbl

[29] I. Csiszár, F. Matúš: On minimization of entropy functionals under moment constraints. In: Proc. ISIT 2008, Toronto, pp. 2101-2105.

[30] I. Csiszár, F. Matúš: On minimization of multivariate entropy functionals. In: Proc. ITW 2009, Volos, Greece, pp. 96-100.

[31] I. Csiszár, F. Matúš: Minimization of entropy functionals revisited. In: Proc. ISIT 2012, Cambridge, MA, pp. 150-154.

[32] D. Dacunha-Castelle, F. Gamboa: Maximum d'entropie et problème des moments. Ann. Inst. H. Poincaré Probab. Statist. 26 (1990), 567-596. | MR | Zbl

[33] A. P. Dawid, P. D. Grünwald: Game theory, maximum entropy, minimum discrepancy, and robust Bayesian decision theory. Ann. Statist. 32 (2004), 1367-1433. | DOI | MR | Zbl

[34] S. Eguchi: Information geometry and statistical pattern recognition. Sugaku Expositions, Amer. Math. Soc. 19 (2006), 197-216. | MR

[35] B. A. Frigyik, S. Srivastava, M. R. Gupta: Functional Bregman divergence and Bayesian estimation of distributions. IEEE Trans. Inform. Theory 54 (2008), 5130-5139. | DOI | MR

[36] F. Gamboa, E. Gassiat: Bayesian methods and maximum entropy for ill-posed inverse problems. Ann. Statist. 25 (1997), 1, 328-350. | DOI | MR | Zbl

[37] E. T. Jaynes: Information theory and statistical mechanics. Physical Review Ser. II 106 (1957), 620-630. | MR | Zbl

[38] L. Jones, C. Byrne: General entropy criteria for inverse problems with application to data compression, pattern classification and cluster analysis. IEEE Trans. Inform. Theory 36 (1990), 23-30. | DOI | MR

[39] S. Kullback: Information Theory and Statistics. John Wiley and Sons, New York 1959. | MR | Zbl

[40] S. Kullback, R. A. Leibler: On information and sufficiency. Ann. Math. Statist. 22 (1951), 79-86. | DOI | MR | Zbl

[41] C. Léonard: Minimizers of energy functionals. Acta Math. Hungar. 93 (2001), 281-325. | DOI | MR | Zbl

[42] C. Léonard: Minimizers of energy functionals under not very integrable constraints. J. Convex Anal. 10 (2003), 63-68. | MR

[43] C. Léonard: Minimization of entropy functionals. J. Math. Anal. Appl. 346 (2008), 183-204. | DOI | MR | Zbl

[44] C. Léonard: Entropic projections and dominating points. ESAIM: Probability and Statistics 14 (2010), 343-381. | DOI | MR | Zbl

[45] F. Liese, I. Vajda: Convex Statistical Distances. Teubner Texte zur Mathematik 95, Teubner Verlag, Leipzig 1986. | MR | Zbl

[46] N. Murata, T. Takenouchi, T. Kanamori, S. Eguchi: Information geometry of U-Boost and Bregman divergence. Neural Computation 16 (2004), 1437-1481. | DOI | Zbl

[47] R. T. Rockafellar: Integrals which are convex functionals. Pacific J. Math. 24 (1968), 525-539. | DOI | MR | Zbl

[48] R. T. Rockafellar: Convex integral functionals and duality. In: Contributions to Nonlinear Functional Analysis (E. H. Zarantonello, ed.), Academic Press, New York 1971, pp. 215-236. | MR | Zbl

[49] R. T. Rockafellar: Convex Analysis. Princeton University Press, Princeton 1970. | MR | Zbl

[50] R. T. Rockafellar, R. J.-B. Wets: Variational Analysis. Springer Verlag, Berlin - Heidelberg - New York 2004. | MR | Zbl

[51] M. Teboulle, I. Vajda: Convergence of best $\phi$-entropy estimates. IEEE Trans. Inform. Theory 39 (1993), 297-301. | DOI | MR | Zbl

[52] F. Topsoe: Information-theoretical optimization techniques. Kybernetika 15 (1979), 8-27. | MR

[53] I. Vajda: Theory of Statistical Inference and Information. Kluwer Academic Publishers, Dordrecht 1989. | Zbl