A backward selection procedure for approximating a discrete probability distribution by decomposable models
Kybernetika, Tome 48 (2012) no. 5, pp. 825-844 Cet article a éte moissonné depuis la source Czech Digital Mathematics Library

Voir la notice de l'article

Decomposable (probabilistic) models are log-linear models generated by acyclic hypergraphs, and a number of nice properties enjoyed by them are known. In many applications the following selection problem naturally arises: given a probability distribution $p$ over a finite set $V$ of $n$ discrete variables and a positive integer $k$, find a decomposable model with tree-width $k$ that best fits $p$. If $\mathcal{H}$ is the generating hypergraph of a decomposable model and $p_{\mathcal{H}}$ is the estimate of $p$ under the model, we can measure the closeness of $p_{\mathcal{H}}$ to $p$ by the information divergence $D(p: p_{\mathcal{H}})$, so that the problem above reads: given $p$ and $k$, find an acyclic, connected hypergraph ${\mathcal{H}}$ of tree-width $k$ such that $D(p: p_{\mathcal{H}})$ is minimum. It is well-known that this problem is $NP$-hard. However, for $k = 1$ it was solved by Chow and Liu in a very efficient way; thus, starting from an optimal Chow-Liu solution, a few forward-selection procedures have been proposed with the aim at finding a `good' solution for an arbitrary $k$. We propose a backward-selection procedure which starts from the (trivial) optimal solution for $k=n-1$, and we show that, in a study case taken from literature, our procedure succeeds in finding an optimal solution for every $k$.
Decomposable (probabilistic) models are log-linear models generated by acyclic hypergraphs, and a number of nice properties enjoyed by them are known. In many applications the following selection problem naturally arises: given a probability distribution $p$ over a finite set $V$ of $n$ discrete variables and a positive integer $k$, find a decomposable model with tree-width $k$ that best fits $p$. If $\mathcal{H}$ is the generating hypergraph of a decomposable model and $p_{\mathcal{H}}$ is the estimate of $p$ under the model, we can measure the closeness of $p_{\mathcal{H}}$ to $p$ by the information divergence $D(p: p_{\mathcal{H}})$, so that the problem above reads: given $p$ and $k$, find an acyclic, connected hypergraph ${\mathcal{H}}$ of tree-width $k$ such that $D(p: p_{\mathcal{H}})$ is minimum. It is well-known that this problem is $NP$-hard. However, for $k = 1$ it was solved by Chow and Liu in a very efficient way; thus, starting from an optimal Chow-Liu solution, a few forward-selection procedures have been proposed with the aim at finding a `good' solution for an arbitrary $k$. We propose a backward-selection procedure which starts from the (trivial) optimal solution for $k=n-1$, and we show that, in a study case taken from literature, our procedure succeeds in finding an optimal solution for every $k$.
Classification : 05C65, 62-09, 68R10, 68T05
Keywords: backward selection; information divergence; decomposable model; acyclic hypergraph; $k$-hypertree
@article{KYB_2012_48_5_a0,
     author = {Malvestuto, Francesco M.},
     title = {A backward selection procedure for approximating a discrete probability distribution by decomposable models},
     journal = {Kybernetika},
     pages = {825--844},
     year = {2012},
     volume = {48},
     number = {5},
     mrnumber = {3086854},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/KYB_2012_48_5_a0/}
}
TY  - JOUR
AU  - Malvestuto, Francesco M.
TI  - A backward selection procedure for approximating a discrete probability distribution by decomposable models
JO  - Kybernetika
PY  - 2012
SP  - 825
EP  - 844
VL  - 48
IS  - 5
UR  - http://geodesic.mathdoc.fr/item/KYB_2012_48_5_a0/
LA  - en
ID  - KYB_2012_48_5_a0
ER  - 
%0 Journal Article
%A Malvestuto, Francesco M.
%T A backward selection procedure for approximating a discrete probability distribution by decomposable models
%J Kybernetika
%D 2012
%P 825-844
%V 48
%N 5
%U http://geodesic.mathdoc.fr/item/KYB_2012_48_5_a0/
%G en
%F KYB_2012_48_5_a0
Malvestuto, Francesco M. A backward selection procedure for approximating a discrete probability distribution by decomposable models. Kybernetika, Tome 48 (2012) no. 5, pp. 825-844. http://geodesic.mathdoc.fr/item/KYB_2012_48_5_a0/

[1] Almond, R., Kong, A.: Optimality Issues in Constructing a Markov Tree from Graphical Models. Research Report A-3, Dept. Statistics, Harvard University, 1991.

[2] Altmüller, A., Haralick, R. M.: Approximating high dimensional probability disributions. In: Proc. XVII Int. Conf. on Patter Recognitions, 2004.

[3] Altmüller, A., Haralick, R. M.: Practical aspects of efficient forward selection in decomposable graphical models. In: Proc. XVI IEEE Int. Conf. on Tools with Artificial Intelligence, 2004, pp. 710-715.

[4] Bach, F. R., Jordan, M. I.: Thin junction trees. Adv. in Neural Inform. Proces. Systems 14 (2002), 569-572.

[5] Badsberg, J.-H., Malvestuto, F. M.: An implementation of the iterative proportional fitting procedure by propagation trees. Comput. Statist. Data Anal. 37 (2001), 297-322. | DOI | MR | Zbl

[6] Beeri, C., Fagin, R., Maier, D., Yannakakis, M.: On the desirability of acyclic database schemes. J. ACM 30 (1983), 479-513. | DOI | MR | Zbl

[7] Beineke, L. W., Pippert, R. E.: The enumeration of labelled 2-trees. J. Combin. Theory 6 (1969), 200-205. | MR

[8] Bishop, Y., Fienberg, S. E., Holland, P. W.: Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge 1975. | MR | Zbl

[9] Brown, D. T.: A note on approximations to discrete probability distributions. Inform. and Control 2 (1959), 386-392. | DOI | MR | Zbl

[10] Chickering, D.: Learning Bayesian networks is NP-complete. In: Learning from Data, Lecture Notes in Statist. 112 (1996), 121-130. | MR

[11] Chow, C. K., Liu, C. N.: Approximating discrete probability distributions with dependence trees. IEEE Trans. Inform. Theory 14 (1968), 462-467. | DOI | Zbl

[12] Cover, T. M.: Elements of Information Theory. John Wiley and Sons, 1991. | MR | Zbl

[13] Csiszár, I., Körner, J.: Information Theory. Academic Press, 1981. | MR

[14] Dagum, P., Luby, M.: Approximating probabilistic inference in belief networks is NP-hard. Artificial Intelligence 60 (1993), 141-153. | DOI | MR

[15] Dasgupta, S.: Learning polytrees. In: Proc. XV Conference on Uncertainty in Artificial Intelligence, 1999, pp. 134-141.

[16] Deshpande, A., Garofalakis, M., Jordan, M. I.: Efficient stepwise selection in decomposable models. In: Proc. XVII Conf. on Uncertainty in Artificial Intelligence, 2001, pp. 128-135.

[17] Ding, G., Lax, R. F., Chen, J., Chen, P. P., Marx, B. D.: Comparison of greedy strategies for learning Markov networks of treewidth $k$. In: Proc. Int. Conf. on Machine Learning: Models, Technologies and Applications, 2007, pp. 294-301.

[18] Havránek, T.: On model search methods. In: Proc. IX Symp. on Computational Statistics, 1990, pp. 101-108.

[19] Havránek, T.: Simple formal systems in data analysis. In: Proc. Conf. on Symbolic-Numeric Data Analysis and Learning, 1991, pp. 373-381.

[20] Jensen, F. V., Jensen, F.: Optimal junction trees. In: Proc. X Conf. on Uncertainty in Artificial Intelligence (R. L. de Mantaras and D. Poole, eds.), 1994, pp. 360-366.

[21] Karger, D., Srebro, N.: Learning Markov networks: maximum bounded tree-width graphs. In: Proc. XII ACM-SIAM Symp. on Discrete Mathematics, 2001, pp. 392-401. | MR | Zbl

[22] Kloks, T.: Tree-width. LNCS 842, Springer Verlag, Berlin 1994. | Zbl

[23] Kocka, T.: New algorithm for learning decomposable models. Unpublished manuscript, 2000.

[24] Kovács, E., Szántai, T.: Vine copulas as a mean for the construction of high dimensional probability distribution associated to a Markov network. arXiv:1105.1697v1, 2011.

[25] Ku, H. H., Kullback, S.: Approximating discrete probability distributions. IEEE Trans. Inform. Theory 15 (1969), 444-447. | DOI | MR | Zbl

[26] Lauritzen, S. L.: Graphical Models. Clarendon Press, Oxford 1996. | MR

[27] II, P. M. Lewis: Approximating probability distributions to reduce storage requirements. Inform. and Control 2 (1959), 214-225. | DOI | MR

[28] Malvestuto, F. M.: Operations research in the design of statistical databases (in Italian). In: Proc. AIRO Meeting on Operations Research and Computer Science, 1986, pp. 117-130.

[29] Malvestuto, F. M.: Approximating discrete probability distributions with decomposable models. IEEE Trans. Systems, Man and Cybernetics 21 (1991), 1287-1294. | DOI

[30] Malvestuto, F. M.: An axiomatization of loglinear models with an application to the model search. In: Learning from Data, LNS 112 (1996), pp. 175-184.

[31] Malvestuto, F. M.: Designing a probabilistic database from a given set of full conditional independences. In: Proc. Workshop on Conditional Independence Structures and Graphical Models, 1999.

[32] Malvestuto, F. M.: A hypergraph-theoretic analysis of collapsibility and decomposability for extended log-linear models. Statist. Comput. 11 (2001), 155-169. | DOI | MR

[33] Malvestuto, F. M.: From conditional independences to factorization constraints with discrete random variables. Ann. Math. and Artificial Intelligence 35 (2002), 253-285. | DOI | MR | Zbl

[34] Malvestuto, F. M.: Tree and local computations in a cross-entropy minimization problem with marginal constraints. Kybernetika 46 (2010), 621-654. | MR | Zbl

[35] Meek, C.: Finding a path is harder than finding a tree. J. Artificial Intelligence Res. 15 (2001), 383-389. | MR | Zbl

[36] Mezzini, M., Moscarini, M.: Simple algorithms for minimal triangulation of a graph and backward selection of a decomposable Markov network. Theoret. Comput. Sci. 411 (2010), 958-966. | DOI | MR | Zbl

[37] Nunez, K., Chen, J., Chen, P., Ding, G., Lax, R. F., Marx, B.: Empirical comparison of greedy strategies for learning Markov networks of treewidth $k$. In: Proc. VII Int. Conf. on Machine Learning and Applications, 2008, pp. 106-113.

[38] Rose, J. D.: On simple characterizations of $k$-trees. Discrete Math. 7 (1974), 317-322. | DOI | MR | Zbl

[39] Szántai, T., Kovács, E.: Hypergraphs as a mean of discovering the dependence structure of a discrete multivariate probability distribution. In: Proc. Conf. on Applied Mathematical Programming and Modelling, 2008; also in Ann. Oper. Res. 193 (2012), 71-90. | MR

[40] Szántai, T., Kovács, E.: Discovering a junction tree behind a Markov network by a greedy algorithm. arXiv:1104.2762v3, 2011.

[41] Tarjan, R. E., Yannakakis, M.: Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce hypergraphs. SIAM J. Comput. 13 (1984), 566-579. | DOI | MR

[42] Wermuth, N.: Analogies between multiplicative models in contingency tables and covariance selection. Biometrics 32 (1976), 95-108. | DOI | MR | Zbl

[43] Wermuth, N.: Model search among multiplicative models. Biometrics 32 (1976), 253-256. | DOI | Zbl

[44] Xiang, Y., Wong, S. K. M., Cercone, N.: A ``microscopic'' study of minimum entropy search in learning decomposable Markov networks. Mach. Learning 26 (1997), 65-72. | DOI | Zbl