On relevant features selection based on information theory
Teoriâ veroâtnostej i ee primeneniâ, Volume 68 (2023) no. 3, pp. 483-508. This article was harvested from the Math-Net.Ru source.


It is shown that widely used suboptimal feature selection algorithms based on information-theoretic concepts do not necessarily identify a collection of features (relevant in a specified sense) that affect the random response under study. This can be viewed as a reflection of the epistasis phenomenon known in genetics, where individual features have little effect on the risk of a complex disease, whereas certain combinations of features have a significant impact on that risk. It is demonstrated that a similar effect also arises in inferences employing statistical estimates of mutual information.
Keywords: feature selection, mutual information, sequential selection of features, epistasis effect, interaction information.
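To illustrate the effect described in the abstract, here is a minimal numerical sketch (not taken from the paper; the XOR model and the helper `plugin_mi` are illustrative assumptions). For a response Y = X1 XOR X2 with independent fair binary features, each feature separately is independent of Y, so its mutual information with Y is near zero, while the pair (X1, X2) determines Y and carries one full bit; a greedy selector that scores features one at a time by I(X_i; Y) would therefore discard both relevant features — exactly the epistasis-type synergy the abstract refers to.

```python
import numpy as np

def plugin_mi(x, y):
    """Plug-in estimate of the mutual information I(X;Y) in bits
    for discrete samples given as 1-D integer arrays."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        px = np.mean(x == xv)
        for yv in np.unique(y):
            py = np.mean(y == yv)
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0.0:
                mi += pxy * np.log2(pxy / (px * py))
    return mi

rng = np.random.default_rng(0)
n = 100_000
x1 = rng.integers(0, 2, size=n)   # independent fair binary features
x2 = rng.integers(0, 2, size=n)
y = x1 ^ x2                       # response depends only on the pair

print(plugin_mi(x1, y))           # approx. 0: x1 alone looks irrelevant
print(plugin_mi(x2, y))           # approx. 0: x2 alone looks irrelevant
print(plugin_mi(2 * x1 + x2, y))  # approx. 1 bit: the pair determines y
```

Here `2 * x1 + x2` encodes the pair as a single discrete variable, so the last line estimates I((X1, X2); Y).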
@article{TVP_2023_68_3_a3,
     author = {A. V. Bulinski},
     title = {On relevant features selection based on information theory},
     journal = {Teori\^a vero\^atnostej i ee primeneni\^a},
     pages = {483--508},
     year = {2023},
     volume = {68},
     number = {3},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/TVP_2023_68_3_a3/}
}
TY  - JOUR
AU  - A. V. Bulinski
TI  - On relevant features selection based on information theory
JO  - Teoriâ veroâtnostej i ee primeneniâ
PY  - 2023
SP  - 483
EP  - 508
VL  - 68
IS  - 3
UR  - http://geodesic.mathdoc.fr/item/TVP_2023_68_3_a3/
LA  - ru
ID  - TVP_2023_68_3_a3
ER  - 
%0 Journal Article
%A A. V. Bulinski
%T On relevant features selection based on information theory
%J Teoriâ veroâtnostej i ee primeneniâ
%D 2023
%P 483-508
%V 68
%N 3
%U http://geodesic.mathdoc.fr/item/TVP_2023_68_3_a3/
%G ru
%F TVP_2023_68_3_a3
A. V. Bulinski. On relevant features selection based on information theory. Teoriâ veroâtnostej i ee primeneniâ, Volume 68 (2023) no. 3, pp. 483-508. http://geodesic.mathdoc.fr/item/TVP_2023_68_3_a3/

[1] A. V. Bulinski, A. N. Kolmogorov, “Linear sampling estimations of sums”, Theory Probab. Appl., 24:2 (1979), 241–252 | DOI | MR | Zbl

[2] V. Bolón-Canedo, A. Alonso-Betanzos, Recent advances in ensembles for feature selection, Intell. Syst. Ref. Libr., 147, Springer, Cham, 2018, xiv+205 pp. | DOI

[3] Jundong Li, Kewei Cheng, Suhang Wang, F. Morstatter, R. P. Trevino, Jiliang Tang, Huan Liu, “Feature selection: a data perspective”, ACM Comput. Surveys, 50:6 (2018), 94, 45 pp. | DOI

[4] Advances in feature selection for data and pattern recognition, Intell. Syst. Ref. Libr., 138, eds. U. Stańczyk, B. Zielosko, L. C. Jain, Springer, Cham, 2018, xviii+328 pp. | DOI | MR | Zbl

[5] R. Battiti, “Using mutual information for selecting features in supervised neural net learning”, IEEE Trans. Neural Networks, 5:4 (1994), 537–550 | DOI

[6] W. J. McGill, “Multivariate information transmission”, Trans. IRE Prof. Group Inf. Theory, 4:4 (1954), 93–111 | DOI | MR

[7] R. M. Fano, Transmission of information. A statistical theory of communications, The M.I.T. Press, Cambridge, MA; John Wiley & Sons, Inc., 1961, x+389 pp. | MR | Zbl

[8] Te Sun Han, “Multiple mutual informations and multiple interactions in frequency data”, Inform. and Control, 46:1 (1980), 26–45 | DOI | MR | Zbl

[9] J. Mielniczuk, “Information theoretic methods for variable selection – a review”, Entropy, 24:8 (2022), 1079, 25 pp. | DOI | MR

[10] Hanchuan Peng, Fuhui Long, C. Ding, “Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy”, IEEE Trans. Pattern Anal. Mach. Intell., 27:8 (2005), 1226–1238 | DOI

[11] G. Brown, “A new perspective for information theoretic feature selection”, Proceedings of the 12th international conference on artificial intelligence and statistics (AISTATS) 2009 (Clearwater Beach, FL, 2009), Proceedings of Machine Learning Research (PMLR), 5, 2009, 49–56 http://proceedings.mlr.press/v5/brown09a.html

[12] V. A. Kovalevsky, “The problem of character recognition from the point of view of mathematical statistics”, Character readers and pattern recognition, Spartan, New York, 1968, 3–30 | Zbl

[13] Genome-wide association studies, eds. T. Tsunoda, T. Tanaka, Y. Nakamura, Springer, Singapore, 2019, 328 pp.

[14] E. P. Wigner, “The unreasonable effectiveness of mathematics in the natural sciences”, Comm. Pure Appl. Math., 13:1 (1960), 1–14 | DOI | MR | Zbl

[15] T. Hastie, R. Tibshirani, M. Wainwright, Statistical learning with sparsity. The lasso and generalizations, Monogr. Statist. Appl. Probab., 143, CRC Press, Boca Raton, FL, 2015, xv+351 pp. | DOI | MR | Zbl

[16] F. Macedo, M. Rosário Oliveira, A. Pacheco, R. Valadas, “Theoretical foundations of forward feature selection methods based on mutual information”, Neurocomputing, 325 (2019), 67–89 | DOI

[17] J. R. Vergara, P. A. Estévez, “A review of feature selection methods based on mutual information”, Neural Comput. Appl., 24 (2014), 175–186 | DOI

[18] R. D. Shah, R. J. Samworth, “Variable selection with error control: another look at stability selection”, J. R. Stat. Soc. Ser. B. Stat. Methodol., 75:1 (2013), 55–80 | DOI | MR | Zbl

[19] P. Bugata, P. Drotar, “On some aspects of minimum redundancy maximum relevance feature selection”, Sci. China Inf. Sci., 63:1 (2020), 112103, 15 pp. | DOI | MR

[20] A. Bulinski, A. Kozhevin, “New version of the MDR method for stratified samples”, Stat. Optim. Inf. Comput., 5:1 (2017), 1–18 | DOI | MR

[21] O. Kallenberg, Foundations of modern probability, Probab. Appl. (N. Y.), Springer-Verlag, New York, 1997, xii+523 pp. | MR | Zbl

[22] Zhiyi Zhang, Statistical implications of Turing's formula, John Wiley & Sons, Inc., Hoboken, NJ, 2017, xiv+282 pp. | DOI | MR | Zbl

[23] A. Bulinski, A. Kozhevin, “Statistical estimation of mutual information for mixed model”, Methodol. Comput. Appl. Probab., 23:1 (2021), 123–142 | DOI | MR | Zbl

[24] A. V. Bulinskii, A. N. Shiryaev, Teoriya sluchainykh protsessov [Theory of random processes], 2nd ed., Fizmatlit, Moscow, 2005, 400 pp.

[25] S. Nogueira, K. Sechidis, G. Brown, “On the stability of feature selection algorithms”, J. Mach. Learn. Res., 18 (2017), 174, 54 pp. | MR | Zbl

[26] D. Edelmann, T. F. Móri, G. J. Székely, “On relationships between the Pearson and the distance correlation coefficients”, Statist. Probab. Lett., 169 (2021), 108960, 6 pp. | DOI | MR | Zbl

[27] H. H. Yang, J. Moody, “Data visualization and feature selection: New algorithms for nongaussian data”, NIPS'99: Proceedings of the 12th international conference on neural information processing systems, Adv. Neural Inf. Process. Syst., 12, MIT Press, Cambridge, MA, 1999, 687–693

[28] N. Kwak, Chong-Ho Choi, “Input feature selection for classification problems”, IEEE Trans. Neural Netw., 13:1 (2002), 143–159 | DOI

[29] M. Vidal-Naquet, S. Ullman, “Object recognition with informative features and linear classification”, Proceedings. Ninth IEEE international conference on computer vision (Nice, 2003), v. 1, IEEE, 2003, 281–288 | DOI

[30] F. Fleuret, “Fast binary feature selection with conditional mutual information”, J. Mach. Learn. Res., 5 (2003/04), 1531–1555 | MR | Zbl

[31] Dahua Lin, Xiaoou Tang, “Conditional infomax learning: an integrated framework for feature extraction and fusion”, ECCV'06: Proceedings of the 9th European conference on computer vision, Part I (Graz, 2006), Springer-Verlag, Berlin, 2006, 68–82 | DOI

[32] Insik Jo, Sangbum Lee, Sejong Oh, “Improved measures of redundancy and relevance for mRMR feature selection”, Computers, 8:2 (2019), 42, 14 pp. | DOI

[33] Jimin Lee, Nomin Batnyam, Sejong Oh, “RFS: efficient feature selection method based on $R$-value”, Comput. Biol. Med., 43:2 (2013), 91–99 | DOI

[34] Nguyen Xuan Vinh, Shuo Zhou, J. Chan, J. Bailey, “Can high-order dependencies improve mutual information based feature selection?”, Pattern Recognit., 53 (2016), 45–58 | DOI | Zbl

[35] A. A. Kozhevin, “Feature selection based on statistical estimation of mutual information”, Sib. elektron. matem. izv., 18:1 (2021), 720–728 | DOI | MR | Zbl

[36] A. Bulinski, A. Kozhevin, “Statistical estimation of conditional Shannon entropy”, ESAIM Probab. Stat., 23 (2019), 350–386 | DOI | MR | Zbl

[37] M. Beraha, A. M. Metelli, M. Papini, A. Tirinzoni, M. Restelli, “Feature selection via mutual information: new theoretical insights”, IJCNN 2019: International joint conference on neural networks (Budapest, 2019), IEEE, 2019, 19832, 9 pp. | DOI

[38] J. Mielniczuk, P. Teisseyre, “Stopping rules for mutual information-based feature selection”, Neurocomputing, 358 (2019), 255–274 | DOI

[39] Kui Yu, Lin Liu, Jiuyong Li, “A unified view of causal and non-causal feature selection”, ACM Trans. Knowl. Discov. Data, 15:4 (2021), 63, 46 pp. | DOI

[40] M. Kubkowski, J. Mielniczuk, “Asymptotic distributions of empirical interaction information”, Methodol. Comput. Appl. Probab., 23:1 (2021), 291–315 | DOI | MR | Zbl

[41] Jun Liang, Liang Hou, Zhenhua Luan, Weiping Huang, “Feature selection with conditional mutual information considering feature interaction”, Symmetry, 11:7 (2019), 858, 17 pp. | DOI

[42] G. Manikandan, S. Abirami, “An efficient feature selection framework based on information theory for high dimensional data”, Appl. Soft Comput., 111 (2021), 107729, 25 pp. | DOI

[43] M. Radovic, M. Ghalwash, N. Filipovic, Z. Obradovic, “Minimum redundancy maximum relevance feature selection approach for temporal gene expression data”, BMC Bioinformatics, 18 (2017), 9, 14 pp. | DOI

[44] Quan Zou, Kaiyang Qu, Yamei Luo, Dehui Yin, Ying Ju, Hua Tang, “Predicting diabetes mellitus with machine learning techniques”, Front. Genet., 9 (2018), 515, 10 pp. | DOI

[45] M. Phogat, D. Kumar, “Disease single nucleotide polymorphism selection using hybrid feature selection technique”, J. Phys. Conf. Ser., 1950 (2021), 012079, 10 pp. | DOI