Feature selection based on statistical estimation of mutual information
Sibirskie èlektronnye matematičeskie izvestiâ, Volume 18 (2021), no. 1, pp. 720–728.


An algorithm for identifying significant factors within the mixed model framework is proposed; it employs statistical estimation of mutual information. The consistency of this procedure is established, and the theoretical results are supplemented by numerical experiments demonstrating the accuracy of the method.
Keywords: feature selection, mixed model, mutual information, conditional Shannon entropy, logistic regression.
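
For illustration only, a minimal Python sketch of the general idea of mutual-information-based feature selection in the mixed setting (continuous factors, discrete response). It relies on scikit-learn's generic nearest-neighbour estimator mutual_info_classif rather than the estimator analysed in the article, and the synthetic data and the cut-off k below are illustrative assumptions, not taken from the paper.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# Synthetic mixed-model data: continuous factors X, binary response Y.
# (Illustrative only; the article's numerical experiments differ.)
X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=3, n_redundant=0, random_state=0)

# Estimate I(X_j; Y) for each factor with a generic k-NN based estimator
# (a stand-in for the consistent estimator constructed in the article).
mi = mutual_info_classif(X, y, random_state=0)

# Rank factors by estimated mutual information and keep the top k.
k = 3
selected = np.argsort(mi)[::-1][:k]
print("Estimated MI per factor:", np.round(mi, 3))
print("Selected factor indices:", selected)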
@article{SEMR_2021_18_1_a10,
     author = {A. A. Kozhevin},
     title = {Feature selection based on statistical estimation of mutual information},
     journal = {Sibirskie \`elektronnye matemati\v{c}eskie izvesti\^a},
     pages = {720--728},
     publisher = {mathdoc},
     volume = {18},
     number = {1},
     year = {2021},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/SEMR_2021_18_1_a10/}
}
TY  - JOUR
AU  - A. A. Kozhevin
TI  - Feature selection based on statistical estimation of mutual information
JO  - Sibirskie èlektronnye matematičeskie izvestiâ
PY  - 2021
SP  - 720
EP  - 728
VL  - 18
IS  - 1
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/SEMR_2021_18_1_a10/
LA  - en
ID  - SEMR_2021_18_1_a10
ER  - 
%0 Journal Article
%A A. A. Kozhevin
%T Feature selection based on statistical estimation of mutual information
%J Sibirskie èlektronnye matematičeskie izvestiâ
%D 2021
%P 720-728
%V 18
%N 1
%I mathdoc
%U http://geodesic.mathdoc.fr/item/SEMR_2021_18_1_a10/
%G en
%F SEMR_2021_18_1_a10
A. A. Kozhevin. Feature selection based on statistical estimation of mutual information. Sibirskie èlektronnye matematičeskie izvestiâ, Volume 18 (2021), no. 1, pp. 720–728. http://geodesic.mathdoc.fr/item/SEMR_2021_18_1_a10/

[1] V. Bolón-Canedo, A. Alonso-Betanzos, Recent advances in ensembles for feature selection, Springer, 2018

[2] A. Bulinski, A. Kozhevin, “Statistical estimation of conditional Shannon entropy”, ESAIM, Probab. Stat., 23 (2019), 350–386 | DOI | MR | Zbl

[3] A. Bulinski, A. Kozhevin, “Statistical Estimation of Mutual Information for Mixed Model”, Methodol. Comput. Appl. Probab., 23:2 (2021), 123–142 | DOI | MR

[4] F. Coelho, A.P. Braga, M.A. Verleysen, “Mutual Information estimator for continuous and discrete variables applied to Feature Selection and Classification problems”, International Journal of Computational Intelligence Systems, 9:4 (2016), 726–733 | DOI

[5] W. Gao, S. Kannan, S. Oh, P. Viswanath, “Estimating mutual information for discrete-continuous mixtures”, 31st Conference on Neural Information Processing Systems (NIPS 2017) (Long Beach, CA, USA, 2017), 1–12

[6] D.G. Kleinbaum, M. Klein, Logistic regression. A self-learning text, with contributions by Erica Rihl Pryor, 3rd ed., Springer, New York, 2010 | Zbl

[7] M. Kuhn, K. Johnson, Feature engineering and selection: A practical approach for predictive models, CRC Press, Boca Raton, 2020 | MR

[8] S. Kullback, R.A. Leibler, “On information and sufficiency”, Ann. Math. Stat., 22:1 (1951), 79–86 | DOI | MR | Zbl

[9] F. Macedo, R. Oliveira, A. Pacheco, R. Valadas, “Theoretical foundations of forward feature selection methods based on mutual information”, Neurocomputing, 325 (2019), 67–89 | DOI

[10] L. Massaron, A. Boschetti, Regression analysis with Python, Packt Publishing Ltd., Birmingham, 2016

[11] U. Stańczyk, B. Zielosko, L.C. Jain (eds.), Advances in Feature Selection for Data and Pattern Recognition, Springer, 2018 | MR | Zbl

[12] J.R. Vergara, P.A. Estévez, “A review of feature selection methods based on mutual information”, Neural Comput. Appl., 24 (2014), 175–186 | DOI