An efficient method for feature selection in linear regression based on an extended Akaike's information criterion
Žurnal vyčislitelʹnoj matematiki i matematičeskoj fiziki, Volume 49 (2009) no. 11, pp. 2066-2080. This article was harvested from the Math-Net.Ru source.


A method for feature selection in linear regression based on an extension of Akaike's information criterion is proposed. Using the classical Akaike information criterion (AIC) for feature selection requires an exhaustive search over all subsets of features, which is prohibitively expensive in computation and time. A new information criterion is proposed that is a continuous extension of AIC; as a result, the feature selection problem reduces to a smooth optimization problem, and an efficient procedure for solving it is derived. Experiments show that the proposed method selects features in linear regression efficiently. In the experiments, the proposed procedure is compared with the relevance vector machine, a feature selection method based on the Bayesian approach, and both procedures are shown to yield similar results. The main distinction of the proposed method is that certain regularization coefficients are identically zero, which makes it possible to avoid the underfitting effect characteristic of the relevance vector machine. A special case (the so-called nondiagonal regularization) is considered in which both methods coincide.
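For illustration, the baseline mentioned in the abstract, exhaustive AIC-based subset selection, can be sketched as follows. This is a minimal Python sketch of the classical procedure (not the authors' proposed criterion), assuming the standard Gaussian-likelihood form AIC = n ln(RSS/n) + 2k for a least-squares fit with k selected features; the function names are illustrative only.

# Classical AIC feature selection for linear regression via exhaustive
# subset search -- the baseline whose exponential cost motivates the
# continuous extension of AIC described in the abstract.
from itertools import combinations

import numpy as np


def aic_linear(X, y, subset):
    # AIC of an ordinary least-squares fit restricted to the chosen features,
    # using the Gaussian-likelihood form AIC = n*ln(RSS/n) + 2*k.
    n = len(y)
    Xs = X[:, list(subset)]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    rss = float(np.sum((y - Xs @ coef) ** 2))
    return n * np.log(rss / n) + 2 * len(subset)


def exhaustive_aic_selection(X, y):
    # Visit every non-empty subset of the d features: 2^d - 1 candidates.
    d = X.shape[1]
    best_subset, best_aic = None, np.inf
    for k in range(1, d + 1):
        for subset in combinations(range(d), k):
            score = aic_linear(X, y, subset)
            if score < best_aic:
                best_subset, best_aic = list(subset), score
    return best_subset, best_aic


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 8))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=100)
    print(exhaustive_aic_selection(X, y))

With d features this search visits 2^d - 1 subsets, so its cost grows exponentially in the number of features; the paper's continuous extension of AIC replaces this combinatorial search with a single smooth optimization problem.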
@article{ZVMMF_2009_49_11_a12,
     author = {D. P. Vetrov and D. A. Kropotov and N. O. Ptashko},
     title = {An efficient method for feature selection in linear regression based on an extended {Akaike's} information criterion},
     journal = {\v{Z}urnal vy\v{c}islitelʹnoj matematiki i matemati\v{c}eskoj fiziki},
     pages = {2066--2080},
     year = {2009},
     volume = {49},
     number = {11},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/ZVMMF_2009_49_11_a12/}
}
TY  - JOUR
AU  - D. P. Vetrov
AU  - D. A. Kropotov
AU  - N. O. Ptashko
TI  - An efficient method for feature selection in linear regression based on an extended Akaike's information criterion
JO  - Žurnal vyčislitelʹnoj matematiki i matematičeskoj fiziki
PY  - 2009
SP  - 2066
EP  - 2080
VL  - 49
IS  - 11
UR  - http://geodesic.mathdoc.fr/item/ZVMMF_2009_49_11_a12/
LA  - ru
ID  - ZVMMF_2009_49_11_a12
ER  - 
%0 Journal Article
%A D. P. Vetrov
%A D. A. Kropotov
%A N. O. Ptashko
%T An efficient method for feature selection in linear regression based on an extended Akaike's information criterion
%J Žurnal vyčislitelʹnoj matematiki i matematičeskoj fiziki
%D 2009
%P 2066-2080
%V 49
%N 11
%U http://geodesic.mathdoc.fr/item/ZVMMF_2009_49_11_a12/
%G ru
%F ZVMMF_2009_49_11_a12
D. P. Vetrov; D. A. Kropotov; N. O. Ptashko. An efficient method for feature selection in linear regression based on an extended Akaike's information criterion. Žurnal vyčislitelʹnoj matematiki i matematičeskoj fiziki, Volume 49 (2009) no. 11, pp. 2066-2080. http://geodesic.mathdoc.fr/item/ZVMMF_2009_49_11_a12/

[1] Tipping M. E., “The relevance vector machine”, Advances in Neural Information Processing Systems, 12 (2000), 652–658

[2] MacKay D. J. C., “The evidence framework applied to classification networks”, Neural Comput., 4 (1992), 720–736 | DOI

[3] Tibshirani R., “Regression shrinkage and selection via the lasso”, J. Roy. Statist. Soc. Ser. B, 58 (1996), 267–288 | MR | Zbl

[4] Figueiredo M., “Adaptive sparseness for supervised learning”, IEEE Trans. Pattern Analys. Mach. Intelligence, 25 (2003), 1150–1159 | DOI

[5] Williams P. M., “Bayesian regularization and pruning using a Laplace prior”, Neural Comput., 7 (1995), 117–143 | DOI

[6] Cawley G. C., Talbot N. L. C., Girolami M., “Sparse multinomial logistic regression via Bayesian $L_1$ regularisation”, Advances in Neural Information Processing Systems, 19 (2007), 209–216

[7] Schwarz G., “Estimating the dimension of a model”, Ann. Statistics, 6 (1978), 461–464 | DOI | MR | Zbl

[8] Bishop C. M., Pattern recognition and machine learning, Springer, New York, 2006 | MR

[9] Akaike H., “A new look at the statistical model identification”, IEEE Trans. Automatic Control, 19 (1974), 716–723 | MR

[10] Shiryaev A. N., Probability, Nauka, Moscow, 1979 (in Russian) | MR

[11] Borovkov A. A., Mathematical Statistics, Fizmatlit, Moscow, 2007 (in Russian) | Zbl

[12] Horn R., Johnson C., Matrix Analysis, Mir, Moscow, 1989 (Russian translation) | MR

[13] Spiegelhalter D., Best N., Carlin B., van der Linde A., “Bayesian measures of model complexity and fit”, J. Roy. Statist. Soc., 64 (2002), 583–640 | DOI | MR

[14] Faul A. C., Tipping M. E., “Analysis of sparse Bayesian learning”, Advances in Neural Information Processing Systems, 14 (2002), 383–389

[15] Tipping M. E., “Sparse Bayesian learning and the relevance vector machine”, J. Mach. Learning Res., 1 (2001), 211–244 | DOI | MR | Zbl

[16] Kropotov D. A., Vetrov D. P., “On one method of non-diagonal regularization in sparse Bayesian learning”, Proc. 24th Internat. Conf. Mach. Learning, Omnipress, Corvallis, 2007, 457–464

[17] Qi Y., Minka T., Picard R., Ghahramani Z., “Predictive automatic relevance determination by expectation propagation”, Proc. 21st Internat. Conf. Mach. Learning, Omnipress, Banff, 2004, 671–678

[18] Dietterich T. G., “Approximate statistical tests for comparing supervised classification learning algorithms”, Neural Comput., 10 (1998), 1895–1924 | DOI