Selecting informative features of human gene exons
Journal of the Belarusian State University. Mathematics and Informatics, Volume 1 (2019), pp. 77-89.

See the article record from the source Math-Net.Ru

Dimensionality reduction of the feature space of human gene exons is considered with the aim of gene identification. To evaluate the performance of various feature selection algorithms, computational experiments were carried out on the exons of 14 known human genes. It is shown that exons are clearly separable by gene affiliation. Feature selection algorithms are sensitive to noise features and make it possible to estimate their number. Reducing the number of features lowers CPU time and memory usage, reduces model complexity, and makes the model easier to interpret. Our findings indicate that using features of flanking intronic sequences yields better prediction models than using exon features. The results provide new opportunities for studying human gene data with machine learning algorithms.
Keywords: exon; intron; bioinformatics; feature selection; simulation modeling; classification algorithm.
@article{BGUMI_2019_1_a9,
     author = {A. V. Volkov and N. N. Yatskou and V. V. Grinev},
     title = {Selecting informative features of human gene exons},
     journal = {Journal of the Belarusian State University. Mathematics and Informatics},
     pages = {77--89},
     publisher = {mathdoc},
     volume = {1},
     year = {2019},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/BGUMI_2019_1_a9/}
}
A. V. Volkov; N. N. Yatskou; V. V. Grinev. Selecting informative features of human gene exons. Journal of the Belarusian State University. Mathematics and Informatics, Volume 1 (2019), pp. 77-89. http://geodesic.mathdoc.fr/item/BGUMI_2019_1_a9/

[1] V. V. Grinev, A. A. Migas, A. D. Kirsanava, O. A. Mishkova, N. Siomava, T. V. Ramanouskaya, “Decoding of exon splicing patterns in the human RUNX1–RUNX1T1 fusion gene”, International Journal of Biochemistry and Cell Biology, 68 (2015), 48–58 | DOI

[2] M. Zhang, “Statistical features of human exons and their flanking regions”, Human Molecular Genetics, 7(5) (1998), 919–932 | DOI

[3] Y. Saeys, I. Inza, P. Larranaga, “A review of feature selection techniques in bioinformatics”, Bioinformatics, 23(19) (2007), 2507–2517 | DOI

[4] T. F. Cox, “Multidimensional scaling in process control”, Handbook of Statistics, 22 (2003), 609–623 | DOI | MR

[5] A. M. Martinez, A. C. Kak, “PCA versus LDA”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2) (2001), 228–233 | DOI

[6] G. H. John, R. Kohavi, K. Pfleger, “Irrelevant Features and the Subset Selection Problem”, Machine Learning Proceedings. Proceedings of the Eleventh International Conference (New Brunswick, Canada), 1994, 121–129, New Brunswick: Rutgers University | DOI

[7] L. Yu, H. Liu, “Efficient feature selection via analysis of relevance and redundancy”, Journal of Machine Learning Research, 5 (2004), 1205–1224 | MR | Zbl

[8] J. C. Ang, A. Mirzal, H. Haron, H. N. A. Hamed, “Supervised, unsupervised, and semi-supervised feature selection: a review on gene selection”, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 13(5) (2016), 971–989 | DOI

[9] L. A. Belanche, F. F. Gonzalez, “Review and evaluation of feature selection algorithms in synthetic problems”, 2018, arXiv: http://arxiv.org/abs/1101.2320

[10] L. Wang, Y. Lei, Y. Zeng, L. Tong, B. Yan, “Principal feature analysis: a multivariate feature selection method for fMRI data”, Computational and Mathematical Methods in Medicine, 2013 (2013), 1–7 | DOI | MR

[11] K. Kira, L. A. Rendell, “A practical approach to feature selection”, Machine Learning Proceedings. Proceedings of the Ninth International Workshop on Machine Learning (Scotland), 1992, 249–256, Aberdeen: ML | DOI

[12] I. Kononenko, “Estimating attributes: Analysis and extensions of RELIEF”, Machine Learning: ECML-94. European Conference (Catania, Italy), 1994, 171–182, Berlin: Springer | DOI

[13] S. R. Singh, H. A. Murthy, T. A. Gonsalves, “Feature selection for text classification based on Gini coefficient of inequality”, The Fourth Workshop on Feature Selection in Data Mining, 10 (2010), 76–85 | MR

[14] V. Bolon-Canedo, N. Sanchez-Marono, A. Alonso-Betanzos, “A review of feature selection methods on synthetic data”, Knowledge and Information Systems, 34(3) (2012), 483–519 | DOI

[15] A. Kalousis, J. Prados, M. Hilario, “Stability of feature selection algorithms: a study on high-dimensional spaces”, Knowledge and Information Systems, 12(1) (2007), 95–116 | DOI

[16] S. Nogueira, K. Sechidis, G. Brown, “On the stability of feature selection”, Journal of Machine Learning Research, 18(174) (2018), 1–54 | MR

[17] N. J. Nilsson, “Artificial intelligence: A modern approach”, Artificial Intelligence, 82(1–2) (1996), 369–380 | DOI

[18] E. C. Merkle, M. Steyvers, “Choosing a Strictly Proper Scoring Rule”, Decision Analysis, 10(4) (2013), 292–304 | DOI | MR | Zbl

[19] B. L. Aken, S. Ayling, D. Barrell, L. Clarke, V. Curwen, S. Fairley, “The Ensembl gene annotation system”, 2016 | DOI

[20] A. Orriols-Puig, E. Bernado-Mansilla, “Evolutionary rule-based systems for imbalanced data sets”, Soft Computing, 13(3) (2008), 213–225 | DOI

[21] W. Qiu, H. Joe, “Generation of Random Clusters with Specified Degree of Separation”, Journal of Classification, 23(2) (2006), 315–334 | DOI | MR | Zbl