Motif based sequence classification
Čebyševskij sbornik, Volume 19 (2018) no. 1, pp. 187-199.

See the article record from its source, Math-Net.Ru

Sequence classification problems often arise in areas such as bioinformatics and natural language processing. In recent years the best results in this field have been achieved by deep learning methods, especially by architectures based on recurrent neural networks (RNN). However, a common problem of such models is their lack of interpretability, i.e., the difficulty of extracting the key features of the data that most affect the model's decision. Meanwhile, using less complicated neural networks reduces predictive performance, which limits the use of state-of-the-art machine learning methods in many subject areas. In this work we propose a novel interpretable deep learning architecture based on the extraction of principal sets of short substrings, called sequence motifs. The presence of an extracted motif in the input sequence serves as a marker for a certain class. The key component of the proposed solution is the differential alignment algorithm we developed, which provides a smooth analog of classical string comparison methods such as the Levenshtein edit distance and the Smith–Waterman local alignment. Unlike previous works on motif-based classification, which used CNNs for shift-invariant search, our model provides shift- and gap-invariant extraction of motifs.
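The abstract does not spell out the algorithm, but the idea of a smooth analog of a classical string comparison can be illustrated with a short sketch in which the hard minimum of the Levenshtein recurrence is replaced by a soft-min, making the dynamic program differentiable with respect to the per-step costs. The function names (softmin, soft_edit_distance), the smoothing parameter gamma, and the fixed unit costs below are illustrative assumptions, not the author's implementation.

import math

def softmin(values, gamma=1.0):
    # Smooth minimum: -gamma * log(sum(exp(-v / gamma))).
    # Approaches min(values) as gamma -> 0; subtracting the minimum
    # first keeps the exponentials numerically stable.
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))

def soft_edit_distance(a, b, gamma=1.0):
    # Soft analog of the Levenshtein distance between strings a and b:
    # the same dynamic program, with min replaced by softmin.
    n, m = len(a), len(b)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = float(i)               # i deletions
    for j in range(1, m + 1):
        D[0][j] = float(j)               # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else 1.0
            D[i][j] = softmin(
                (D[i - 1][j] + 1.0,      # deletion
                 D[i][j - 1] + 1.0,      # insertion
                 D[i - 1][j - 1] + sub), # match / substitution
                gamma,
            )
    return D[n][m]

if __name__ == "__main__":
    # The exact Levenshtein distance here is 3; with a small gamma
    # the soft value stays close to it.
    print(soft_edit_distance("kitten", "sitting", gamma=0.1))

The same smoothing idea applies to the maximum in the Smith–Waterman recurrence, the local-alignment counterpart mentioned in the abstract.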
Keywords: sequence classification, machine learning, neural network, motif extraction.
@article{CHEB_2018_19_1_a13,
     author = {E. P. Ofitserov},
     title = {Motif based sequence classification},
     journal = {\v{C}eby\v{s}evskij sbornik},
     pages = {187--199},
     publisher = {mathdoc},
     volume = {19},
     number = {1},
     year = {2018},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/CHEB_2018_19_1_a13/}
}
TY  - JOUR
AU  - E. P. Ofitserov
TI  - Motif based sequence classification
JO  - Čebyševskij sbornik
PY  - 2018
SP  - 187
EP  - 199
VL  - 19
IS  - 1
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/CHEB_2018_19_1_a13/
LA  - ru
ID  - CHEB_2018_19_1_a13
ER  - 
%0 Journal Article
%A E. P. Ofitserov
%T Motif based sequence classification
%J Čebyševskij sbornik
%D 2018
%P 187-199
%V 19
%N 1
%I mathdoc
%U http://geodesic.mathdoc.fr/item/CHEB_2018_19_1_a13/
%G ru
%F CHEB_2018_19_1_a13
E. P. Ofitserov. Motif based sequence classification. Čebyševskij sbornik, Volume 19 (2018) no. 1, pp. 187-199. http://geodesic.mathdoc.fr/item/CHEB_2018_19_1_a13/

[1] Hochreiter S., Schmidhuber J., “Long short-term memory”, Neural computation, 9:8 (1997), 1735–1780 | DOI

[2] K. Cho et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation, 2014, arXiv: 1406.1078

[3] J. Chung et al., Empirical evaluation of gated recurrent neural networks on sequence modeling, 2014, arXiv: 1412.3555

[4] Karpathy A., Johnson J., Fei-Fei L., Visualizing and understanding recurrent networks, 2015, arXiv: 1506.02078

[5] H. Strobelt et al., “LSTMVis: a tool for visual analysis of hidden state dynamics in recurrent neural networks”, IEEE transactions on visualization and computer graphics, 24:1 (2018), 667–676 | DOI

[6] H. Zeng et al., “Convolutional neural network architectures for predicting DNA-protein binding”, Bioinformatics, 32:12 (2016), i121–i127 | DOI

[7] Zhou J., Troyanskaya O. G., “Predicting effects of noncoding variants with deep learning-based sequence model”, Nature methods, 12:10 (2015), 931 | DOI

[8] J. Lanchantin et al., Deep motif: Visualizing genomic sequence classifications, 2016, arXiv: 1605.01133

[9] Quang D., Xie X., “DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences”, Nucleic acids research, 44:11 (2016), e107 | DOI

[10] Levenshtein V. I., “Binary codes capable of correcting deletions, insertions and reversals”, Doklady Akademii Nauk SSSR, 163:4 (1965), 845–848 (in Russian) | Zbl

[11] Smith T. F., Waterman M. S., “Comparison of biosequences”, Advances in applied mathematics, 2:4 (1981), 482–489 | DOI | MR | Zbl

[12] Gotoh O., “An improved algorithm for matching biological sequences”, Journal of molecular biology, 162:3 (1982), 705–708 | DOI

[13] Manavski S. A., Valle G., “CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment”, BMC bioinformatics, 9:2 (2008), S10 | DOI

[14] Ioffe S., Szegedy C., Batch normalization: Accelerating deep network training by reducing internal covariate shift, 2015, arXiv: 1502.03167

[15] Hahnloser R. H. R. et al., “Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit”, Nature, 405:6789 (2000), 947 | DOI