Two algorithms based on Markov chains and their application to recognition of protein coding genes in prokaryotic genomes
Applicationes Mathematicae, Volume 40 (2013) no. 4, pp. 447-457.

Article record from the source: Institute of Mathematics, Polish Academy of Sciences

Methods based on the theory of Markov chains are the most commonly used approach to recognizing protein coding sequences. However, they require large learning sets to populate all entries of the transition probability matrices that describe the dependence between nucleotides in the analyzed sequences. Moreover, gene prediction is strongly influenced by nucleotide bias, measured e.g. by G+C content. In this paper we compare two methods: (i) the classical GeneMark (GM) algorithm, which uses a three-periodic non-homogeneous Markov chain, and (ii) an algorithm called PMC, which uses six independent homogeneous Markov chains to describe transitions between nucleotides separately for each of the three codon positions on each of the two DNA strands. We tested the efficiency (in terms of true positive rate) of these two Markov chain methods on the model bacterial genome of Escherichia coli, depending on the size of the learning set, the uncertainty of the functional annotation of ORFs, and the model order of the algorithms. We also applied the methods, with different model orders, to 163 prokaryotic genomes covering a wide range of G+C content. Across different chain orders, the PMC algorithm turns out to be more stable than the GeneMark algorithm. PMC also outperforms GM, classifying a higher fraction of the tested set of annotated genes as coding sequences. Moreover, it requires much smaller learning sets than GM to work properly.
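The abstract contrasts a three-periodic non-homogeneous Markov chain (the GeneMark model) with homogeneous chains. As a rough illustration of the three-periodic idea only, the Python sketch below estimates one transition table per codon position from a learning set and scores a sequence window in all six reading frames (three phases on each strand). It is not the authors' GeneMark or PMC code: the function names `train_chain`, `log_prob`, `reverse_complement` and `best_frame`, the add-one pseudocounts, the uniform fallback for unseen contexts and the toy sequence are all assumptions made for illustration. Setting `period=1` yields a single homogeneous chain instead.

```python
from collections import Counter, defaultdict
from math import log

ALPHABET = "ACGT"  # assumes sequences contain only A, C, G, T (no N or IUPAC codes)


def train_chain(seqs, order, period=3):
    """Estimate P_j(next nucleotide | preceding `order` nucleotides) for each phase j.

    period=3 gives a three-periodic (codon-position-dependent) chain; period=1 gives a
    single homogeneous chain. Each training sequence is assumed to start at codon position 0.
    """
    counts = [defaultdict(Counter) for _ in range(period)]
    for s in seqs:
        for i in range(order, len(s)):
            counts[i % period][s[i - order:i]][s[i]] += 1
    model = []
    for table in counts:
        probs = {}
        for ctx, row in table.items():
            total = sum(row.values()) + len(ALPHABET)  # add-one (Laplace) pseudocounts
            probs[ctx] = {b: (row[b] + 1) / total for b in ALPHABET}
        model.append(probs)
    return model


def log_prob(seq, model, order, phase=0):
    """Log-likelihood of seq, assuming seq[0] sits at phase `phase` of the chain."""
    period = len(model)
    total = 0.0
    for i in range(order, len(seq)):
        probs = model[(i + phase) % period].get(seq[i - order:i])
        # Contexts never seen in training fall back to a uniform distribution (an assumption).
        total += log(probs[seq[i]] if probs else 1 / len(ALPHABET))
    return total


def reverse_complement(seq):
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def best_frame(seq, model, order):
    """Return the (strand, phase) pair with the highest log-likelihood over all six frames."""
    scores = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for phase in range(3):
            scores[(strand, phase)] = log_prob(s, model, order, phase)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    coding_model = train_chain(["GCT" * 6], order=1)  # hypothetical toy "learning set"
    print(best_frame("GCTGCT", coding_model, order=1))  # ('+', 0)
```

The pseudocounts hint at why the size of the learning set matters: with model order k each phase has up to 4^k contexts, and contexts that are rare or absent in the learning set yield poorly estimated transition probabilities.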
DOI: 10.4064/am40-4-5

Małgorzata Grabińska¹; Paweł Błażej¹; Paweł Mackiewicz¹

¹ Department of Genomics, Faculty of Biotechnology, University of Wrocław, Przybyszewskiego 63/77, 51-148 Wrocław, Poland
@article{10_4064_am40_4_5,
     author = {Ma{\l}gorzata Grabi\'nska and Pawe{\l} B{\l}a\.zej and Pawe{\l} Mackiewicz},
     title = {Two algorithms based on {Markov} chains and their application to recognition of protein coding genes in prokaryotic genomes},
     journal = {Applicationes Mathematicae},
     pages = {447--457},
     publisher = {mathdoc},
     volume = {40},
     number = {4},
     year = {2013},
     doi = {10.4064/am40-4-5},
     zbl = {1326.92053},
     language = {en},
     url = {http://geodesic.mathdoc.fr/articles/10.4064/am40-4-5/}
}
TY  - JOUR
AU  - Małgorzata Grabińska
AU  - Paweł Błażej
AU  - Paweł Mackiewicz
TI  - Two algorithms based on Markov chains and their application to recognition of protein coding genes in prokaryotic genomes
JO  - Applicationes Mathematicae
PY  - 2013
SP  - 447
EP  - 457
VL  - 40
IS  - 4
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/articles/10.4064/am40-4-5/
DO  - 10.4064/am40-4-5
LA  - en
ID  - 10_4064_am40_4_5
ER  - 
%0 Journal Article
%A Małgorzata Grabińska
%A Paweł Błażej
%A Paweł Mackiewicz
%T Two algorithms based on Markov chains and their application to recognition of protein coding genes in prokaryotic genomes
%J Applicationes Mathematicae
%D 2013
%P 447-457
%V 40
%N 4
%I mathdoc
%U http://geodesic.mathdoc.fr/articles/10.4064/am40-4-5/
%R 10.4064/am40-4-5
%G en
%F 10_4064_am40_4_5
Małgorzata Grabińska; Paweł Błażej; Paweł Mackiewicz. Two algorithms based on Markov chains and their application to recognition of protein coding genes in prokaryotic genomes. Applicationes Mathematicae, Volume 40 (2013) no. 4, pp. 447-457. doi: 10.4064/am40-4-5. http://geodesic.mathdoc.fr/articles/10.4064/am40-4-5/
