Phase shifts of the triplet periodicity in DNA sequences of genes
Matematičeskaâ biologiâ i bioinformatika, Volume 4 (2009), pp. 66-80.

See the article record from the source Math-Net.Ru

Triplet periodicity is a well-known property of coding DNA sequences. However, recent research has shown that about 20% (122829) of the sequences in the KEGG data bank (release 29) lack statistically significant triplet periodicity when insertions and deletions are not taken into account. The goal of this work was to show that the absence of triplet periodicity can be explained by a shift of the open reading frame. To find such shifts, we propose a new mathematical method based on calculating a measure of similarity between the triplet periodicity types before and after the position of a hypothetical open reading frame shift. Using this method, we found 4724 sequences with possible open reading frame shifts. We assume that in these cases deletions and insertions created the new open reading frame and destroyed the triplet periodicity. The identified sequences were translated into amino acid sequences using both the current and the ancient open reading frames; the ancient frames were reconstructed from the shift information. 243 amino acid sequences obtained from ancient reading frames show similarity to proteins from the Swiss-Prot data bank. This supports our assumption that genes can evolve through open reading frame shifts.
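The abstract does not give the formulas, so the following is only a minimal sketch of the general idea, under stated assumptions: triplet periodicity is summarized as a 4x3 matrix of base counts per codon position, and the two matrices (before and after a candidate position) are compared with a Kullback-style 2I homogeneity statistic for each of the three cyclic phase shifts. The function names `triplet_matrix`, `shift_columns`, `divergence`, and `best_phase`, and the choice of statistic, are illustrative assumptions and not the authors' published method.

```python
import math

BASES = "ACGT"


def triplet_matrix(seq, offset=0):
    """Count bases by codon position.

    Returns a 4x3 list: rows follow BASES, columns are (offset + i) % 3,
    where i is the index inside seq. Characters outside ACGT are skipped.
    """
    m = [[0, 0, 0] for _ in BASES]
    for i, ch in enumerate(seq):
        row = BASES.find(ch)
        if row >= 0:
            m[row][(offset + i) % 3] += 1
    return m


def shift_columns(m, s):
    """Cyclically relabel codon positions: new column j takes old column (j + s) % 3."""
    return [[row[(j + s) % 3] for j in range(3)] for row in m]


def divergence(a, b):
    """Assumed similarity measure: 2I homogeneity statistic of two count matrices.

    It is 0 when both matrices have identical cell proportions and grows
    as the two periodicity patterns differ.
    """
    na = sum(map(sum, a))
    nb = sum(map(sum, b))
    total = 0.0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            pooled = x + y
            for observed, n in ((x, na), (y, nb)):
                if observed > 0:
                    expected = pooled * n / (na + nb)
                    total += observed * math.log(observed / expected)
    return 2.0 * total


def best_phase(seq, x):
    """Compare periodicity before and after position x for phase shifts 0, 1, 2.

    Returns (shift with the smallest divergence, list of the three divergences).
    A nonzero best shift hints at a reading frame shift near x.
    """
    before = triplet_matrix(seq[:x], 0)
    after = triplet_matrix(seq[x:], x)
    scores = [divergence(before, shift_columns(after, s)) for s in range(3)]
    return min(range(3), key=scores.__getitem__), scores
```

On a toy sequence such as `"ACG" * 80`, `best_phase(seq, 120)` selects shift 0; removing a single base at position 120 makes a nonzero shift the best match. A real analysis would also need a significance threshold and a scan over candidate positions, which this sketch leaves out.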
@article{MBB_2009_4_a0,
     author = {E. V. Korotkov and V. M. Rudenko},
     title = {Phase shifts of the triplet periodicity in {DNA} sequences of genes},
     journal = {Matemati\v{c}eska\^a biologi\^a i bioinformatika},
     pages = {66--80},
     publisher = {mathdoc},
     volume = {4},
     year = {2009},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MBB_2009_4_a0/}
}
TY  - JOUR
AU  - E. V. Korotkov
AU  - V. M. Rudenko
TI  - Phase shifts of the triplet periodicity in DNA sequences of genes
JO  - Matematičeskaâ biologiâ i bioinformatika
PY  - 2009
SP  - 66
EP  - 80
VL  - 4
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/MBB_2009_4_a0/
LA  - ru
ID  - MBB_2009_4_a0
ER  - 
%0 Journal Article
%A E. V. Korotkov
%A V. M. Rudenko
%T Phase shifts of the triplet periodicity in DNA sequences of genes
%J Matematičeskaâ biologiâ i bioinformatika
%D 2009
%P 66-80
%V 4
%I mathdoc
%U http://geodesic.mathdoc.fr/item/MBB_2009_4_a0/
%G ru
%F MBB_2009_4_a0
E. V. Korotkov; V. M. Rudenko. Phase shifts of the triplet periodicity in DNA sequences of genes. Matematičeskaâ biologiâ i bioinformatika, Volume 4 (2009), pp. 66-80. http://geodesic.mathdoc.fr/item/MBB_2009_4_a0/

[1] Fickett J. W., “Predictive methods using nucleotide sequences”, Methods Biochem. Anal., 39 (1998), 231–245, doi:10.1002/9780470110607.ch10

[2] Staden R., “Staden: statistical and structural analysis of nucleotide sequences”, Methods Mol. Biol., 25 (1994), 69–77

[3] Baxevanis A. D., “Predictive methods using DNA sequences”, Methods Biochem. Anal., 43 (2001), 233–252, doi:10.1002/0471223921.ch10

[4] Gutierrez G., Oliver J. L., Marin A., “On the origin of the periodicity of three in protein coding DNA sequences”, J. Theor. Biol., 167:4 (1994), 413–441, doi:10.1006/jtbi.1994.1080

[5] Gao J., Qi Y., Cao Y., Tung W. W., “Protein coding sequence identification by simultaneously characterizing the periodic and random features of DNA sequences”, Journal of Biomedicine and Biotechnology, 2005:2 (2005), 139–146, doi:10.1155/JBB.2005.139

[6] Yin C., Yau S. S., “Prediction of protein coding regions by the 3-base periodicity analysis of a DNA sequence”, Journal of Theoretical Biology, 247 (2007), 687–694, doi:10.1016/j.jtbi.2007.03.038, MR 2479617

[7] Eskesen S. T., Eskesen F. N., Kinghorn B., Ruvinsky A., “Periodicity of DNA in exons”, BMC Molecular Biology, 5:12 (2004)

[8] Bibb M. J., Findlay P. R., Johnson M. W., “The relationship between base composition and codon usage in bacterial genes and its use for the simple and reliable identification of protein-coding sequences”, Gene, 30 (1984), 157–166, doi:10.1016/0378-1119(84)90116-1

[9] Konopka A. K., “Sequences and codes: fundamentals of biomolecular cryptology”, Biocomputing: Informatics and genome projects, eds. Smith D., Academic Press, San Diego, 119–174

[10] Frenkel F. E., Korotkov E. V., “Classification analysis of triplet periodicity in protein-coding regions of genes”, Gene, 421 (2008), 52–60, doi:10.1016/j.gene.2008.06.012

[11] Trifonov E. N., “Elucidating sequence codes: three codes for evolution”, Ann. NY Acad. Sci., 870 (1999), 330–338, doi:10.1111/j.1749-6632.1999.tb08894.x

[12] Eigen M., Winkler-Oswatitsch R., “Transfer-RNA: the early adaptor”, Naturwissenschaften, 68 (1981), 217–228, doi:10.1007/BF01047323

[13] Zoltowski M., “Is DNA Code Periodicity Only Due to CUF – Codons Usage Frequency?”, Conf. Proc. IEEE Eng. Med. Biol. Soc., 1 (2007), 1383–1386

[14] Antezana M. A., Kreitman M., “The nonrandom location of synonymous codons suggests that reading frame-independent forces have patterned codon preferences”, J. Mol. Evol., 49:1 (1999), 36–43, doi:10.1007/PL00006532, MR 2504535

[15] Issac B., Singh H., Kaur H., Raghava G. P. S., “Locating probable genes using Fourier transform approach”, Bioinformatics, 18:1 (2002), 196–197, doi:10.1093/bioinformatics/18.1.196

[16] Tiwari S., Ramachandran S., Bhattacharya A., Bhattacharya S., Ramaswamy R., “Prediction of probable genes by Fourier analysis of genomic sequences”, Comput. Appl. Biosci., 13:3 (1997), 263–270

[17] Azad R. K., Borodovsky M., “Probabilistic methods of identifying genes in prokaryotic genomes: connections to the HMM theory”, Briefings in Bioinformatics, 5:2 (2004), 118–130, doi:10.1093/bib/5.2.118

[18] Henderson J., Salzberg S., Fasman K. H., “Finding genes in DNA with a Hidden Markov Model”, J. Comput. Biol., 4 (1997), 127–141, doi:10.1089/cmb.1997.4.127

[19] Snyder E. E., Stormo G. D., “Identification of coding regions in genomic DNA sequences: an application of dynamic programming and neural networks”, Nucl. Acids Res., 21 (1993), 607–613, doi:10.1093/nar/21.3.607

[20] Thomas A., Skolnick M. H., “A probabilistic model for detecting coding regions in DNA sequences”, IMA J. Math. Appl. Med. Biol., 11:3 (1994), 149–160, doi:10.1093/imammb/11.3.149, Zbl 0809.92008

[21] Korotkov E. V., Korotkova M. A., Frenkel F. E., Kudryashov N. A., “Information approach for search of periodicity of symbolical sequences”, Molek. Biol., 37 (2003), 372–386, doi:10.1023/A:1024231109360

[22] Korotkov E. V., Korotkova M. A., Kudryashov N. A., “Information decomposition method for analysis of symbolical sequences”, Physics Letters A, 312 (2003), 198–210, doi:10.1016/S0375-9601(03)00641-8, MR 2046272, Zbl 1041.68073

[23] Ogata H., Goto S., Sato K., Fujibuchi W., Bono H., Kanehisa M., “KEGG: Kyoto Encyclopedia of Genes and Genomes”, Nucl. Acids Res., 27 (1999), 29–34, doi:10.1093/nar/27.1.29

[24] Frenkel F. E., Korotkov E. V., “Using triplet periodicity of nucleotide sequences for finding potential reading frame shifts in genes”, DNA Res., 16 (2009), 105–114, doi:10.1093/dnares/dsp002

[25] Okamura K., Feuk L., Marquès-Bonet T., Navarro A., Scherer S. W., “Frequent appearance of novel protein-coding sequences by frameshift translation”, Genomics, 88 (2006), 690–697, doi:10.1016/j.ygeno.2006.06.009

[26] Raes J., Van de Peer Y., “Functional divergence of proteins through frameshift mutations”, Trends Genet., 21 (2005), 428–431, doi:10.1016/j.tig.2005.05.013

[27] Kramer E. M., Huei-Jiun Su, Cheng-Chiang Wu, Jer-Ming Hu, “A simplified explanation for the frameshift mutation that created a novel C-terminal motif in the APETALA3 gene lineage”, BMC Evolutionary Biology, 6:30 (2006)

[28] Kullback S., Information Theory and Statistics, Wiley, New York, 1959, MR 103557, Zbl 0088.10406

[29] Mir, Moscow, 1967, Zbl 0168.17103

[30] UniProt Consortium, “The Universal Protein Resource (UniProt)”, Nucl. Acids Res., 35 (2007), 193–197

[31] Needleman S. B., Wunsch C. D., “A general method applicable to the search for similarities in the amino acid sequence of two proteins”, J. Mol. Biol., 48:3 (1970), 443–453, doi:10.1016/0022-2836(70)90057-4

[32] Altschul S. F. et al., “Basic local alignment search tool”, J. Mol. Biol., 215:3 (1990), 403–410

[33] Bollenbach T., Vetsigian K., Kishony R., “Evolution and multilevel optimization of the genetic code”, Genome Res., 17:4 (2007), 405–412, doi:10.1101/gr.6144007