The complexity of DNA sequences. Different approaches and definitions

V. D. Gusev; L. A. Miroshnichenko


Matematičeskaâ biologiâ i bioinformatika, Vol. 15 (2020) no. 2, pp. 313-337

This article was harvested from the source Math-Net.Ru


Abstract

An important quantitative characteristic of symbolic sequences (texts, strings) is complexity, which reflects, at an intuitive level, the degree of their “non-randomness”. A. N. Kolmogorov formulated the most general definition of complexity: he proposed measuring the complexity of an object (a symbolic sequence) by the length of the shortest description from which the object can be uniquely reconstructed. Since no program is guaranteed to find the shortest description, in practice various algorithmic approximations are used for this purpose; they are considered in this paper. Along with definitions of complexity that presuppose the possibility of reconstructing a sequence from its “description”, a number of measures are considered that do not imply such reconstruction. These are based on the calculation of certain quantitative characteristics. Of interest is not only a quantitative assessment of complexity, but also the identification and classification of the structural regularities that determine its specific value. In one form or another, these regularities manifest themselves as repetition in the broadest sense. The measures of complexity considered are conventionally divided into “statistical” ones, which take into account the frequency of occurrence of symbols or short “words” in the text; “dictionary” ones, which estimate the number of distinct “subwords”; and “structural” ones, based on identifying long repeated fragments of text and determining the relationships between them. Most of the methods are designed for sequences of arbitrary linguistic nature. The special attention paid to DNA sequences, reflected in the title of the article, is due to the importance of the object, to the various types of repetition it exhibits, and to the numerous examples of using the concept of complexity in solving problems of classification and evolution of biological objects.
Local structural features detected in DNA sequences in sliding-window mode are of considerable interest, since zones of low complexity in the genomes of various organisms are often associated with the regulation of basic genetic processes.
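
The three families of measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`kmer_entropy`, `linguistic_complexity`, `lz76_phrases`, `sliding_profile`), the four-letter alphabet default, and the naive quadratic-or-worse algorithms are assumptions chosen for readability. The "statistical" measure is taken to be the Shannon entropy of k-mer frequencies, the "dictionary" measure a ratio of observed to maximum possible distinct subwords, and the "structural" measure the number of phrases in a Lempel-Ziv (1976) style parsing in which each phrase is the longest fragment already seen earlier plus one new symbol.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 1) -> float:
    """'Statistical' measure: Shannon entropy (bits) of the empirical k-mer distribution."""
    n = len(seq) - k + 1
    if n <= 0:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(n))
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def linguistic_complexity(seq: str, alphabet_size: int = 4) -> float:
    """'Dictionary' measure: distinct subwords observed / maximum possible, over all lengths."""
    n = len(seq)
    observed = possible = 0
    for k in range(1, n + 1):
        observed += len({seq[i:i + k] for i in range(n - k + 1)})
        # At most alphabet_size**k words exist, and at most n-k+1 fit in the sequence.
        possible += min(alphabet_size ** k, n - k + 1)
    return observed / possible if possible else 0.0


def lz76_phrases(seq: str) -> int:
    """'Structural' measure: number of phrases in a Lempel-Ziv (1976) style parsing."""
    i, phrases, n = 0, 0, len(seq)
    while i < n:
        length = 1
        # Grow the phrase while it also occurs starting at an earlier position
        # (the earlier copy may overlap the current phrase).
        while i + length <= n and seq.find(seq[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases


def sliding_profile(seq: str, window: int, step: int, measure) -> list:
    """Apply a complexity measure to successive windows of the sequence."""
    return [measure(seq[s:s + window])
            for s in range(0, len(seq) - window + 1, step)]


if __name__ == "__main__":
    dna = "ACGTTGCAACGT" + "AT" * 10 + "GGCATCGATTAC"  # hypothetical test sequence
    print(sliding_profile(dna, window=12, step=4, measure=lz76_phrases))
```

In such a profile, windows that fall inside the `AT` repeat should score lower than the flanking windows, which is the kind of low-complexity zone the abstract refers to. For genome-scale data, the naive substring searches would need to be replaced by suffix-tree or suffix-automaton based algorithms.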

How to cite

@article{MBB_2020_15_2_a20,
     author = {V. D. Gusev and L. A. Miroshnichenko},
     title = {The complexity of {DNA} sequences. {Different} approaches and definitions},
     journal = {Matemati\v{c}eska\^a biologi\^a i bioinformatika},
     pages = {313--337},
     year = {2020},
     volume = {15},
     number = {2},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MBB_2020_15_2_a20/}
}

TY  - JOUR
AU  - V. D. Gusev
AU  - L. A. Miroshnichenko
TI  - The complexity of DNA sequences. Different approaches and definitions
JO  - Matematičeskaâ biologiâ i bioinformatika
PY  - 2020
SP  - 313
EP  - 337
VL  - 15
IS  - 2
UR  - http://geodesic.mathdoc.fr/item/MBB_2020_15_2_a20/
LA  - ru
ID  - MBB_2020_15_2_a20
ER  -

%0 Journal Article
%A V. D. Gusev
%A L. A. Miroshnichenko
%T The complexity of DNA sequences. Different approaches and definitions
%J Matematičeskaâ biologiâ i bioinformatika
%D 2020
%P 313-337
%V 15
%N 2
%U http://geodesic.mathdoc.fr/item/MBB_2020_15_2_a20/
%G ru
%F MBB_2020_15_2_a20

V. D. Gusev; L. A. Miroshnichenko. The complexity of DNA sequences. Different approaches and definitions. Matematičeskaâ biologiâ i bioinformatika, Vol. 15 (2020) no. 2, pp. 313-337. http://geodesic.mathdoc.fr/item/MBB_2020_15_2_a20/

References

[1] D. E. Knuth, The Art of Computer Programming, v. 2, Seminumerical Algorithms, Addison-Wesley Publishing Company, 1969 | MR | Zbl

[2] W. H. Jermann, “Redundancy in deterministic sequences”, IEEE Trans. on Syst. Sci. and Cybernetics, 6:4 (1970) | DOI | Zbl

[3] C. Shannon, “A mathematical theory of communication”, Bell System Techn. J., 27:3 (1948), 379–423 | DOI | MR | Zbl

[4] C. Shannon, “A mathematical theory of communication”, Bell System Techn. J., 27:4 (1948), 623–656 | DOI | MR | Zbl

[5] Zh. I. Reznikova, B. Ya. Ryabko, “Analiz yazyka muravev metodami teorii informatsii”, Problemy peredachi informatsii, XXII:3 (1986), 103–108 | Zbl

[6] A. N. Kolmogorov, “Tri podkhoda k opredeleniyu ponyatiya «kolichestvo informatsii»”, Problemy peredachi informatsii, 1:1 (1965), 3–11 | MR | Zbl

[7] R. Solomonoff, A Preliminary Report on a General Theory of Inductive Inference, Zator Co, Cambridge, Ma., 1960 | MR

[8] R. A. Solomonoff, “Formal theory of inductive inference. Part I”, Information and Control, 7:1 (1964), 1–22 | DOI | MR | Zbl

[9] G. Chaitin, “Information-theoretic limitations of formal systems”, Journal of the ACM, 21:3 (1974), 403–424 | DOI | MR | Zbl

[10] L. A. Levin, “O razlichnykh merakh slozhnosti konechnykh ob'ektov (aksiomaticheskoe opisanie)”, Doklady AN SSSR, 227:4 (1976), 804–807 | MR | Zbl

[11] P. Salamon, A. K. Konopka, “A maximum entropy principle for the distribution of local complexity in naturally occurring nucleotide sequences”, Computers Chem., 16:2 (1992), 117–124 | DOI | Zbl

[12] R. Román-Roldán, P. Bernaola-Galván, J. L. Oliver, “Sequence compositional complexity of DNA through an entropic segmentation method”, Physical Review Letters, 80 (1998), 1344–1347 | DOI

[13] E. N. Trifonov, “Making sense of the human genome”, Structure Methods, v. 1, eds. R. H. Sarma, M. H. Sarma, Adenine Press, 1990, 69–77

[14] M. Crochemore, R. Verin, “Zones of low entropy in genomic sequences”, Computers and Chemistry, 23 (1999), 275–282 | DOI

[15] A. E. Gabrielian, A. Bolshoy, “Sequence complexity and DNA curvature”, Comput. Chem., 23 (1999), 263–274 | DOI

[16] O. G. Troyanskaya, O. Arbell, Y. Koren, G. M. Landau, A. Bolshoy, “Sequence complexity profiles of prokaryotic genomic sequences: a fast algorithm for calculating linguistic complexity”, Bioinformatics, 18:5 (2002), 679–688 | DOI

[17] S. Grumbach, F. Tahi, “Compression of DNA sequences”, Proc. IEEE Symp. on Data Compression, 1993, 340–350 | DOI

[18] S. Grumbach, F. Tahi, “A new challenge for compression algorithms: genetic sequences”, J. Information Processing and Management, 30:6 (1994), 875–886 | DOI

[19] D. Pratas, M. Hosseini, J. M. Silva, A. J. Pinho, “A reference-free lossless compression algorithm for DNA sequences using a competitive prediction of two classes of weighted models”, Entropy, 21:11 (2019), 1074 | DOI | MR

[20] M. C. Brandon, D. C. Wallace, P. Baldi, “Data structures and compression algorithms for genomic sequence data”, Bioinformatics, 25:14 (2009), 1731–1738 | DOI

[21] S. Deorowicz, S. Grabowski, “Robust relative compression of genomes with random access”, Bioinformatics, 27:21 (2011), 2979–2986 | DOI

[22] D. S. Pavlichin, T. Weissman, G. Yona, “The human genome contracts again”, Bioinformatics, 29:17 (2013), 2199–2202 | DOI

[23] N. S. Bakr, A. A. Sharawi, “DNA lossless compression algorithms: Review”, American Journal of Bioinformatics Research, 3:3 (2013), 72–81 | DOI

[24] Z. Zhu, Y. Zhang, Z. Ji, S. He, X. Yang, “High-throughput DNA sequence data compression”, Briefings in Bioinformatics, 16:1 (2015), 1–15 | DOI

[25] M. Hosseini, D. Pratas, A. Pinho, “A survey on data compression methods for biological sequences”, Information, 7:4 (2016), 56 | DOI

[26] Yu. G. Smetanin, M. V. Ulyanov, A. S. Pestova, “Entropiinyi podkhod k postroeniyu mery simvolnogo raznoobraziya slov i ego primenenie k klasterizatsii genomov rastenii”, Matematicheskaya biologiya i bioinformatika, 11:1 (2016), 114–126 | DOI

[27] C. Shannon, “Prediction and entropy of printed English”, Bell System Techn. J., 30:1 (1951), 50–64 | DOI | Zbl

[28] H. Herzel, “Complexity of symbol sequences”, Systems Analysis Modelling Simulation, 5:5 (1988), 435–444 | MR | Zbl

[29] W. Ebeling, G. Nicolis, “Word frequency and entropy of symbolic sequences: a dynamical perspective”, Chaos, Solitons and Fractals, 2:6 (1992), 635–650 | DOI | MR | Zbl

[30] A. O. Schmitt, H. Herzel, “Estimating the entropy of DNA sequences”, J. Theor. Biol., 188 (1997), 369–377 | DOI

[31] O. Weiss, M. A. Jiménez-Montaño, H. Herzel, “Information content of protein sequences”, J. Theor. Biol., 206 (2000), 379–386 | DOI

[32] M. Farach, M. Noordewier, S. Savari, L. Shepp, A. Wyner, J. Ziv, “On the entropy of DNA: algorithms and measurements based on memory and rapid convergence”, Proceedings of the 6th ACM-SIAM Symposium on Discrete Algorithms, ACM, Inc., New York, 1995, 48–57 | Zbl

[33] D. Loewenstern, P. N. Yianilos, “Significantly lower entropy estimates for natural DNA sequences”, J. Comput. Biol., 6 (1999), 125–142 | DOI

[34] O. S. Kisliuk, T. A. Borovina, N. N. Nazipova, “Estimation of redundancy of genetic texts by the high frequency component of the $L$-gram graph”, Biophysics, 44:4 (1999), 621–630

[35] R. Fano, Peredacha informatsii. Statisticheskaya teoriya svyazi, Mir, M., 1965, 438 pp.

[36] D. Huffman, “A method for the construction of minimum-redundancy codes”, Proceedings of the IRE, 40:9 (1952), 1098–1101 | DOI | Zbl

[37] D. E. Knuth, “Dynamic Huffman Coding”, Journal of Algorithms, 6:2 (1985), 163–180 | DOI | MR | Zbl

[38] B. Ya. Ryabko, “Bystryi algoritm adaptivnogo kodirovaniya”, Probl. peredachi inform., 26:4 (1990), 24–37 | MR | Zbl

[39] E. N. Gilbert, E. F. Moore, “Variable-length binary encodings”, Bell System Technical Journal, 38:4 (1959), 933–967 | DOI | MR

[40] B. Ya. Ryabko, “Szhatie dannykh s pomoschyu stopki knig”, Probl. peredachi inform., 16:4 (1980), 16–21 | MR

[41] G. N. N. Martin, “Range encoding: An algorithm for removing redundancy from a digitized message”, Video Data Recording Conference, Southampton, UK, 1979

[42] A. Said, “Introduction to Arithmetic Coding Theory and Practice”, Lossless Compression Handbook, ed. Sayood K., Elsevier Inc., 2003, 101–152 | DOI

[43] A. Barron, J. Rissanen, B. Yu, “The minimum description length principle in coding and modeling”, IEEE Transactions on Information Theory, 44:6 (1998) | DOI | MR

[44] Y. L. Orlov, V. P. Filippov, V. N. Potapov, N. A. Kolchanov, “Construction of stochastic context trees for genetic texts”, In Silico Biology, 2:3 (2002), 233–247

[45] A. K. Konopka, “Sequences and codes: fundamentals of biomolecular cryptology”, Biocomputing: Informatics and Genome Projects, ed. Smith D.W., Academic Press, New York, 1994, 119–174 | DOI

[46] H. Wan, J. C. Wootton, “A global compositional complexity measure for biological sequences: AT-rich and CG-rich genomes encode less complex proteins”, Computers and Chem., 24:1 (2000), 71–94 | DOI | Zbl

[47] R. V. L. Hartley, “Transmission of Information”, Bell Syst. Techn. J., 7:3 (1928), 535–563 | DOI

[48] J. C. Wootton, S. Federhen, “Statistics of local complexity in amino acid sequences and sequence databases”, Computers Chemistry, 17:2 (1993), 149–163 | DOI | Zbl

[49] J. C. Wootton, S. Federhen, “Analysis of compositionally biased regions in sequence databases”, Methods in Enzymology, 266 (1996), 554–571 | DOI

[50] P. Bernaola-Galvan, R. Román-Roldán, J. L. Oliver, “Compositional segmentation and long-range fractal correlation in DNA sequences”, Phys. Rev. E, 53:5 (1996), 5181–5189 | DOI

[51] W. Li, “The complexity of DNA: the measure of compositional heterogeneity in DNA sequences and measures of complexity”, Complexity, 3:2 (1997), 33–37 | DOI | MR

[52] J. L. Oliver, R. Román-Roldán, J. Pérez, P. Bernaola-Galván, “SEGMENT: identifying compositional domains in DNA sequences”, Bioinformatics, 15:12 (1999), 974–979 | DOI

[53] J. Lin, “Divergence measure based on the Shannon entropy”, IEEE Transactions on Information Theory, 37 (1991), 145–151 | DOI | MR | Zbl

[54] D. Tautz, M. Trick, G. A. Dover, “Cryptic simplicity in DNA is major source of genetic variation”, Nature, 322 (1986), 652–656 | DOI

[55] J. M. Hancock, J. S. Armstrong, “SIMPLE34: an improved and enhanced implementation for VAX and Sun computers of the SIMPLE algorithm for analysis of clustered repetitive motifs in nucleotide sequences”, Comput. Appl. Biosci., 10 (1994), 67–70

[56] M. M. Albà, R. A. Laskowski, J. M. Hancock, “Detecting cryptically simple protein sequences using the SIMPLE algorithm”, Bioinformatics, 18:5 (2002), 672–678 | DOI

[57] V. J. Promponas, A. J. Enright, S. Tsoka, D. P. Kreil, C. Leroy, S. Hamodrakas, C. Sander, C. A. Ouzounis, “CAST: an iterative algorithm for the complexity analysis of sequence tracts”, Bioinformatics, 16:10 (2000), 915–922 | DOI

[58] G. Benson, “Tandem repeats finder: a program to analyze DNA sequences”, Nucl. Acids Res., 27:2 (1999), 573–580 | DOI

[59] M. B. Chalei, V. A. Kutyrkin, G. E. Tyulbasheva, E. I. Teplukhina, N. N. Nazipova, “Issledovanie fenomena skrytoi periodichnosti v genomakh eukarioticheskikh organizmov”, Matematicheskaya biologiya i bioinformatika, 8:2 (2013), 480–501 | DOI

[60] M. Lothaire, Combinatorics on Words, Addison-Wesley, Reading, MA, 1983 | MR | Zbl

[61] S. Ferenczi, “Complexity of sequences and dynamical systems”, Discrete Mathematics, 206:1-3 (1999), 145–154 | DOI | MR | Zbl

[62] A. Bolshoy, “DNA sequence analysis linguistic tools: contrast vocabularies, compositional spectra and linguistic complexity”, Applied Bioinformatics, 2:2 (2003), 103–112

[63] A. Bolshoy, K. Shapiro, E. N. Trifonov, I. Ioshikhes, “Enhancement of the nucleosomal pattern in sequences of lower complexity”, Nucl. Acids Res., 25 (1997), 3248–3254 | DOI

[64] E. Ukkonen, “On-line construction of suffix trees”, Algorithmica, 14 (1995), 249–260 | DOI | MR | Zbl

[65] A. Blumer, J. Blumer, A. Ehrenfeucht, D. Haussler, R. McConnell, “Building the minimal DFA for the set of all subwords of a word on-line in linear time”, Lect. Notes in Comput. Sci., 172, 1984, 109–118 | DOI | MR | Zbl

[66] A. Lempel, J. Ziv, “On the complexity of finite sequences”, IEEE Trans. Inform. Theory, IT-22:1 (1976), 75–81 | DOI | MR | Zbl

[67] J. Ziv, A. Lempel, “A universal algorithm for sequential data compression”, IEEE Trans. Inform. Theory, IT-23:3 (1977), 337–343 | DOI | MR | Zbl

[68] J. Ziv, A. Lempel, “Compression of individual sequences via variable-rate coding”, IEEE Trans. Inform. Theory, IT-24:5 (1978), 530–536 | DOI | MR | Zbl

[69] X. Chen, M. Li, B. Ma, J. Tromp, “DNACompress: fast and effective DNA sequence compression”, Bioinformatics, 18:12 (2002), 1696–1698 | DOI

[70] K. N. Mishra, A. Aggarwal, E. Abdelhadi, D. Srivastava, “An efficient horizontal and vertical method for online DNA sequence compression”, Int. J. Comput. Appl., 3:1 (2010), 39–46 | DOI

[71] V. D. Gusev, L. A. Miroshnichenko, N. A. Chuzhanova, “Vyyavlenie fraktalopodobnykh struktur v DNK-posledovatelnostyakh”, Classification, Forecasting, Data Mining, Information Science Computing, 8, ITHEA, Sofia, 2009, 117–123

[72] V. D. Gusev, V. A. Kulichkov, O. M. Chupakhina, Slozhnostnoi analiz geneticheskikh tekstov (na primere faga), preprint No 20, IM SOAN SSSR, Novosibirsk, 1989, 49 pp.

[73] V. D. Gusev, V. A. Kulichkov, O. M. Chupakhina, “Slozhnostnoi analiz genomov. I. Mery slozhnosti i klassifikatsiya vyyavlyaemykh zakonomernostei”, Molekulyarnaya biologiya, 25:3 (1991), 825–833

[74] V. D. Gusev, V. A. Kulichkov, O. M. Chupakhina, “Slozhnostnoi analiz genomov. II. Zony obshirnoi gomologii v bakteriofage”, Molekulyarnaya biologiya, 25:4 (1991), 1080–1089

[75] V. D. Gusev, V. A. Kulichkov, O. M. Chupakhina, “The Lempel-Ziv complexity and local structure analysis of genomes”, Biosystems, 30:1-3 (1993), 183–200 | DOI

[76] V. D. Gusev, L. A. Nemytikova, N. A. Chuzhanova, “On the complexity measures of genetic sequences”, Bioinformatics, 15:12 (1999), 994–999 | DOI | MR

[77] V. D. Gusev, L. A. Miroshnichenko, “Ispolzovanie slozhnostnykh razlozhenii v zadachakh analiza simvolnykh posledovatelnostei”, Doklady 8-i Mezhdunarodnoi konferentsii “Intellektualizatsiya obrabotki informatsii”, IOI-2010 (Kipr, Pafos, 17–24 oktyabrya 2010), 2010, 469–472

[78] V. D. Gusev, L. A. Miroshnichenko, “Poisk kombinirovannykh struktur v DNK-posledovatelnostyakh”, Doklady vserossiiskoi konferentsii MMRO-13 «Matematicheskie metody raspoznavaniya obrazov» (Leningradskaya obl., g. Zelenogorsk, 30 sentyabrya–6 oktyabrya 2007 g.), Maks-Press, M., 2007, 473–476

[79] V. D. Gusev, “Slozhnostnye profili simvolnykh posledovatelnostei”, Metody obrabotki simvolnykh posledovatelnostei i signalov, Vychislitelnye sistemy, 132, Novosibirsk, 1989, 35–63 | Zbl

[80] Yu. L. Orlov, V. D. Gusev, L. A. Miroshnichenko, “LZcomposer: decomposition of genomic sequences by repeat fragments”, Biophysics, 48, Suppl. 1 (2003), S7–S16

[81] V. D. Gusev, L. A. Nemytikova, N. A. Chuzhanova, “Bystryi metod vyyavleniya vzaimosvyazei v podborkakh funktsionalno i/ili evolyutsionno blizkikh biologicheskikh tekstov”, Molekulyarnaya biologiya, 35:6 (2001), 1015–1022

[82] N. A. Chuzhanova, M. Krawczak, L. A. Nemytikova, V. D. Gusev, D. N. Cooper, “Promoter shuffling has occurred during the evolution of the vertebrate growth hormone gene”, Gene, 254 (2000), 9–18 | DOI

[83] A. Surguchov, “Migration of promoter elements between genes: a role in transcriptional regulation and evolution”, Biomed. Sci., 2 (1991), 22–28

[84] N. A. Chuzhanova, M. Krawczak, N. Thomas, L. A. Nemytikova, V. D. Gusev, D. N. Cooper, “The evolution of the vertebrate beta-globin gene promoter”, Evolution, 56:2 (2002), 224–232 | DOI

[85] Yu. L. Orlov, V. N. Potapov, “Estimation of stochastic complexity of genetical texts”, Computational technologies (Novosibirsk), 5, Special issue (2000), 5–15

[86] I. I. Kiknadze, L. I. Gunderina, A. G. Istomina, V. D. Gusev, L. A. Nemytikova, “Similarity analysis of inversion banding sequences in chromosomes of Chironomus species (breakpoint phylogeny)”, Bioinformatics of Genome Regulation and Structure, eds. N. Kolchanov, R. Hofestaedt, Springer, Boston, MA, 2004, 245–254 | DOI

[87] A. N. Grigoreva, “Mery slozhnosti slov na osnove predikata vkhozhdeniya i redaktsionnogo rasstoyaniya”, Zap. nauchn. seminarov LOMI AN SSSR, 105, 1981, 18–24

[88] L. Allison, T. Edgoose, T. I. Dix, “Compression of strings with approximate repeats”, Intelligent Systems in Molecular Biology, ISMB'98 (Montreal, 28 June–1 July, 1998), 1998, 8–16

[89] X. Chen, S. Kwong, M. Li, “A compression algorithm for DNA sequences and its applications in genome comparison”, International Conference on Genome Informatics, Genome informatics, 10, 1999, 51–61

[90] B. Ma, J. Tromp, M. Li, “PatternHunter: Faster and more sensitive homology search”, Bioinformatics, 18:3 (2002), 440–445 | DOI

[91] Yu. V. Merekin, “Nizhnyaya otsenka slozhnosti dlya skhem konkatenatsii slov”, Diskretn. analiz i issled. oper., 3:1 (1996), 52–56 | MR | Zbl

[92] A. A. Evdokimov, “Analiz, slozhnost i rekonstruktsiya simvolnykh posledovatelnostei”, Vestnik TGU, 2005, no. 14, 4–12

[93] W. Ebeling, M. A. Jiménez-Montaño, “On grammars, complexity, and information measures of biological macromolecules”, Math. Biosci., 52 (1980), 53–71 | DOI | Zbl

[94] M. A. Jiménez-Montaño, “On syntactic structure of protein sequences and the concept of grammar complexity”, Bull. Math. Biol., 46 (1984), 641–659 | DOI | MR

[95] M. A. Jiménez-Montaño, T. Pöschel, P. E. Rapp, “A measure of the information content of neural spike trains”, Proc. Symp. on Complexity in Biology, eds. E. Mizraji, L. Acerenza, F. Alvares, A. Pomi, D. I. R. A. C., Montevideo, Uruguay, 1997, 113–142 | MR

[96] M. Charikar, E. Lehman, D. Liu, R. Panigrahy, M. Prabhakaran, A. Sahai, A. Shelat, “The smallest grammar problem”, IEEE Transactions on Information Theory, 51:7 (2005), 2554–2576 | DOI | MR | Zbl

[97] C. G. Nevill-Manning, I. H. Witten, “Identifying hierarchical structure in sequences: a linear-time algorithm”, Journal of Artificial Intelligence Research, 7 (1997), 67–82 | DOI | Zbl

[98] I. H. Witten, “Adaptive text mining: inferring structure from sequences”, Journal of discrete algorithms, 2:2 (2004), 137–159 | DOI | MR | Zbl

[99] R. Carrascosa, F. Coste, M. Gallé, G. Infante-Lopez, “Searching for smallest grammars on large sequences and application to DNA”, Journal of Discrete Algorithms, 11 (2012), 62–72 | DOI | MR | Zbl

[100] C. G. Nevill-Manning, I. H. Witten, “Online and offline heuristics for inferring hierarchies of repetitions in sequences”, Proc IEEE, 88:11 (2000), 1745–1755 | DOI

[101] N. Cherniavsky, R. Ladner, “Grammar-based compression of DNA sequences”, Proceedings of the DIMACS Working Group on the Burrows-Wheeler Transform (New Jersey, 2004)

[102] Q. Liu, Yu. Yang, C. Chen, J. Bu, Y. Zhang, X. Ye, “RNACompress: Grammar-based compression and informational complexity measurement of RNA secondary structure”, BMC Bioinformatics, 9 (2008), 176 | DOI

[103] E. N. Trifonov, “Geneticheskoe soderzhanie posledovatelnosti DNK opredelyaetsya superpozitsiei mnogikh kodov”, Molekulyarnaya biologiya, 31:4 (1997), 759–767

[104] C. H. Bennett, P. Gács, M. Li, P. Vitányi, W. H. Zurek, “Information Distance”, IEEE Trans. on Inf. Th., 44:4 (1998), 1407–1423 | DOI | MR | Zbl

[105] M. Li, X. Chen, X. Li, B. Ma, P. M. B. Vitányi, “The similarity metric”, IEEE Trans. on Inf. Th., 50:12 (2004), 3250–3264 | DOI | MR | Zbl

[106] J.-S. Varré, J. P. Delahaye, E. Rivals, “Transformation distances: a family of dissimilarity measures based on movements of segments”, Bioinformatics, 15:3 (1999), 194–202 | DOI

[107] S. Vinga, J. S. Almeida, “Alignment-free sequence comparison: a review”, Bioinformatics, 19:4 (2003), 513–523 | DOI

[108] C. S. Wallace, D. M. Boulton, “An information measure for classification”, Computer J., 11:2 (1968), 185–194 | DOI | Zbl

[109] D. Sankoff, G. Leduc, N. Antoine, B. Paquin, B. F. Lang, R. Cedergren, “Gene order comparison for phylogenetic inference: Evolution of the mitochondrial genome”, PNAS USA, 89 (1992), 6575–6579 | DOI

[110] D. Sankoff, J. H. Nadeau, “Conserved synteny as a measure of genomic distance”, Discrete Appl. Math., 71 (1996), 247–257 | DOI | MR | Zbl

[111] V. Bafna, P. A. Pevzner, “Sorting by reversals: genome rearrangements in plant organelles and evolutionary history of X chromosome”, Molecular Biology and Evolution, 12:2 (1995), 239–246 | DOI | MR

[112] M. Li, J. H. Badger, X. Chen, S. Kwong, P. Kearney, H. Zhang, “An information-based sequence distance and its application to whole mitochondrial genome phylogeny”, Bioinformatics, 17:2 (2001), 149–154 | DOI

[113] A. Salomaa, Jewels of formal language theory, Computer Science Press, Rockville, 1981 | MR | Zbl

[114] Iványi, “On the d-complexity of words”, Ann. Univ. Sci Budapest Sect Comput., 8 (1987), 69–90 | MR | Zbl

[115] I. Nakashima, J. Tamura, S. Yasutomi, “Modified complexity and *-Sturmian word”, Proc. Japan Acad. Ser. A Math. Sci., 75:3 (1999), 26–28 | DOI | MR | Zbl

[116] T. Kamae, L. Zamboni, “Sequence entropy and the maximal pattern complexity of infinite words”, Ergodic Theory Dynamical Systems, 22:4 (2002), 1191–1199 | DOI | MR | Zbl

[117] A. Restivo, S. Salemi, “Binary patterns in infinite binary words”, Formal and Natural Computing, Lecture Notes in Computer Science, 2300, eds. W. Brauer, H. Ehrig, J. Karhumäki, A. Salomaa, Springer, Berlin–Heidelberg, 2002, 107–116 | DOI | MR | Zbl

[118] A. E. Frid, “Arithmetical complexity of symmetric D0L words”, Theoretical Computer Science, 306 (2003), 535–542 | DOI | MR | Zbl

[119] H. Herzel, I. Große, “Measuring correlations in symbol sequences”, Physica A, 216 (1995), 518–542 | DOI | MR

[120] S. V. Buldyrev, A. L. Goldberger, S. Havlin, R. N. Mantegna, M. E. Matsa, C. K. Peng, M. Simons, H. E. Stanley, “Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis”, Physical Review E, 51 (1995), 5084–5091 | DOI

[121] S. Havlin, S. V. Buldyrev, A. L. Goldberger, R. N. Mantegna, C. K. Peng, M. Simons, H. E. Stanley, “Statistical and linguistic features of DNA sequences”, Fractals, 3:2 (1995), 269–284 | DOI | MR | Zbl

[122] S. Karlin, V. Brendel, “Patchiness and correlations in DNA sequences”, Science, 259:5095 (1993), 677–680 | DOI

[123] R. F. Voss, “Long-range fractal correlations in DNA introns and exons”, Fractals, 2 (1994), 1–6 | DOI

[124] W. Li, “The study of correlation structures of DNA sequences: a critical review”, Computer Chem., 21:4 (1997), 257–271 | DOI | Zbl

[125] G. Cormode, M. Paterson, S. C. Sahinalp, U. Vishkin, “Communication complexity of document exchange”, Proc. Eleventh ACM-SIAM Symposium on Discrete Algorithms, SODA, 2000, 197–206 | MR | Zbl
