Investigation of latent periodicity phenomenon in the genomes of eukaryotic organisms
Matematičeskaâ biologiâ i bioinformatika, Vol. 13 (2018), pp. t84-t103.

See the article record from the source Math-Net.Ru

Data analysis is presented for the first release of the HeteroGenome database, which contains latent periodicity regions revealed in a number of eukaryotic organisms. Tandem repeats with varying degrees of pattern-copy integrity, including highly diverged repeats, have been identified in the genomes of S. cerevisiae, A. thaliana, C. elegans and D. melanogaster. These data were obtained with an original spectral-statistical approach to searching for reliable regions of latent periodicity in DNA sequences. A special two-level structure of data presentation is proposed. On the first, nonredundant level, the latent periodicity regions are considered as a whole; on the second level, only the conserved elements of their periodic structures are shown. This presentation made it possible to estimate the share of the periodicity regions at nearly 10% of the length of the analyzed genomes; the estimate is based on the first-level data. Quantitative and qualitative investigation of the latent periodicity regions, and of their divergence level over all chromosomes of the organisms considered, revealed characteristic types of periodicity in the genome of each organism. Histograms of the density distribution of the latent periodicity regions were obtained for each chromosome of the analyzed genomes. The repertoire of period lengths was determined. The HeteroGenome database offers additional facilities for internal data analysis and is accessible at http://www.jcbi.ru/lp_baze/.
@article{MBB_2018_13_a6,
     author = {M. B. Chaley and V. A. Kutyrkin and E. I. Teplukhina and G. E. Tyulbasheva and N. N. Nazipova},
     title = {Investigation of latent periodicity phenomenon in the genomes of eukaryotic organisms},
     journal = {Matemati\v{c}eska\^a biologi\^a i bioinformatika},
     pages = {t84--t103},
     publisher = {mathdoc},
     volume = {13},
     year = {2018},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/MBB_2018_13_a6/}
}
M. B. Chaley; V. A. Kutyrkin; E. I. Teplukhina; G. E. Tyulbasheva; N. N. Nazipova. Investigation of latent periodicity phenomenon in the genomes of eukaryotic organisms. Matematičeskaâ biologiâ i bioinformatika, Vol. 13 (2018), pp. t84-t103. http://geodesic.mathdoc.fr/item/MBB_2018_13_a6/
