Investigation of latent periodicity phenomenon in the genomes of eukaryotic organisms
Matematičeskaâ biologiâ i bioinformatika, Volume 13 (2018) no. 3, pp. t84-t103

See the article record from the source Math-Net.Ru

Data analysis is presented for the first release of the HeteroGenome database, which contains latent periodicity regions revealed in a number of eukaryotic organisms. Tandem repeats with varying degrees of integrity of the pattern copies, including highly diverged repeats, have been identified in the genomes of S. cerevisiae, A. thaliana, C. elegans and D. melanogaster. These data were obtained with an original spectral-statistical approach to searching for reliable latent periodicity regions in DNA sequences. A two-level structure of data presentation is proposed. On the first, nonredundant level, each latent periodicity region is considered as a whole; on the second level, only the conserved elements of its periodic structure are shown. Based on the first-level data, this presentation allowed the share of periodicity regions to be estimated at nearly 10% of the length of the analyzed genomes. Quantitative and qualitative investigation of the latent periodicity regions, and of their divergence level across all chromosomes of the organisms considered, revealed the types of periodicity characteristic of each genome. Histograms of the density distribution of latent periodicity regions were obtained for each chromosome of the analyzed genomes. The repertoire of period lengths was determined. The HeteroGenome database offers additional facilities for internal data analysis and is accessible at http://www.jcbi.ru/lp_baze/.
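
The two summary quantities mentioned in the abstract, the share of a genome covered by periodicity regions and the density of regions along a chromosome, can be illustrated with a minimal Python sketch. This is not the authors' spectral-statistical method and it does not read the HeteroGenome database: it assumes only that regions are available as (start, end) coordinates. The function names, the half-open coordinate convention, and the toy numbers are hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple

# Assumption: a region is a half-open interval [start, end) in base pairs.
Region = Tuple[int, int]


def merge_regions(regions: List[Region]) -> List[Region]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Region] = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def coverage_share(regions: List[Region], chromosome_length: int) -> float:
    """Fraction of the chromosome length covered by the merged regions."""
    covered = sum(end - start for start, end in merge_regions(regions))
    return covered / chromosome_length


def density_histogram(regions: List[Region], chromosome_length: int,
                      bin_size: int) -> List[int]:
    """Count how many regions start in each fixed-size bin of the chromosome."""
    n_bins = -(-chromosome_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for start, _ in regions:
        counts[start // bin_size] += 1
    return counts


# Hypothetical toy data, not taken from any real genome.
toy_regions = [(100, 250), (200, 400), (900, 1000)]
toy_length = 4000

share = coverage_share(toy_regions, toy_length)
hist = density_histogram(toy_regions, toy_length, bin_size=1000)
print(f"covered share: {share:.3f}")
print(f"regions per bin: {hist}")
```

Merging intervals before summing mirrors the idea of a nonredundant first level: overlapping regions contribute each covered base only once, so the share cannot be inflated by double counting.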
@article{MBB_2018_13_3_a6,
     author = {M. B. Chaley and V. A. Kutyrkin and E. I. Teplukhina and G. E. Tyulbasheva and N. N. Nazipova},
     title = {Investigation of latent periodicity phenomenon in the genomes of eukaryotic organisms},
     journal = {Matemati\v{c}eska\^a biologi\^a i bioinformatika},
     pages = {t84--t103},
     publisher = {mathdoc},
     volume = {13},
     number = {3},
     year = {2018},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/MBB_2018_13_3_a6/}
}
M. B. Chaley; V. A. Kutyrkin; E. I. Teplukhina; G. E. Tyulbasheva; N. N. Nazipova. Investigation of latent periodicity phenomenon in the genomes of eukaryotic organisms. Matematičeskaâ biologiâ i bioinformatika, Volume 13 (2018) no. 3, pp. t84-t103. http://geodesic.mathdoc.fr/item/MBB_2018_13_3_a6/
