Joint use of different homogeneity testing criteria for latent periodicity revelation in biological sequences
Matematičeskaâ biologiâ i bioinformatika, Vol. 2 (2007) no. 1, pp. 20-35.

See the article record from the source Math-Net.Ru

In this work, a model of additional statistical experiments is used to reveal latent periodicity in biological sequences. This model, which generalizes the notion of fuzzy tandem repeats (FTRs), has allowed us to propose original statistical methods for estimating the periodicity pattern in approximate tandem repeats (ATRs). It is shown that when the percentage of indels in approximate tandem repeats is high, in a number of cases the alignment of copies based on the approximation of the repeat's pattern size given by this model is closer to optimal than the alignment obtained by the well-known Tandem Repeats Finder (TRF) method. Compared with existing analogs, the proposed methods have greater power. Their main advantage is that they remain applicable under the practical conditions of an unrepresentative sample.
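As a rough illustration of what a homogeneity test for latent periodicity can look like, the sketch below computes a Pearson chi-square statistic for a trial period: letter counts are tabulated by position within the period, and the statistic measures how far the rows depart from a common letter distribution. This is a generic textbook construction, not the authors' model of additional statistical experiments; the function name `periodicity_chi2`, its parameters, and the assumption that the sequence contains only letters of the given alphabet are all illustrative.

```python
from collections import Counter


def periodicity_chi2(seq, period, alphabet="ACGT"):
    """Pearson chi-square statistic for homogeneity of letter frequencies
    across the positions of a trial period (illustrative sketch only).

    Assumes every character of `seq` belongs to `alphabet`.
    """
    n = len(seq)
    # counts[j][a]: occurrences of letter a at indices i with i % period == j
    counts = [Counter() for _ in range(period)]
    for i, ch in enumerate(seq):
        counts[i % period][ch] += 1

    letter_totals = Counter(seq)
    chi2 = 0.0
    for j in range(period):
        row_total = sum(counts[j].values())
        for a in alphabet:
            # Expected count if all period positions shared one distribution
            expected = row_total * letter_totals[a] / n
            if expected > 0:
                chi2 += (counts[j][a] - expected) ** 2 / expected

    # Degrees of freedom: (rows - 1) * (letters actually observed - 1)
    observed_letters = sum(1 for a in alphabet if letter_totals[a] > 0)
    dof = (period - 1) * (observed_letters - 1)
    return chi2, dof
```

For the perfectly periodic sequence `"ACG" * 20`, a trial period of 3 gives a statistic of 120 with 4 degrees of freedom, while a trial period of 2 gives 0, since the letters are then spread evenly over both positions. On short or indel-rich sequences the chi-square approximation becomes unreliable, which is the small-sample regime the abstract refers to.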
@article{MBB_2007_2_1_a3,
     author = {M. B. Chaley and N. N. Nazipova and V. A. Kutyrkin},
     title = {Joint use of different homogeneity testing criteria for latent periodicity revelation in biological sequences},
     journal = {Matemati\v{c}eska\^a biologi\^a i bioinformatika},
     pages = {20--35},
     publisher = {mathdoc},
     volume = {2},
     number = {1},
     year = {2007},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MBB_2007_2_1_a3/}
}
M. B. Chaley; N. N. Nazipova; V. A. Kutyrkin. Joint use of different homogeneity testing criteria for latent periodicity revelation in biological sequences. Matematičeskaâ biologiâ i bioinformatika, Vol. 2 (2007) no. 1, pp. 20-35. http://geodesic.mathdoc.fr/item/MBB_2007_2_1_a3/

[1] Benson G., “Tandem repeats finder: a program to analyze DNA sequences”, Nucl. Acids Res., 27 (1999), 573–580 | DOI

[2] Benson G., “A new distance measure for comparing sequence profiles based on path length along an entropy surface”, Bioinformatics, 18 (2002), S44–S53

[3] Kolpakov R., Bana G., Kucherov G., “mreps: efficient and flexible detection of tandem repeats in DNA”, Nucl. Acids Res., 31 (2003), 3672–3678 | DOI

[4] Boeva V., Regnier M., Papatsenko D., Makeev V., “Short fuzzy tandem repeats in genomic sequences, identification, and possible role in regulation of gene expression”, Bioinformatics, 22 (2006), 676–684 | DOI

[5] Krishnan A., Tang F., “Exhaustive whole-genome tandem repeats search”, Bioinformatics, 20 (2004), 2702–2710 | DOI

[6] Collins J. R., Stephens R. M., Gold B., Long B., Dean M., Burt S. K., “An exhaustive DNA microsatellite map of the human genome using high performance computing”, Genomics, 82 (2003), 10–19 | DOI

[7] Denoeud F., Vergnaud G., “Identification of polymorphic tandem repeats by direct comparison of genome sequence from different bacterial strains: a web-based resource”, BMC Bioinformatics, 5 (2004), 4 | DOI

[8] Le Fleche P., Hauck Y., Onteniente L., Prieur A., Denoeud F., Ramisse V., Sylvestre P., Benson G., Ramisse F., Vergnaud G., “A tandem repeats database for bacterial genomes: application to the genotyping of Yersinia pestis and Bacillus anthracis”, BMC Microbiol., 1 (2001), 2 | DOI

[9] Naslund K., Saetre P., von Salome J., Bergstrom T. F., Jareborg N., Jazin E., “Genome-wide prediction of human VNTRs”, Genomics, 85 (2005), 24–35 | DOI

[10] Boby T., Patch A. M., Aves S. J., “TRbase: a database relating tandem repeats to disease genes for the human genome”, Bioinformatics, 21 (2005), 811–816 | DOI

[11] Missirlis P. I., Mead C. L., Butland S. L., Ouellette B. F., Devon R. S., Leavitt B. R., Holt R. A., “Satellog: a database for the identification and prioritization of satellite repeats in disease association studies”, BMC Bioinformatics, 6 (2005), 145 | DOI

[12] Siwach P., Pophaly S. D., Ganesh S., “Genomic and evolutionary insights into genes encoding proteins with single amino acid repeats”, Mol. Biol. Evol., 23 (2006), 1357–1369 | DOI

[13] Katti M. V., Sami-Subbu R., Ranjekar P. K., Gupta V. S., “Amino acid repeat patterns in protein sequences: their diversity and structural-functional implications”, Protein Sci., 9 (2000), 1203–1209 | DOI

[14] Tompa P., “Intrinsically unstructured proteins evolve by repeat expansion”, Bioessays, 25 (2003), 847–855 | DOI

[15] Kalita M. K., Ramasamy G., Duraisamy S., Chauhan V. S., Gupta D., “ProtRepeatsDB: a database of amino acid repeats in genomes”, BMC Bioinformatics, 7 (2006), 336 | DOI

[16] Turutina V. P., Laskin A. A., Kudryashov N. A., Skryabin K. G., Korotkov E. V., “Identification of amino acid latent periodicity within 94 protein families”, J. Comput. Biol., 13 (2006), 946–964 | DOI | MR

[17] Silverman B. D., Linsker R., “A measure of DNA periodicity”, J. Theor. Biol., 118 (1986), 295–300 | DOI

[18] Sharma D., Issac B., Raghava G. P., Ramaswamy R., “Spectral Repeat Finder (SRF): identification of repetitive sequences using Fourier transformation”, Bioinformatics, 20 (2004), 1405–1412 | DOI

[19] Marple S. L., Digital Spectral Analysis with Applications, Prentice-Hall, Baltimore, 1987 | MR

[20] Altaiski M., Mornev O., Polozov R., “Wavelet analysis of DNA sequences”, Genet. Anal., 12 (1996), 165–168

[21] Dodin G., Vandergheynst P., Levoir P., Cordier C., Marcourt L., “Fourier and wavelet transform analysis, a tool for visualizing regular patterns in DNA sequences”, J. Theor. Biol., 206 (2000), 323–326 | DOI

[22] Landau G., Schmidt J., Sokol D., “An algorithm for approximate tandem repeats”, J. Comput. Biol., 8 (2001), 1–18 | DOI

[23] Castelo A. T., Martins W., Gao G. R., “TROLL – tandem repeat occurrence locator”, Bioinformatics, 18 (2002), 634–636 | DOI

[24] Hauth A. M., Joseph D. A., “Beyond tandem repeats: complex pattern structures and distant regions of similarity”, Bioinformatics, 18 (2002), 31–37

[25] Shulman M. J., Steinberg C. M., Westmoreland N., “The coding function of nucleotide sequences can be discerned by statistical analysis”, J. Theor. Biol., 88 (1981), 409–420 | DOI | MR

[26] Korotkov E. V., Korotkova M. A., Kudryashov N. A., “Information decomposition method to analyze symbolical sequences”, Phys. Lett. A, 312 (2003), 198–210 | DOI | MR | Zbl

[27] Korotkova M. A., Korotkov E. V., Rudenko V. M., “Latent periodicity in protein sequences”, J. Mol. Model., 5 (1999), 103–115 | DOI

[28] Gatherer D., McEwan N., “Analysis of sequence periodicity in E. coli proteins”, J. Mol. Evol., 57 (2003), 149–158 | DOI

[29] Shelenkov A., Skryabin K., Korotkov E., “Search and classification of potential minisatellite sequences from bacterial genomes”, DNA Res., 13 (2006), 89–102 | DOI

[30] Li W., “The study of correlation structures of DNA sequences: a critical review”, Computers Chem., 21 (1997), 257–271 | DOI | Zbl

[31] Cramer H., Mathematical methods of statistics, Stockholm, 1946 | MR

[32] Kullback S., Information theory and statistics, Dover Publications, 1968 | MR

[33] Chaley M. B., Korotkov E. V., Skryabin K. G., “Method revealing latent periodicity of the nucleotide sequences modified for a case of small samples”, DNA Res., 6 (1999), 153–163 | DOI

[34] Gribskov M., Lüthy R., Eisenberg D., “Profile analysis”, Meth. Enzymol., 183 (1990), 146–159 | DOI