Polarization- and CGR-based binary representations as identifiers of the nucleotide sequences in bioinformatics
Izvestiya VUZ. Applied Nonlinear Dynamics, Tome 32 (2024) no. 4, pp. 439-459.

See the article record from the source Math-Net.Ru

Purpose. The aim of this work is a comparative analysis of two approaches to synthesizing two-dimensional binary identifiers of nucleotide sequences obtained by DNA sequencing of biological objects. Methods. The first approach models the polarization-dependent diffraction of a coherent readout beam on a two-dimensional phase-modulating structure (a phase screen) associated with the symbolic sequence produced by DNA sequencing. The second approach maps the symbolic sequence onto a plane using the chaos game representation (CGR). To obtain a CGR mapping with a finite number of elements, the mapping is partitioned into a given number of cells that ensures acceptable sensitivity of the synthesized binary identifier to structural changes in the mapped sequence. Results. The comparative analysis was carried out on fragments of symbolic sequences corresponding to different strains (Wuhan, Delta, Omicron) of the SARS-CoV-2 virus. Correlation coefficients between the binary identifiers of the different strains were computed and compared. Conclusion. Binary identifiers synthesized using the polarization encoding technique are significantly more sensitive to structural changes in the analyzed sequences, and are smaller in size, than CGR binary identifiers.
Keywords: nucleotide sequences, binary representation, polarization encoding, chaos game representation
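The CGR branch of the comparison can be illustrated with a minimal Python sketch. It is not the authors' implementation: the corner assignment (the common A, C, G, T layout), the cell count, the binarization rule (a cell is 1 if it holds more than `threshold` points), and the function names `cgr_points`, `binary_identifier`, and `correlation` are all assumptions made for illustration. The abstract states only that the CGR mapping is partitioned into cells and that correlation coefficients between binary identifiers are compared.

```python
from math import sqrt

# Assumed corner layout (a common CGR convention); the abstract does not state it.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR trajectory: each point is halfway to the corner of the next base."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def binary_identifier(seq, cells=8, threshold=0):
    """Partition the unit square into cells x cells and binarize the point counts.

    The rule 'count > threshold' is an assumed binarization, chosen for simplicity.
    """
    counts = [[0] * cells for _ in range(cells)]
    for x, y in cgr_points(seq):
        col = min(int(x * cells), cells - 1)
        row = min(int(y * cells), cells - 1)
        counts[row][col] += 1
    return [[1 if c > threshold else 0 for c in row] for row in counts]


def correlation(a, b):
    """Pearson correlation coefficient between two binary identifiers of equal size."""
    u = [v for row in a for v in row]
    w = [v for row in b for v in row]
    n = len(u)
    mean_u, mean_w = sum(u) / n, sum(w) / n
    cov = sum((p - mean_u) * (q - mean_w) for p, q in zip(u, w))
    var_u = sum((p - mean_u) ** 2 for p in u)
    var_w = sum((q - mean_w) ** 2 for q in w)
    if var_u == 0 or var_w == 0:
        return float("nan")  # undefined for a constant identifier
    return cov / sqrt(var_u * var_w)
```

Calling `correlation(binary_identifier(s1), binary_identifier(s2))` on two sequence fragments gives one coefficient of the kind the abstract compares across strains. Increasing `cells` makes the identifier larger and more responsive to single-symbol changes, which is the size versus sensitivity trade-off the abstract refers to.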
@article{IVP_2024_32_4_a2,
     author = {D. A. Zimnyakov and M. V. Alonova and An. V. Skripal and M. G. Inkin and S. S. Zaitsev and V. A. Fedorova},
     title = {Polarization- and {CGR-based} binary representations as identifiers of the nucleotide sequences in bioinformatics},
     journal = {Izvestiya VUZ. Applied Nonlinear Dynamics},
     pages = {439--459},
     publisher = {mathdoc},
     volume = {32},
     number = {4},
     year = {2024},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/IVP_2024_32_4_a2/}
}
TY  - JOUR
AU  - D. A. Zimnyakov
AU  - M. V. Alonova
AU  - An. V. Skripal
AU  - M. G. Inkin
AU  - S. S. Zaitsev
AU  - V. A. Fedorova
TI  - Polarization- and CGR-based binary representations as identifiers of the nucleotide sequences in bioinformatics
JO  - Izvestiya VUZ. Applied Nonlinear Dynamics
PY  - 2024
SP  - 439
EP  - 459
VL  - 32
IS  - 4
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/IVP_2024_32_4_a2/
LA  - ru
ID  - IVP_2024_32_4_a2
ER  - 
%0 Journal Article
%A D. A. Zimnyakov
%A M. V. Alonova
%A An. V. Skripal
%A M. G. Inkin
%A S. S. Zaitsev
%A V. A. Fedorova
%T Polarization- and CGR-based binary representations as identifiers of the nucleotide sequences in bioinformatics
%J Izvestiya VUZ. Applied Nonlinear Dynamics
%D 2024
%P 439-459
%V 32
%N 4
%I mathdoc
%U http://geodesic.mathdoc.fr/item/IVP_2024_32_4_a2/
%G ru
%F IVP_2024_32_4_a2
D. A. Zimnyakov; M. V. Alonova; An. V. Skripal; M. G. Inkin; S. S. Zaitsev; V. A. Fedorova. Polarization- and CGR-based binary representations as identifiers of the nucleotide sequences in bioinformatics. Izvestiya VUZ. Applied Nonlinear Dynamics, Tome 32 (2024) no. 4, pp. 439-459. http://geodesic.mathdoc.fr/item/IVP_2024_32_4_a2/

[1] Goodwin S., McPherson J. D., McCombie W. R., “Coming of age: ten years of next-generation sequencing technologies”, Nature Reviews Genetics, 17:6 (2016), 333–351 | DOI

[2] Neidle S., Sanderson M., Principles of Nucleic Acid Structure, Academic Press, 2021, 454 pp.

[3] Randic M., Vracko M., Lers N., Plavsic D., “Novel 2-D graphical representation of DNA sequences and their numerical characterization”, Chemical Physics Letters, 368:1–2 (2003), 1–6 | DOI

[4] Randic M., Vracko M., Nandy A., Basak S. C., “On 3-D graphical representation of DNA primary sequence and their numerical characterization”, Journal of Chemical Information and Computer Sciences, 40:5 (2000), 1235–1244 | DOI

[5] Xie G., Mo Z., “Three 3D graphical representations of DNA primary sequences based on the classifications of DNA bases and their applications”, Journal of Theoretical Biology, 269:1 (2011), 123–130 | DOI | MR | Zbl

[6] Jafarzadeh N., Iranmanesh A., “A novel graphical and numerical representation for analyzing DNA sequences based on codons”, Match-Communications in Mathematical and Computer Chemistry, 68:2 (2012), 611–620 | MR

[7] Jafarzadeh N., Iranmanesh A., “C-curve: A novel 3D graphical representation of DNA sequence based on codons”, Mathematical Biosciences, 241:2 (2013), 217–224 | DOI | MR | Zbl

[8] Hamori E., Ruskin J., “H curves, a novel method of representation of nucleotide series especially suited for long DNA sequences”, Journal of Biological Chemistry, 258:2 (1983), 1318–1327 | DOI

[9] Zhang C. T., Zhang R., Ou H. Y., “The Z-curve databases: A graphic representation of genome sequence”, Bioinformatics, 19:5 (2003), 593–599 | DOI

[10] Yu Z. G., Wang B., “A time series model of CDS sequences in complete genome”, Chaos, Solitons & Fractals, 12:3 (2001), 519–526 | DOI | MR | Zbl

[11] Jeffrey H. J., “Chaos game representation of gene structure”, Nucleic Acids Research, 18:8 (1990), 2163–2170 | DOI

[12] Anitas E. M., “Small-angle scattering and multifractal analysis of DNA sequences”, International Journal of Molecular Sciences, 21:13 (2020), 4651 | DOI

[13] Burma P. K., Raj A., Deb J. K., Brahmachari S. K., “Genome analysis: a new approach for visualization of sequence organization in genomes”, Journal of Biosciences, 17:4 (1992), 395–411 | DOI

[14] Huynen M. A., Konings D. A. M., Hogeweg P., “Equal G and C contents in histone genes indicate selection pressures on mRNA secondary structure”, Journal of Molecular Evolution, 34:4 (1992), 280–291 | DOI

[15] Hill K. A., Schisler N. J., Singh S. M., “Chaos game representation of coding regions of human globin genes and alcohol dehydrogenase genes of phylogenetically divergent species”, Journal of Molecular Evolution, 35:3 (1992), 261–269 | DOI

[16] Almeida J. S., Carrico J. A., Maretzek A., Noble P. A., Fletcher M., “Analysis of genomic sequences by chaos game representation”, Bioinformatics, 17:5 (2001), 429–437 | DOI

[17] Zimnyakov D. A., Alonova M. V., Skripal An. V., Zaitsev S. S., Feodorova V. A., “Polarization analysis of gene sequence structures: Mapping of extreme local polarization states”, Journal of Biomedical Photonics Engineering, 8:4 (2022), 040302 | DOI | MR

[18] Zimnyakov D. A., Alonova M. V., Skripal An. V., Dobdin S. Y., Feodorova V. A., “Quantification of the diversity in gene structures using the principles of polarization mapping”, Current Issues in Molecular Biology, 45:2 (2023), 1720–1740 | DOI

[19] Ulyanov S. S., Ulianova O. V., Zaytsev S. S., Saltykov Y. V., Feodorova V. A., “Statistics on gene-based laser speckles with a small number of scatterers: implications for the detection of polymorphism in the Chlamydia trachomatis omp1 gene”, Laser Physics Letters, 15 (2018), 045601 | DOI

[20] Rak A., Isakova-Sivak I., Rudenko L., “Overview of Nucleocapsid-Targeting Vaccines against COVID-19”, Vaccines, 11:12 (2023), 1810 | DOI

[21] Telenti A., Hodcroft E. B., Robertson D. L., “The Evolution and Biology of SARS-CoV-2 Variants”, Cold Spring Harbor Perspectives in Medicine, 12 (2022), a041390 | DOI

[22] Bergmann C. C., Silverman R. H., “COVID-19: coronavirus replication, pathogenesis, and therapeutic strategies”, Cleveland Clinic Journal of Medicine, 87 (2020), 321–327 | DOI

[23] Shang J., Wan Y., Luo C., Ye G., Geng Q., Auerbach A., Li F., “Cell entry mechanisms of SARS-CoV-2”, Proceedings of the National Academy of Sciences, 117 (2020), 11727–11734 | DOI

[24] Grobbelaar L. M., Venter C., Vlok M., Ngoepe M., Laubscher G. J., Lourens P. J., Steenkamp J., Kell D. B., Pretorius E., “SARS-CoV-2 spike protein S1 induces fibrin(ogen) resistant to fibrinolysis: implications for microclot formation in COVID-19”, Bioscience Reports, 41:8 (2021), BSR20210611 | DOI

[25] Singh D., Yi S. V., “On the origin and evolution of SARS-CoV-2”, Experimental & Molecular Medicine, 53 (2021), 537–547 | DOI

[26] Zhou P., Yang X.L., Wang X.G., Hu B., Zhang L., Zhang W., Si H.R., Zhu Y., Li B., Huang C.L., Chen H.D., Chen J., Luo Y., Guo H., Jiang R.D., Liu M.Q., Chen Y., Shen X.R., Wang X., Zheng X.S., Zhao K., Chen Q.J., Deng F., Liu L.L., Yan B., Zhan F.X., Wang Y.Y., Xiao G.F., Shi Z.L., “A pneumonia outbreak associated with a new coronavirus of probable bat origin”, Nature, 579:7798 (2020), 270–273 | DOI | MR

[27] Chakraborty C., Bhattacharya M., Chopra H., Bhattacharya P., Islam M. A., Dhama K., “Recently emerged omicron subvariant BF.7 and its R346T mutation in the RBD region reveal increased transmissibility and higher resistance to neutralization antibodies: need to understand more under the current scenario of rising cases in China and fears of driving a new wave of the COVID-19 pandemic”, International Journal of Surgery, 109:4 (2023), 1037–1040 | DOI

[28] GISAID: Official hCoV-19 Reference Sequence, Acc. ID: EPI_ISL_402124 https://gisaid.org/wiv04/

[29] GISAID: Official hCoV-19 Reference Sequence, Acc. ID: EPI_ISL_2552101 https://gisaid.org/wiv04/

[30] GISAID: Official hCoV-19 Reference Sequence, Acc. ID: EPI_ISL_9991311 https://gisaid.org/wiv04/

[31] Goodman J. W., Introduction to Fourier Optics, 4th ed., Macmillan Learning, USA, New York, 2017, 491 pp. | MR

[32] Bracewell R., The Fourier Transform and Its Applications, McGraw Hill, New York, 1986, 474 pp. | MR

[33] Chipman R., Lam W. S. T., Young G., Polarized Light and Optical Systems (Optical Sciences and Applications of Light), CRC Press, Boca Raton, 2018, 1036 pp.

[34] Anitas E. M., “Fractal analysis of DNA sequences using frequency chaos game representation and small-angle scattering”, International Journal of Molecular Sciences, 23:3 (2022), 1847 | DOI