Acceleration of recombinant viral sequences search by 3SEQ algorithm via adding support of multi-threaded calculations and considering sample collection dates
Matematičeskaâ biologiâ i bioinformatika, Volume 19 (2024), pp. 338-353.

See the article record from the source Math-Net.Ru

The article presents an efficient multithreaded implementation of 3SEQ, a modern algorithm for detecting recombinant genetic sequences, and tests it on viral genomes. The work was carried out as part of a project to create a Russian web platform (bioprojects.iis.nsk.su) for solving a wide range of data analysis problems in bioinformatics, virology and epidemiology. A recombinant viral genome emerges when two different genome variants of the same virus species exchange parts, which is possible when a host is infected with both variants simultaneously. The emergence of a recombinant is a rare but important event in the study of virus evolution. 3SEQ is one of the most promising existing algorithms for detecting recombinants, but the original authors' implementation runs only in single-threaded mode. We implemented the algorithm with support for multithreaded computation and with sample collection dates taken into account, which significantly increased the computing speed. The developed software was used to search for recombinants in samples of influenza A H1N1 (only PB2 segments from 2174 genomes were analyzed), dengue virus (726 genomes), Ebola virus (865 genomes) and two samples of SARS-CoV-2 (776 and 2132 genomes). No recombinants were found for influenza A H1N1 (PB2 segment) or for the first SARS-CoV-2 dataset (variants from Russia), which agrees with the analysis of the same data by the RDP algorithm. For the second SARS-CoV-2 dataset (variants from the Siberian Federal District), the only recombinant present in the dataset was correctly found. In dengue viruses, 725 recombinants were found, with recombination region lengths ranging from 50 to 1000 nucleotides. In Ebola viruses, the recombination regions were shorter: in 572 recombinants the length was between 50 and 100 nucleotides, and in 249 genomes it was less than 50 nucleotides.
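The abstract does not say how collection dates are used, so the following is only an illustrative sketch under a stated assumption: a candidate recombinant (child) cannot have been collected earlier than either of its parents, so such triplets are skipped before any statistic is computed. The function names (`max_descent`, `candidate_triplets`, `scan`) and the `samples` layout are hypothetical and are not taken from the article or from the 3SEQ source code. The score is the maximum descent of the +1/-1 walk over informative sites, which is the quantity 3SEQ builds its test on; the p-value computation is omitted.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import permutations


def max_descent(child, p, q):
    """Largest drop of the +1/-1 walk over informative sites.

    A site is informative when the parents differ and the child
    matches exactly one of them (+1 for p, -1 for q).
    """
    height = peak = best = 0
    for c, a, b in zip(child, p, q):
        if a == b or c not in (a, b):
            continue  # non-informative site
        height += 1 if c == a else -1
        peak = max(peak, height)
        best = max(best, peak - height)
    return best


def candidate_triplets(samples):
    """Yield (child, p, q) names, skipping parents collected after the child.

    ASSUMPTION: this date rule is a guess at how dates prune the search.
    `samples` maps name -> (aligned sequence, collection date).
    """
    for child, (_, child_date) in samples.items():
        older = [n for n, (_, d) in samples.items()
                 if n != child and d <= child_date]
        yield from ((child, p, q) for p, q in permutations(older, 2))


def scan(samples, workers=4):
    """Score every date-compatible triplet using a pool of worker threads."""
    def score(triplet):
        c, p, q = triplet
        return triplet, max_descent(samples[c][0], samples[p][0], samples[q][0])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(score, candidate_triplets(samples)))
```

With three dated samples the unfiltered search would visit six ordered triplets; under the assumed date rule only the triplets whose child is the most recent sample remain. In CPython the thread pool shows the structure of the parallel loop rather than a real speed-up, since CPU-bound threads share the interpreter lock; a native implementation would distribute the same triplet loop across hardware threads.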
@article{MBB_2024_19_a3,
     author = {A. P. Devyaterikov and A. Palyanov},
     title = {Acceleration of recombinant viral sequences search by {3SEQ} algorithm via adding support of multi-threaded calculations and considering sample collection dates},
     journal = {Matemati\v{c}eska\^a biologi\^a i bioinformatika},
     pages = {338--353},
     publisher = {mathdoc},
     volume = {19},
     year = {2024},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MBB_2024_19_a3/}
}
A. P. Devyaterikov; A. Palyanov. Acceleration of recombinant viral sequences search by 3SEQ algorithm via adding support of multi-threaded calculations and considering sample collection dates. Matematičeskaâ biologiâ i bioinformatika, Volume 19 (2024), pp. 338-353. http://geodesic.mathdoc.fr/item/MBB_2024_19_a3/
