Performance analysis of cross-assembly of metatranscriptomic datasets in viral community studies
Matematičeskaâ biologiâ i bioinformatika, Vol. 18 (2023) no. 2, pp. 418-433.

See the article record from the source Math-Net.Ru

We compared individual assemblies and cross-assemblies of several metatranscriptomic datasets from endemic Baikal mollusks to assess their performance in studies of viral communities. Compared with assembling each dataset individually, a hidden Markov model-based cross-assembly procedure increases the number of viral contigs (or scaffolds) per sample, the number of virotypes identified, and the average scaffold length per sample. The proportion of assembled viral reads among all reads in a sample is also higher with cross-assembly. De novo cross-assembly combined with an HMM-based virus identification algorithm yields a table that gives, for each scaffold, the number of reads from each sample. This table allows samples to be compared by the representation of all viral scaffolds, including those that are not taxonomically identified, i.e. those with no analogues in the NCBI RefSeq database. Cross-assembly therefore supports comparative analyses that take the latent diversity of viruses into account. We propose a pipeline for metatranscriptomic data analysis that uses de novo cross-assembly to study viral diversity.
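The scaffold-by-sample read-count table described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample and scaffold names, the counts, and the choice of Bray-Curtis dissimilarity as the comparison measure are hypothetical and are not taken from the article. The sketch only shows how such a table lets samples be compared over all viral scaffolds, whether or not they have a taxonomic assignment.

```python
from itertools import combinations


def relative_abundance(counts):
    """Convert raw read counts per scaffold into proportions for one sample."""
    total = sum(counts.values())
    if total == 0:
        return {scaffold: 0.0 for scaffold in counts}
    return {scaffold: n / total for scaffold, n in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles (dicts)."""
    scaffolds = set(a) | set(b)
    num = sum(abs(a.get(s, 0.0) - b.get(s, 0.0)) for s in scaffolds)
    den = sum(a.get(s, 0.0) + b.get(s, 0.0) for s in scaffolds)
    return num / den if den else 0.0


# Hypothetical table: reads mapped to each cross-assembled viral scaffold,
# per sample. Names and numbers are invented for illustration only.
read_counts = {
    "sample_A": {"scaffold_1": 120, "scaffold_2": 30, "scaffold_3": 0},
    "sample_B": {"scaffold_1": 60, "scaffold_2": 15, "scaffold_3": 75},
    "sample_C": {"scaffold_1": 0, "scaffold_2": 0, "scaffold_3": 50},
}

# Normalise each sample so that sequencing depth does not drive the comparison.
profiles = {name: relative_abundance(c) for name, c in read_counts.items()}

# Pairwise dissimilarities over all scaffolds, identified or not.
distances = {
    (x, y): bray_curtis(profiles[x], profiles[y])
    for x, y in combinations(sorted(profiles), 2)
}

for (x, y), d in distances.items():
    print(f"{x} vs {y}: {d:.2f}")
```

Because every sample is mapped against the same set of cross-assembled scaffolds, the rows of the table are directly comparable, which is what makes a pairwise measure like this applicable without relying on taxonomic labels.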
@article{MBB_2023_18_2_a4,
     author = {Yu. S. Bukin and A. N. Bondaryuk and T. V. Butina},
     title = {Performance analysis of cross-assembly of metatranscriptomic datasets in viral community studies},
     journal = {Matemati\v{c}eska\^a biologi\^a i bioinformatika},
     pages = {418--433},
     publisher = {mathdoc},
     volume = {18},
     number = {2},
     year = {2023},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MBB_2023_18_2_a4/}
}
Yu. S. Bukin; A. N. Bondaryuk; T. V. Butina. Performance analysis of cross-assembly of metatranscriptomic datasets in viral community studies. Matematičeskaâ biologiâ i bioinformatika, Vol. 18 (2023) no. 2, pp. 418-433. http://geodesic.mathdoc.fr/item/MBB_2023_18_2_a4/
