New procedure of raw Illumina MiSeq data filtering for the amplicon metagenomic libraries
Matematičeskaâ biologiâ i bioinformatika, Volume 13 (2018) no. 1, pp. 159-168.

See the article record from the source Math-Net.Ru

In this paper we present an algorithm for filtering raw paired-end amplicon NGS data, which are used to capture the genetic and taxonomic diversity of communities of unicellular microorganisms. The suggested approach overcomes the massive data loss that occurs during filtration of raw sequences and increases the statistical representativeness of the analyzed amplicons. Furthermore, standard trimming methods based on filtering raw reads by quality, for instance the sliding-window approach, were shown to eliminate sequences belonging to different taxonomic groups unequally. This bias may skew taxon counts and deplete the taxonomic diversity of the analyzed communities. The suggested method does not introduce errors of this kind. An implementation of the algorithm in R, together with a number of example files for analysis, is available at https://github.com/barnsys/metagenomic_analysis.
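
To make concrete the kind of standard sliding-window quality trimming the abstract refers to, here is a minimal Python sketch. It is not the authors' algorithm and not their R implementation. The function names, the window size of 4, the mean-quality threshold of 20, and the Phred+33 encoding are all assumptions chosen for illustration, and the cut position is simplified to the start of the first failing window.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def sliding_window_trim(sequence, quality_string, window=4, min_mean_quality=20):
    """Scan from the 5' end and cut the read at the start of the first
    window whose mean quality falls below the threshold (simplified rule)."""
    scores = phred_scores(quality_string)
    keep = len(scores)  # keep the whole read unless a window fails
    for start in range(0, len(scores) - window + 1):
        mean_q = sum(scores[start:start + window]) / window
        if mean_q < min_mean_quality:
            keep = start
            break
    return sequence[:keep], quality_string[:keep]


def filter_reads(reads, min_length, **trim_options):
    """Trim every (name, sequence, quality) read and drop those that
    end up shorter than min_length (hypothetical helper)."""
    kept = []
    for name, seq, qual in reads:
        trimmed_seq, trimmed_qual = sliding_window_trim(seq, qual, **trim_options)
        if len(trimmed_seq) >= min_length:
            kept.append((name, trimmed_seq, trimmed_qual))
    return kept


if __name__ == "__main__":
    reads = [
        ("read_low_tail", "ACGTACGTAC", "IIIIII####"),   # quality drops at the 3' end
        ("read_high", "ACGTACGTAC", "IIIIIIIIII"),       # uniformly high quality
    ]
    for name, seq, qual in filter_reads(reads, min_length=8):
        print(name, seq, qual)
```

In this toy input, the read with a low-quality 3' tail is trimmed below the minimum length and discarded entirely. If reads from some taxa tend to have lower-quality tails than others, this discarding is the mechanism by which the unequal elimination described above could arise.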
@article{MBB_2018_13_1_a2,
     author = {Yu. S. Bukin and L. S. Buzoleva and Yu. S. Golozubova and Yu. P. Galachyants},
     title = {New procedure of raw {Illumina} {MiSeq} data filtering for the amplicon metagenomic libraries},
     journal = {Matemati\v{c}eska\^a biologi\^a i bioinformatika},
     pages = {159--168},
     publisher = {mathdoc},
     volume = {13},
     number = {1},
     year = {2018},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/MBB_2018_13_1_a2/}
}
Yu. S. Bukin; L. S. Buzoleva; Yu. S. Golozubova; Yu. P. Galachyants. New procedure of raw Illumina MiSeq data filtering for the amplicon metagenomic libraries. Matematičeskaâ biologiâ i bioinformatika, Volume 13 (2018) no. 1, pp. 159-168. http://geodesic.mathdoc.fr/item/MBB_2018_13_1_a2/
