Big Data in bioinformatics

N. N. Nazipova; E. A. Isaev; V. V. Kornilov; D. V. Pervukhin; A. A. Morozova; A. A. Gorbunov; M. N. Ustinin

Matematičeskaâ biologiâ i bioinformatika, Volume 13 (2018), pp. t1-t16.

See the article record from the source Math-Net.Ru

Abstract

Sequencing of the human genome began in 1994. Producing a draft of the human genome took 10 years of collaborative work by many research groups from different countries. Modern technologies allow a whole genome to be sequenced in a few days. We discuss here the advances in modern bioinformatics related to the emergence of high-performance sequencing platforms, which not only expanded the capabilities of biology and related sciences but also gave rise to the phenomenon of Big Data in biology. We substantiate the need for new technologies and methods for organizing the storage, management, analysis, and visualization of big data. Modern bioinformatics faces not only the problem of processing enormous volumes of heterogeneous data, but also a variety of methods for interpreting and presenting results and the simultaneous existence of various software tools and data formats. Ways of addressing these challenges are discussed, in particular by drawing on experience from other areas of modern life, such as web intelligence and business intelligence. The former is the area of scientific research and development that explores the impact of, and makes use of, artificial intelligence and information technology (IT) for new products, services, and frameworks empowered by the World Wide Web; the latter is the domain of IT that addresses decision-making. New database management systems, other than relational ones, will help to solve the problem of storing huge volumes of data while keeping search-query times acceptable. New programming technologies, such as generic programming and visual programming, are designed to address the diversity of genomic data formats and to make it possible to quickly create one's own data-processing scripts.

How to cite

@article{MBB_2018_13_a0,
     author = {N. N. Nazipova and E. A. Isaev and V. V. Kornilov and D. V. Pervukhin and A. A. Morozova and A. A. Gorbunov and M. N. Ustinin},
     title = {Big {Data} in bioinformatics},
     journal = {Matemati\v{c}eska\^a biologi\^a i bioinformatika},
     pages = {t1--t16},
     publisher = {mathdoc},
     volume = {13},
     year = {2018},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/MBB_2018_13_a0/}
}

TY  - JOUR
AU  - N. N. Nazipova
AU  - E. A. Isaev
AU  - V. V. Kornilov
AU  - D. V. Pervukhin
AU  - A. A. Morozova
AU  - A. A. Gorbunov
AU  - M. N. Ustinin
TI  - Big Data in bioinformatics
JO  - Matematičeskaâ biologiâ i bioinformatika
PY  - 2018
SP  - t1
EP  - t16
VL  - 13
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/MBB_2018_13_a0/
LA  - en
ID  - MBB_2018_13_a0
ER  -

%0 Journal Article
%A N. N. Nazipova
%A E. A. Isaev
%A V. V. Kornilov
%A D. V. Pervukhin
%A A. A. Morozova
%A A. A. Gorbunov
%A M. N. Ustinin
%T Big Data in bioinformatics
%J Matematičeskaâ biologiâ i bioinformatika
%D 2018
%P t1-t16
%V 13
%I mathdoc
%U http://geodesic.mathdoc.fr/item/MBB_2018_13_a0/
%G en
%F MBB_2018_13_a0

N. N. Nazipova; E. A. Isaev; V. V. Kornilov; D. V. Pervukhin; A. A. Morozova; A. A. Gorbunov; M. N. Ustinin. Big Data in bioinformatics. Matematičeskaâ biologiâ i bioinformatika, Volume 13 (2018), pp. t1-t16. http://geodesic.mathdoc.fr/item/MBB_2018_13_a0/
