Integration of heterogeneous computing infrastructures for genome sequencing data analysis
Matematičeskaâ biologiâ i bioinformatika, Volume 11 (2016) no. 2, pp. 205-213.

See the article record from the Math-Net.Ru source.

Recent technological advances in Next Generation Sequencing (NGS) have led to a significant increase in the amount of data that must be processed, analyzed and made remotely available to scientists. This, in turn, has raised the RAM and processing-power requirements for the computing platforms used for data processing. Entirely new approaches to organizing computations are needed to process these data efficiently. The authors investigated the possibility of adapting the methods and approaches used in high energy physics to integrate heterogeneous computing resources into a unified computing platform. A fully functional data and task management system based on the computing resources of the National Research Centre "Kurchatov Institute" was developed. We also developed a workflow for processing genome sequencing data with the PALEOMIX package and integrated it into the system. Testing the system on an ancient mammoth DNA sequencing task demonstrated a significant reduction in the total computation time of the task.
@article{MBB_2016_11_2_a0,
     author = {V. A. Aulov and A. A. Klimentov and R. Yu. Mashinistov and A. V. Nedoluzhko and A. M. Novikov and A. A. Poyda and I. S. Tertychnyi and A. B. Teslyuk and F. S. Sharko},
     title = {Integration of heterogeneous computing infrastructures for genome sequencing data analysis},
     journal = {Matemati\v{c}eska\^a biologi\^a i bioinformatika},
     pages = {205--213},
     publisher = {mathdoc},
     volume = {11},
     number = {2},
     year = {2016},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MBB_2016_11_2_a0/}
}
TY  - JOUR
AU  - V. A. Aulov
AU  - A. A. Klimentov
AU  - R. Yu. Mashinistov
AU  - A. V. Nedoluzhko
AU  - A. M. Novikov
AU  - A. A. Poyda
AU  - I. S. Tertychnyi
AU  - A. B. Teslyuk
AU  - F. S. Sharko
TI  - Integration of heterogeneous computing infrastructures for genome sequencing data analysis
JO  - Matematičeskaâ biologiâ i bioinformatika
PY  - 2016
SP  - 205
EP  - 213
VL  - 11
IS  - 2
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/MBB_2016_11_2_a0/
LA  - ru
ID  - MBB_2016_11_2_a0
ER  - 
%0 Journal Article
%A V. A. Aulov
%A A. A. Klimentov
%A R. Yu. Mashinistov
%A A. V. Nedoluzhko
%A A. M. Novikov
%A A. A. Poyda
%A I. S. Tertychnyi
%A A. B. Teslyuk
%A F. S. Sharko
%T Integration of heterogeneous computing infrastructures for genome sequencing data analysis
%J Matematičeskaâ biologiâ i bioinformatika
%D 2016
%P 205-213
%V 11
%N 2
%I mathdoc
%U http://geodesic.mathdoc.fr/item/MBB_2016_11_2_a0/
%G ru
%F MBB_2016_11_2_a0
V. A. Aulov; A. A. Klimentov; R. Yu. Mashinistov; A. V. Nedoluzhko; A. M. Novikov; A. A. Poyda; I. S. Tertychnyi; A. B. Teslyuk; F. S. Sharko. Integration of heterogeneous computing infrastructures for genome sequencing data analysis. Matematičeskaâ biologiâ i bioinformatika, Volume 11 (2016) no. 2, pp. 205-213. http://geodesic.mathdoc.fr/item/MBB_2016_11_2_a0/

[1] Skryabin K. G., Prokhortchouk E. B., Mazur A. M., Boulygina E. S., Tsygankova S. V., Nedoluzhko A. V., Rastorguev S. M., Matveev V. B., Chekanov N. N., Goranskaya D. A., Teslyuk A. B., Gruzdeva N. M., Velikhov V. E., Zaridze D. G., Kovalchuk M. V., “Combining two technologies for full genome sequencing of human”, Acta Naturae, 1:3 (2009), 102–107

[2] Kawalia A., Motameny S., Wonczak S., Thiele H., Nieroda L., Jabbari K., Borowski S., Sinha V., Gunia W., Lang U., Achter V., Nürnberg P., “Leveraging the Power of High Performance Computing for Next Generation Sequencing Data Analysis: Tricks and Twists from a High Throughput Exome Workflow”, PLoS One, 10:5 (2015), e0126321

[3] Bao R., Huang L., Andrade J., Tan W., Kibbe W. A., Jiang H., Feng G., “Review of current methods, applications, and data management for the bioinformatics analysis of whole exome sequencing”, Cancer Inform., 13, Suppl. 2 (2014), 67–82

[4] Miller W., Drautz D. I., Ratan A., Pusey B., Qi J., Lesk A. M., Tomsho L. P., Packard M. D., Zhao F., Sher A., Tikhonov A., Raney B., Patterson N., Lindblad-Toh K., Lander E. S., Knight J. R., Irzyk G. P., Fredrikson K. M., Harkins T. T., Sheridan S., Pringle T., Schuster S. C., “Sequencing the nuclear genome of the extinct woolly mammoth”, Nature, 456 (2008), 387–390

[5] Rasmussen M., Li Y., Lindgreen S., Pedersen J. S., Albrechtsen A., Moltke I., Metspalu M., Metspalu E., Kivisild T., Gupta R., et al., “Ancient human genome sequence of an extinct Palaeo-Eskimo”, Nature, 463 (2010), 757–762

[6] Keller A., Graefen A., Ball M., Matzas M., Boisguerin V., Maixner F., Leidinger P., Backes C., Khairat R., Forster M., et al., “New insights into the Tyrolean Iceman's origin and phenotype as inferred by whole-genome sequencing”, Nature Communications, 3 (2012)

[7] Allentoft M. E., Collins M., Harker D., Haile J., Oskam C. L., Hale M. L., Campos P. F., Samaniego J. A., Gilbert M. T., Willerslev E., et al., “The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils”, Proc Biol Sci., 279 (2012), 4724–4733

[8] Nedoluzhko A. V., Boulygina E. S., Sokolov A. S., Tsygankova S. V., Gruzdeva N. M., Rezepkin A. D., Prokhortchouk E. B., “Analysis of the Mitochondrial Genome of a Novosvobodnaya Culture Representative using Next-Generation Sequencing and Its Relation to the Funnel Beaker Culture”, Acta Naturae, 6 (2014), 31–35

[9] Sokolov A. S., Nedoluzhko A. V., Boulygina E. S., Tsygankova S. V., Gruzdeva N. M., Shishlov A. V., Kolpakova A., Rezepkin A. D., Skryabin K. G., Prokhortchouk E. B., “Six complete mitochondrial genomes from Early Bronze Age humans in the North Caucasus”, Journal of Archaeological Science, 73 (2016), 138–144

[10] Martin M. D., Cappellini E., Samaniego J. A., Zepeda M. L., Campos P. F., Seguin-Orlando A., Wales N., Orlando L., Ho S. Y., Dietrich F. S., et al., “Reconstructing genome evolution in historic samples of the Irish potato famine pathogen”, Nature Communications, 4 (2013)

[11] Yoshida K., Schuenemann V. J., Cano L. M., Pais M., Mishra B., Sharma R., Lanz C., Martin F. N., Kamoun S., Krause J., et al., “The rise and fall of the Phytophthora infestans lineage that triggered the Irish potato famine”, eLife, 2 (2013)

[12] Lorenzen E. D., Nogués-Bravo D., Orlando L., Weinstock J., Binladen J., Marske K. A., Ugan A., Borregaard M. K., Gilbert M. T., Nielsen R., et al., “Species-specific responses of Late Quaternary megafauna to climate and humans”, Nature, 479 (2011), 359–364

[13] Lynch V. J., Bedoya-Reina O. C., Ratan A., Sulak M., Drautz-Moses D. I., Perry G. H., Miller W., Schuster S. C., “Elephantid Genomes Reveal the Molecular Bases of Woolly Mammoth Adaptations to the Arctic”, Cell Rep., 12:2 (2015), 217–228

[14] Massie M., Nothaft F., Hartl C., Kozanitis C., Schumacher A., Joseph A. D., Patterson D. A., “ADAM: genomics formats and processing patterns for cloud scale computing”, Report No. UCB/EECS-2013-207, EECS Department, University of California, Berkeley, 2013

[15] Schubert M., Ermini L., Sarkissian C. D., Jónsson H., Ginolhac A., Schaefer R., Martin M. D., Fernández R., Kircher M., McCue M., Willerslev E., Orlando L., “Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX”, Nat Protoc., 9 (2014), 1056–1082

[16] Aad G. et al. (The ATLAS Collaboration), “The ATLAS Experiment at the CERN Large Hadron Collider”, Journal of Instrumentation, 3 (2008)

[17] The Large Hadron Collider (accessed 28.09.2016), http://home.cern/topics/large-hadron-collider

[18] Maeno T., on behalf of the PanDA team and the ATLAS Collaboration, “PanDA: distributed production and distributed analysis system for ATLAS”, Journal of Physics: Conference Series, 119:6 (2008)

[19] Maeno T., De K., Klimentov A., Nilsson P., Oleynik D., Panitkin S., Petrosyan A., Schovancova J., Vaniachine A., Wenaus T., “Evolution of the ATLAS PanDA workload management system for exascale computational science”, J. Phys.: Conf. Ser., 513 (2014)

[20] Worldwide LHC Computing Grid (accessed 28.09.2016), http://wlcg.web.cern.ch/