High-order statistical compressor for long-term storage of DNA sequencing data
RAIRO - Operations Research - Recherche Opérationnelle, Special issue: Research on Optimization and Graph Theory dedicated to COSI 2013 / Special issue: Recent Advances in Operations Research in Computational Biology, Bioinformatics and Medicine, Volume 50 (2016) no. 2, pp. 351-361

See the article record from the Numdam source

We present a specialized compressor designed for efficient storage of FASTQ files produced by high-throughput DNA sequencers. Because the method has been optimized for compression quality, it is especially suitable for long-term storage and for genome research centers that process huge amounts of data (counted in petabytes). The proposed compressor drives a range encoder with high-order statistical models that are similar to Markov models, except that the whole input is considered when building a symbol context. DNA reads are compressed in an LZ-style manner using a 5th- to 7th-order model, while nucleotide quality scores are encoded with a 3rd-order model.
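
To make the role of model order concrete, the following is a minimal sketch of a plain order-k context model over the four nucleotides. It is not the authors' compressor: it conditions only on the k preceding symbols (the standard Markov-style variant, not the whole-input context described above), it skips the LZ-style step, and instead of running a real range encoder it sums the ideal code length, -log2(p), that a range encoder would approach. The names `ALPHABET` and `estimated_bits` are hypothetical, and the input is assumed to contain only A, C, G and T.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: reads contain no 'N' or other symbols


def estimated_bits(seq, order):
    """Ideal adaptive code length, in bits, of seq under an order-`order` model.

    Each context (the up-to-`order` preceding symbols) keeps its own symbol
    counts. Counts start at 1 so that no symbol ever has zero probability.
    """
    index = {symbol: i for i, symbol in enumerate(ALPHABET)}
    counts = defaultdict(lambda: [1] * len(ALPHABET))
    total_bits = 0.0
    for pos, symbol in enumerate(seq):
        context = seq[max(0, pos - order):pos]
        freq = counts[context]
        probability = freq[index[symbol]] / sum(freq)
        total_bits += -math.log2(probability)  # cost a range coder approaches
        freq[index[symbol]] += 1               # adapt the model after coding
    return total_bits


if __name__ == "__main__":
    read = "ACGT" * 50
    for k in (0, 3):
        print(f"order {k}: {estimated_bits(read, k):.1f} bits")
```

On a highly repetitive input such as the one in the usage lines, the order-3 model predicts each next nucleotide almost with certainty and the estimated size drops well below that of the order-0 model. Real reads are far less regular, so the gain from higher orders is smaller and has to be weighed against the memory needed for the context table.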

DOI: 10.1051/ro/2015039
Classification: 68P20, 68P30, 68W32, 92D20
Keywords: High-throughput DNA sequencing, data compression, FASTQ files

Chlopkowski, Marek 1 ; Antczak, Maciej 1 ; Slusarczyk, Michal 1 ; Wdowinski, Aleksander 1 ; Zajaczkowski, Michal 1 ; Kasprzak, Marta 1, 2

1 Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland.
2 Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
@article{RO_2016__50_2_351_0,
     author = {Chlopkowski, Marek and Antczak, Maciej and Slusarczyk, Michal and Wdowinski, Aleksander and Zajaczkowski, Michal and Kasprzak, Marta},
     title = {High-order statistical compressor for long-term storage of {DNA} sequencing data},
     journal = {RAIRO - Operations Research - Recherche Op\'erationnelle},
     pages = {351--361},
     publisher = {EDP-Sciences},
     volume = {50},
     number = {2},
     year = {2016},
     doi = {10.1051/ro/2015039},
     mrnumber = {3479875},
     language = {en},
     url = {http://geodesic.mathdoc.fr/articles/10.1051/ro/2015039/}
}
