See the article record from the source Numdam
We present a specialized compressor designed for efficient storage of FASTQ files produced by high-throughput DNA sequencers. Because the method has been optimized for compression quality, it is especially suitable for long-term storage and for genome research centers that process huge amounts of data (counted in petabytes). The proposed compressor uses high-order statistical models for range encoding, similar to Markov models, except that the whole input is considered when building a symbol context. DNA reads are compressed in an LZ-style manner using a 5th- to 7th-order model, while nucleotide quality scores are encoded with a 3rd-order model.
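As a rough illustration of what an order-k statistical model contributes to range encoding, the sketch below estimates the ideal code length of a nucleotide string under an adaptive fixed-order context model. It is not the authors' compressor: the function name `estimated_bits`, the four-letter `ALPHABET`, the Laplace-smoothed counts, and the use of only the preceding `order` symbols as context (rather than the whole input, as the abstract describes) are all assumptions made for this example. A range coder driven by the same probabilities would produce output close to this length.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumed alphabet; real FASTQ reads may also contain N


def estimated_bits(seq, order=3):
    """Ideal code length in bits of `seq` under an adaptive order-`order` model.

    Each symbol is predicted from the `order` symbols before it. Counts start
    at 1 (Laplace smoothing) and are updated after every symbol, so a decoder
    could rebuild the same model while decoding.
    """
    index = {sym: i for i, sym in enumerate(ALPHABET)}
    counts = defaultdict(lambda: [1] * len(ALPHABET))
    total_bits = 0.0
    for pos, sym in enumerate(seq):
        context = seq[max(0, pos - order):pos]
        freq = counts[context]
        probability = freq[index[sym]] / sum(freq)
        total_bits += -math.log2(probability)  # cost of coding this symbol
        freq[index[sym]] += 1                  # adapt the model
    return total_bits


if __name__ == "__main__":
    read = "ACGT" * 50
    for k in (0, 3):
        bits = estimated_bits(read, order=k)
        print(f"order {k}: {bits / len(read):.3f} bits per base")
```

On a highly repetitive read, the order-3 model needs far fewer bits per base than the order-0 model, which is the effect that higher-order contexts are meant to capture.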
Chlopkowski, Marek 1 ; Antczak, Maciej 1 ; Slusarczyk, Michal 1 ; Wdowinski, Aleksander 1 ; Zajaczkowski, Michal 1 ; Kasprzak, Marta 1, 2
@article{RO_2016__50_2_351_0,
     author = {Chlopkowski, Marek and Antczak, Maciej and Slusarczyk, Michal and Wdowinski, Aleksander and Zajaczkowski, Michal and Kasprzak, Marta},
     title = {High-order statistical compressor for long-term storage of {DNA} sequencing data},
     journal = {RAIRO - Operations Research - Recherche Op\'erationnelle},
     pages = {351--361},
     publisher = {EDP-Sciences},
     volume = {50},
     number = {2},
     year = {2016},
     doi = {10.1051/ro/2015039},
     mrnumber = {3479875},
     language = {en},
     url = {http://geodesic.mathdoc.fr/articles/10.1051/ro/2015039/}
}
TY - JOUR
AU - Chlopkowski, Marek
AU - Antczak, Maciej
AU - Slusarczyk, Michal
AU - Wdowinski, Aleksander
AU - Zajaczkowski, Michal
AU - Kasprzak, Marta
TI - High-order statistical compressor for long-term storage of DNA sequencing data
JO - RAIRO - Operations Research - Recherche Opérationnelle
PY - 2016
SP - 351
EP - 361
VL - 50
IS - 2
PB - EDP-Sciences
UR - http://geodesic.mathdoc.fr/articles/10.1051/ro/2015039/
DO - 10.1051/ro/2015039
LA - en
ID - RO_2016__50_2_351_0
ER -
%0 Journal Article
%A Chlopkowski, Marek
%A Antczak, Maciej
%A Slusarczyk, Michal
%A Wdowinski, Aleksander
%A Zajaczkowski, Michal
%A Kasprzak, Marta
%T High-order statistical compressor for long-term storage of DNA sequencing data
%J RAIRO - Operations Research - Recherche Opérationnelle
%D 2016
%P 351-361
%V 50
%N 2
%I EDP-Sciences
%U http://geodesic.mathdoc.fr/articles/10.1051/ro/2015039/
%R 10.1051/ro/2015039
%G en
%F RO_2016__50_2_351_0
Chlopkowski, Marek; Antczak, Maciej; Slusarczyk, Michal; Wdowinski, Aleksander; Zajaczkowski, Michal; Kasprzak, Marta. High-order statistical compressor for long-term storage of DNA sequencing data. RAIRO - Operations Research - Recherche Opérationnelle, Special issue: Research on Optimization and Graph Theory dedicated to COSI 2013 / Special issue: Recent Advances in Operations Research in Computational Biology, Bioinformatics and Medicine, Volume 50 (2016) no. 2, pp. 351-361. doi: 10.1051/ro/2015039