Assignment of tasks to machines under data replication with a tie to Steiner systems
International Journal of Applied Mathematics and Computer Science, Tome 34 (2024) no. 2, pp. 263-275.

Voir la notice de l'article provenant de la source Library of Science

In the paper a problem of assignment of tasks to machines is formulated and solved, where a criterion of data replication is used and a large size of data imposes additional constraints. This problem is met in practice when dealing with large genomic files or other types of vast data. The necessity of comparing all pairs of files within a big set of DNA sequencing results, which we collected, maintained, and analyzed within a national genomic project, brought us to the proposed results. This problem resembles that of generating a particular Steiner system, and a mechanism observed there is employed in one of our algorithms. Based on the problem complexity, we propose two heuristic algorithms, which work very well even for instances with tight constraints and a heterogeneous environment defined. In addition, we propose a simplified method, nevertheless capable of finding very good solutions and surpassing the algorithms in some special cases. The methods are validated in tests on a wide set of instances, where values of parameters reflect our real-world application and where their usefulness is proven.
Keywords: task assignment, genomic data processing, integer linear programming, Steiner system, heuristic algorithm
Mots-clés : przydział zadań, dane genomowe, programowanie liniowe, algorytm heurystyczny
@article{IJAMCS_2024_34_2_a6,
     author = {Wojciechowski, Pawe{\l} and Kasprzak, Marta},
     title = {Assignment of tasks to machines under data replication with a tie to {Steiner} systems},
     journal = {International Journal of Applied Mathematics and Computer Science},
     pages = {263--275},
     publisher = {mathdoc},
     volume = {34},
     number = {2},
     year = {2024},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/IJAMCS_2024_34_2_a6/}
}
TY  - JOUR
AU  - Wojciechowski, Paweł
AU  - Kasprzak, Marta
TI  - Assignment of tasks to machines under data replication with a tie to Steiner systems
JO  - International Journal of Applied Mathematics and Computer Science
PY  - 2024
SP  - 263
EP  - 275
VL  - 34
IS  - 2
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/IJAMCS_2024_34_2_a6/
LA  - en
ID  - IJAMCS_2024_34_2_a6
ER  - 
%0 Journal Article
%A Wojciechowski, Paweł
%A Kasprzak, Marta
%T Assignment of tasks to machines under data replication with a tie to Steiner systems
%J International Journal of Applied Mathematics and Computer Science
%D 2024
%P 263-275
%V 34
%N 2
%I mathdoc
%U http://geodesic.mathdoc.fr/item/IJAMCS_2024_34_2_a6/
%G en
%F IJAMCS_2024_34_2_a6
Wojciechowski, Paweł; Kasprzak, Marta. Assignment of tasks to machines under data replication with a tie to Steiner systems. International Journal of Applied Mathematics and Computer Science, Tome 34 (2024) no. 2, pp. 263-275. http://geodesic.mathdoc.fr/item/IJAMCS_2024_34_2_a6/

[1] Babai, L. and Wilmes, J. (2013). Quasipolynomial-time canonical form for Steiner designs, Proceedings of the 45th Annual ACM Symposium on Theory of Computing (STOC’13), Palo Alto, USA, pp. 261-270.

[2] Berlin, K., Koren, S., Chin, C., Drake, J., Landolin, J. and Phillippy, A. (2015). Assembling large genomes with single-molecule sequencing and locality-sensitive hashing, Nature Biotechnology 33(6): 623-630.

[3] Blazewicz, J., Kasprzak, M., Kierzynka, M., Frohmberg, W., Swiercz, A., Wojciechowski, P. and Zurkowski, P. (2018). Graph algorithms for DNA sequencing - Origins, current models and the future, European Journal of Operational Research 264(3): 799-812.

[4] Bogdanowicz, D. and Giaro, K. (2013). On a matching distance between rooted phylogenetic trees, International Journal of Applied Mathematics and Computer Science 23(3): 669-684, DOI: 10.2478/amcs-2013-0050.

[5] Bose, R. (1939). On the construction of balanced incomplete block designs, Annals of Eugenics 9: 353-399.

[6] Briquet, C., Dalem, X., Jodogne, S. and de Marneffe, P.-A. (2007). Scheduling data-intensive bags of tasks in P2P grids with bittorrent-enabled data distribution, Proceedings of the 2nd Workshop on Use of P2P, GRID and Agents for the Development of Content Networks (UPGRADE’07), Monterey, USA, pp. 39-48.

[7] Gopalakrishnan, S. and Caccamo, M. (2006). Task partitioning with replication upon heterogeneous multiprocessor systems, Proceedings of the 12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS’06), San Jose, USA, pp. 199-207.

[8] Grannell, M. and Griggs, T. (1994). An introduction to Steiner systems, Mathematical Spectrum 26(3): 74-80.

[9] Jain, C., Rodriguez-R, L., Phillippy, A., Konstantinidis, K. and Aluru, S. (2018). High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries, Nature Communications 9: 5114.

[10] Majdzik, P. (2022). A feasible schedule for parallel assembly tasks in flexible manufacturing systems, International Journal of Applied Mathematics and Computer Science 32(1): 51-63, DOI: 10.34768/amcs-2022-0005.

[11] Nukarapu, D., Tang, B., Wang, L. and Lu, S. (2011). Data replication in data intensive scientific applications with performance guarantee, IEEE Transactions on Parallel and Distributed Systems 22(8): 1299-1306.

[12] Nurk, S., Walenz, B., Rhie, A., Vollger, M., Logsdon, G., Grothe, R., Miga, K., Eichler, E., Phillippy, A. and Koren, S. (2020). HiCanu: Accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads, Genome Research 30(9): 1291-1305.

[13] Ondov, B., Treangen, T., Melsted, P., Mallonee, A., Bergman, N., Koren, S. and Phillippy, A. (2016). Mash: Fast genome and metagenome distance estimation using MinHash, Genome Biology 17: 132.

[14] Povoa, M. and Xavier, E. (2018). Approximation algorithms and heuristics for task scheduling in data-intensive distributed systems, International Transactions in Operational Research 25(5): 1417-1441.

[15] Reid, C. and Rosa, A. (2010). Steiner systems S(2, 4, v)-A survey, The Electronic Journal of Combinatorics #DS18: 1-34.

[16] Wojciechowski, P., Krause, K., Lukasiak, P. and Blazewicz, J. (2021). The correctness of large scale analysis of genomic data, Foundations of Computing and Decision Sciences 46(4): 423-436.