Allocation optimization for reducing resource utilization in Angara high-speed interconnect
Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika, Vol. 8 (2019) no. 1, pp. 5-19. This article was harvested from the Math-Net.Ru source.

This paper considers a high-speed interconnect with a multidimensional topology. It addresses the fragmentation that arises when computing nodes of a supercomputer are allocated sequentially under the constraint that the network traffic of different users' tasks must not overlap. The paper continues earlier work on optimizing resource fragmentation. Here, node selection that accounts for fragmentation is combined with a task scheduling method based on the First-Fit policy, which picks the first suitable task within a given task window. A set of computer systems with three-dimensional and four-dimensional topologies was considered; the smallest system has 32 computing nodes and the largest has 144. A synthetic task queue is defined for each system. The parameters of the synthetic queues are close to real ones and are based on data from the Desmos cluster, which is equipped with the Angara interconnect. The average utilization of the computer system's resources and the average waiting time of tasks in the queue are used as the quality criteria of the method. Various task window sizes have been evaluated. The study showed that the proposed method increased resource utilization by 7% on average compared to the base method and reduced the average time spent in the queue by 36.6%.
Keywords: Angara interconnect, deterministic routing, direction ordered routing, allocation, multidimensional torus, fragmentation.
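The windowed First-Fit policy and the two quality criteria named in the abstract can be illustrated with a small simulation. This is only a minimal sketch under simplifying assumptions, not the authors' method: all tasks are assumed to be queued at time 0, nodes are treated as an interchangeable pool (the topology-aware, fragmentation-aware node selection of the paper is not modelled), and the names `Task` and `simulate_first_fit` are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Task:
    nodes: int      # number of computing nodes requested
    runtime: float  # execution time once the task has started


def simulate_first_fit(tasks, total_nodes, window):
    """Simulate window-limited First-Fit scheduling.

    Assumptions (not from the paper): every task is submitted at time 0,
    and any free nodes can serve any task (no topology constraints).
    Returns (average utilization over the makespan, average waiting time).
    """
    if not tasks:
        raise ValueError("the task queue is empty")
    if window < 1:
        raise ValueError("window must be at least 1")
    if any(t.nodes > total_nodes for t in tasks):
        raise ValueError("a task requests more nodes than the system has")

    queue = list(tasks)   # waiting tasks in submission order
    running = []          # min-heap of (finish_time, nodes)
    free = total_nodes
    now = 0.0
    busy_node_time = 0.0  # sum of nodes * runtime over all tasks
    waits = []

    while queue or running:
        # Start tasks for as long as some task in the window fits.
        started = True
        while started:
            started = False
            for i, task in enumerate(queue[:window]):
                if task.nodes <= free:
                    queue.pop(i)
                    free -= task.nodes
                    heapq.heappush(running, (now + task.runtime, task.nodes))
                    waits.append(now)  # submitted at 0, so wait == start time
                    busy_node_time += task.nodes * task.runtime
                    started = True
                    break
        # Advance time to the next task completion and release its nodes.
        finish, nodes = heapq.heappop(running)
        now = finish
        free += nodes

    utilization = busy_node_time / (total_nodes * now)
    return utilization, sum(waits) / len(waits)


if __name__ == "__main__":
    # Hypothetical queue for a 32-node system.
    demo = [Task(24, 10), Task(16, 10), Task(8, 10), Task(16, 10)]
    for w in (1, 4):
        print(w, simulate_first_fit(demo, total_nodes=32, window=w))
```

With a window of 1 the policy degenerates to strict first-come-first-served order. On the hypothetical four-task queue above, widening the window to 4 lets the 8-node task run alongside the 24-node task, which raises utilization from about 0.67 to 1.0 and halves the average waiting time from 10 to 5 time units. These figures describe only the toy queue and are unrelated to the results reported in the paper.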
@article{VYURV_2019_8_1_a0,
     author = {A. V. Mukosey and A. S. Semenov and A. S. Simonov},
     title = {Allocation optimization for reducing resource utilization in {Angara} high-speed interconnect},
     journal = {Vestnik \^U\v{z}no-Uralʹskogo gosudarstvennogo universiteta. Seri\^a Vy\v{c}islitelʹna\^a matematika i informatika},
     pages = {5--19},
     year = {2019},
     volume = {8},
     number = {1},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/VYURV_2019_8_1_a0/}
}
A. V. Mukosey; A. S. Semenov; A. S. Simonov. Allocation optimization for reducing resource utilization in Angara high-speed interconnect. Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika, Vol. 8 (2019) no. 1, pp. 5-19. http://geodesic.mathdoc.fr/item/VYURV_2019_8_1_a0/
