Supercomputer application integral characteristics analysis for the whole queued job collection of large-scale HPC systems
Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika, Volume 5 (2016) no. 4, pp. 32-45. This article was harvested from the source Math-Net.Ru.

The efficient use and high output of any supercomputer depend on a great number of factors. One of them is control over how granted resources are utilized, a problem that becomes especially noticeable when many user projects run concurrently. Users need detailed information on the peculiarities of the jobs they have executed. At the same time, project managers need detailed information on resource utilization by project members, which requires access to detailed job analysis. Unfortunately, such information is rarely available. Our proposed approach aims to close this gap: it analyzes integral characteristics of supercomputer applications for the whole queued job collection of large-scale HPC systems, based on managing and studying system monitoring data, building integral job characteristics, and revealing job categories and the peculiarities of single job runs.
Keywords: supercomputer, efficiency, system monitoring, job categories, integral job characteristics, queued job collection, resource utilization control, job queue.
@article{VYURV_2016_5_4_a2,
     author = {D. A. Nikitenko and V. V. Voevodin and A. M. Teplov and S. A. Zhumatii and Vad. V. Voevodin and K. S. Stefanov and P. A. Shvets},
     title = {Supercomputer application integral characteristics analysis for the whole queued job collection of large-scale {HPC} systems},
     journal = {Vestnik \^U\v{z}no-Uralʹskogo gosudarstvennogo universiteta. Seri\^a Vy\v{c}islitelʹna\^a matematika i informatika},
     pages = {32--45},
     year = {2016},
     volume = {5},
     number = {4},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/VYURV_2016_5_4_a2/}
}
TY  - JOUR
AU  - D. A. Nikitenko
AU  - V. V. Voevodin
AU  - A. M. Teplov
AU  - S. A. Zhumatii
AU  - Vad. V. Voevodin
AU  - K. S. Stefanov
AU  - P. A. Shvets
TI  - Supercomputer application integral characteristics analysis for the whole queued job collection of large-scale HPC systems
JO  - Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika
PY  - 2016
SP  - 32
EP  - 45
VL  - 5
IS  - 4
UR  - http://geodesic.mathdoc.fr/item/VYURV_2016_5_4_a2/
LA  - en
ID  - VYURV_2016_5_4_a2
ER  - 
%0 Journal Article
%A D. A. Nikitenko
%A V. V. Voevodin
%A A. M. Teplov
%A S. A. Zhumatii
%A Vad. V. Voevodin
%A K. S. Stefanov
%A P. A. Shvets
%T Supercomputer application integral characteristics analysis for the whole queued job collection of large-scale HPC systems
%J Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika
%D 2016
%P 32-45
%V 5
%N 4
%U http://geodesic.mathdoc.fr/item/VYURV_2016_5_4_a2/
%G en
%F VYURV_2016_5_4_a2
D. A. Nikitenko; V. V. Voevodin; A. M. Teplov; S. A. Zhumatii; Vad. V. Voevodin; K. S. Stefanov; P. A. Shvets. Supercomputer application integral characteristics analysis for the whole queued job collection of large-scale HPC systems. Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika, Volume 5 (2016) no. 4, pp. 32-45. http://geodesic.mathdoc.fr/item/VYURV_2016_5_4_a2/

[1] Top50 Supercomputers of Russia and CIS, (accessed: 15.02.2016) http://top50.supercomputers.ru/

[2] Top500 Supercomputer Sites, (accessed: 15.02.2016) http://top500.org/

[3] A. Antonov, S. Zhumatiy, D. Nikitenko, K. Stefanov, A. Teplov, P. Shvets, “Analysis of Dynamic Characteristics of Job Stream on Supercomputer System”, Numerical Methods and Programming, 14:2 (2013), 104–108

[4] A. Safonov, P. Kostenetskiy, K. Borodulin, F. Melekhin, “A Monitoring System for Supercomputers of SUSU”, Russian Supercomputing Days International Conference (Moscow, Russian Federation, 28-29 September, 2015), CEUR Workshop Proceedings, 1482, 2015, 662–666

[5] K. Stefanov et al., “Dynamically Reconfigurable Distributed Modular Monitoring System for Supercomputers (DiMMon)”, Procedia Computer Science, 66 (2015), 625–634

[6] D. Nikitenko, “Complex Approach to Performance Analysis of Supercomputer Systems Based on System Monitoring Data”, Numerical Methods and Programming, 15 (2014), 85–97

[7] V. Voevodin, S. Zhumatiy, D. Nikitenko, “Octoshell: Large Supercomputer Complex Administration System”, Russian Supercomputing Days International Conference (Moscow, Russian Federation, 28-29 September, 2015), CEUR Workshop Proceedings, 1482, 2015, 69–83

[8] V. Voevodin, D. Nikitenko, S. Zhumatiy, “Resolving Frontier Problems of Mastering Large-Scale Supercomputer Complexes”, Proceedings of the ACM International Conference on Computing Frontiers, CF’16 (Como, Italy, 16-18 May, 2016), ACM New York, NY, USA, 2016, 349–352

[9] Vl. Voevodin, A. Antonov, P. Bryzgalov, D. Nikitenko, S. Zhumatiy, S. Sobolev, K. Stefanov, Vad. Voevodin, “Practice of "Lomonosov" Supercomputer”, Open Systems, 2012, no. 7, 36–39

[10] S. Zhumatiy, D. Nikitenko, “Approach to Flexible Supercomputers Management”, International Supercomputing Conference Scientific Services Internet: All Parallelism Edges (Novorossiysk, Russian Federation, 23-28 September, 2013), MSU, 2013, 296–300

[11] Vl. Voevodin, “Supercomputer Situational Screen”, Open Systems, 2014, no. 3, 36–39

[12] P. Shvets, A. Antonov, D. Nikitenko, S. Sobolev, K. Stefanov, Vad. Voevodin, Vl. Voevodin, S. Zhumatiy, “An Approach for Ensuring Reliable Functioning of a Supercomputer Based on a Formal Model”, Parallel Processing and Applied Mathematics, 11th International Conference, PPAM 2015 (Krakow, Poland, September 6-9, 2015), Theoretical Computer Science and General Issues, 9573, Springer International Publishing, 2015, 12–22

[13] V. Voevodin, A. Antonov, J. Dongarra, “AlgoWiki: an Open Encyclopedia of Parallel Algorithmic Features”, Supercomputing Frontiers and Innovations, 2:1 (2015), 4–18

[14] SLURM Workload Manager, (accessed: 15.02.2016) http://slurm.schedmd.com/

[15] Cleo Cluster Batch System, (accessed: 15.02.2016) http://sourceforge.net/projects/cleo-bs/

[16] Ganglia Monitoring System, (accessed: 15.02.2016) http://ganglia.sourceforge.net/

[17] Collectd – The System Statistics Collection Daemon, (accessed: 15.02.2016) https://collectd.org/

[18] Clustrx, (accessed: 15.02.2016) http://www.t-platforms.ru/products/software/clustrxproductfamily/clustrxwatch.html

[19] jQuery UI, (accessed: 15.02.2016) http://jqueryui.com/

[20] TagIt, (accessed: 15.02.2016) http://aehlke.github.io/tag-it/