Shared memory based MPI Reduce and Bcast algorithms
Numerical methods and programming, Vol. 24 (2023), no. 4, pp. 339-351
See the article record from the Math-Net.Ru source.
Algorithms are proposed for implementing the collective operations MPI_Bcast, MPI_Reduce, and MPI_Allreduce using the shared memory of multiprocessor servers. The algorithms create a shared memory segment containing a system of queues through which message blocks are transmitted. The software implementation is based on the Open MPI library and is packaged as a standalone coll/sharm component. Unlike existing algorithms, access to the queue system is synchronized with spinlocks, and the design aims to reduce the number of barrier synchronizations and atomic operations. In experiments on an x86-64 server, the largest speedup over the coll/tuned component of the Open MPI library was 6.5 times for MPI_Bcast (85% less time) and 3.3 times for MPI_Reduce (70% less time). Recommendations are given on which algorithm to use for different message sizes.
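The idea of passing message blocks through queues in a shared segment can be illustrated with a small C sketch. It is a minimal illustration under our own assumptions, not the coll/sharm code: the segment is an anonymous `mmap` mapping shared with `fork`ed processes rather than with the ranks of an MPI job, and the names `slot_t`, `bcast_send`, `bcast_recv`, `sharm_bcast_demo`, the block size `BLOCK` and the queue depth `NSLOTS` are hypothetical. The root pipelines the message through a ring of slots, and the root and readers busy-wait on per-slot atomic counters instead of executing barriers.

```c
#define _DEFAULT_SOURCE /* expose mmap/MAP_ANONYMOUS, fork, sched_yield on glibc */
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical parameters, not taken from coll/sharm. */
#define BLOCK 64  /* bytes per message block */
#define NSLOTS 4  /* queue depth: blocks the root may run ahead of readers */

/* One queue slot living in the shared segment. */
typedef struct {
    atomic_long seq;      /* (block index + 1) of the block stored here */
    atomic_int remaining; /* readers that have not consumed that block yet */
    size_t len;           /* valid bytes in data */
    char data[BLOCK];
} slot_t;

/* Root: push n bytes of msg through the ring, one block at a time. */
static void bcast_send(slot_t *q, const char *msg, size_t n, int nreaders) {
    long nblocks = (long)((n + BLOCK - 1) / BLOCK);
    for (long i = 0; i < nblocks; i++) {
        slot_t *s = &q[i % NSLOTS];
        /* Spin until every reader has released the block held in this slot. */
        while (atomic_load_explicit(&s->remaining, memory_order_acquire) != 0)
            sched_yield();
        size_t off = (size_t)i * BLOCK;
        s->len = (n - off < BLOCK) ? n - off : BLOCK;
        memcpy(s->data, msg + off, s->len);
        atomic_store_explicit(&s->remaining, nreaders, memory_order_relaxed);
        /* Release store publishes len, data and remaining together. */
        atomic_store_explicit(&s->seq, i + 1, memory_order_release);
    }
}

/* Reader: pull n bytes from the ring into buf. */
static void bcast_recv(slot_t *q, char *buf, size_t n) {
    long nblocks = (long)((n + BLOCK - 1) / BLOCK);
    for (long i = 0; i < nblocks; i++) {
        slot_t *s = &q[i % NSLOTS];
        /* Spin until the root has published block i in this slot. */
        while (atomic_load_explicit(&s->seq, memory_order_acquire) != i + 1)
            sched_yield();
        memcpy(buf + (size_t)i * BLOCK, s->data, s->len);
        /* Release: the copy above completes before the root may reuse the slot. */
        atomic_fetch_sub_explicit(&s->remaining, 1, memory_order_release);
    }
}

/* Hypothetical demo driver: the parent acts as root, nprocs - 1 forked children
   as readers. Returns 0 if every reader got an exact copy, the number of failed
   readers otherwise, or -1 if setup failed. */
static int sharm_bcast_demo(const char *msg, size_t n, int nprocs) {
    assert(nprocs >= 1);
    size_t seg = NSLOTS * sizeof(slot_t);
    slot_t *q = mmap(NULL, seg, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (q == MAP_FAILED)
        return -1;
    for (int k = 0; k < NSLOTS; k++) { /* all slots start empty */
        atomic_init(&q[k].seq, 0);
        atomic_init(&q[k].remaining, 0);
    }
    int nreaders = 0, setup_failed = 0;
    for (int r = 0; r < nprocs - 1; r++) {
        pid_t pid = fork();
        if (pid < 0) { setup_failed = 1; break; }
        if (pid == 0) { /* child: receive, compare, report via exit status */
            char *buf = malloc(n ? n : 1);
            if (buf == NULL)
                _exit(2);
            bcast_recv(q, buf, n);
            _exit(memcmp(buf, msg, n) == 0 ? 0 : 1);
        }
        nreaders++;
    }
    bcast_send(q, msg, n, nreaders); /* only readers actually forked are counted */
    int failures = 0;
    for (int r = 0; r < nreaders; r++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    }
    munmap(q, seg);
    return setup_failed ? -1 : failures;
}
```

With `NSLOTS` slots the root can run at most `NSLOTS` blocks ahead of the slowest reader, so a message longer than `NSLOTS * BLOCK` bytes is pipelined rather than staged in full. The spin loops call `sched_yield()` only so that the sketch still makes progress when all processes share one core; a production implementation would tune the waiting strategy for the target hardware.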
Keywords:
Bcast; Reduce; Allreduce; collective operations; MPI; computer systems.
@article{VMP_2023_24_4_a8,
author = {A. A. Romanyuta and M. G. Kurnosov},
title = {Shared memory based {MPI} {Reduce} and {Bcast} algorithms},
journal = {Numerical methods and programming},
pages = {339--351},
publisher = {mathdoc},
volume = {24},
number = {4},
year = {2023},
language = {ru},
url = {http://geodesic.mathdoc.fr/item/VMP_2023_24_4_a8/}
}
A. A. Romanyuta; M. G. Kurnosov. Shared memory based MPI Reduce and Bcast algorithms. Numerical methods and programming, Vol. 24 (2023), no. 4, pp. 339-351. http://geodesic.mathdoc.fr/item/VMP_2023_24_4_a8/