Automatic program parallelization with block data distribution
Sibirskij žurnal vyčislitelʹnoj matematiki, Volume 18 (2015) no. 1, pp. 41-53.

See the record of the article at its source, Math-Net.Ru

This paper discusses several automated methods for accelerating programs. The acceleration is achieved through parallelization and through optimization of memory access. Accesses to RAM are optimized by switching to block code and to block placement of arrays. For distributed memory, automated distribution of arrays and array distribution with overlapping are employed. The automation is implemented for the C language with pragmas in the Open Parallelizing System. The paper presents numerical results for problems of linear algebra and mathematical physics. Some features of this demonstration converter are accessible remotely over the Internet.
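To make the first pair of techniques concrete, below is a minimal generic sketch of block code (loop tiling) combined with block placement of arrays, so that each B×B tile is stored contiguously in memory. The names used here (B, blk_index, matmul_blocked) and the plain-C driver are illustrative assumptions only; they are not the pragma-based interface of the Open Parallelizing System described in the paper.

/*
 * Generic sketch: loop tiling ("block code") over a block-contiguous
 * ("block placement") array layout.  Illustrative only; not the OPS
 * pragma interface from the paper.
 */
#include <stdio.h>
#include <stdlib.h>

#define N 512   /* matrix order; assumed divisible by B */
#define B 64    /* tile size, chosen so a few tiles fit in cache */

/* Index of element (i, j) in a block-contiguous layout: the B x B
 * tiles are laid out row-major, and elements inside each tile are
 * row-major as well, so every tile occupies B*B consecutive doubles. */
static size_t blk_index(size_t i, size_t j)
{
    size_t bi = i / B, bj = j / B;   /* tile coordinates     */
    size_t ti = i % B, tj = j % B;   /* in-tile coordinates  */
    return ((bi * (N / B) + bj) * B + ti) * B + tj;
}

/* C += A * B with tiled loops over the blocked layout. */
static void matmul_blocked(const double *a, const double *b, double *c)
{
    for (size_t bi = 0; bi < N; bi += B)
        for (size_t bj = 0; bj < N; bj += B)
            for (size_t bk = 0; bk < N; bk += B)
                /* multiply one pair of contiguous B x B tiles */
                for (size_t i = bi; i < bi + B; i++)
                    for (size_t k = bk; k < bk + B; k++) {
                        double aik = a[blk_index(i, k)];
                        for (size_t j = bj; j < bj + B; j++)
                            c[blk_index(i, j)] += aik * b[blk_index(k, j)];
                    }
}

int main(void)
{
    double *a = calloc((size_t)N * N, sizeof *a);
    double *b = calloc((size_t)N * N, sizeof *b);
    double *c = calloc((size_t)N * N, sizeof *c);
    if (!a || !b || !c) return 1;

    /* identity * identity, as a cheap correctness check */
    for (size_t i = 0; i < N; i++) {
        a[blk_index(i, i)] = 1.0;
        b[blk_index(i, i)] = 1.0;
    }
    matmul_blocked(a, b, c);
    printf("c[0][0] = %f (expected 1.0)\n", c[blk_index(0, 0)]);

    free(a); free(b); free(c);
    return 0;
}

With this layout the three innermost loops sweep data that is contiguous within each tile, which is the cache-locality effect that block placement of arrays is meant to deliver; the paper's own results quantify this effect for linear-algebra and mathematical-physics codes.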
@article{SJVM_2015_18_1_a3,
     author = {L. R. Gervich and E. N. Kravchenko and B. Y. Steinberg and M. V. Yurushkin},
     title = {Automatic program parallelization with block data distribution},
     journal = {Sibirskij \v{z}urnal vy\v{c}islitel{\cprime}noj matematiki},
     pages = {41--53},
     publisher = {mathdoc},
     volume = {18},
     number = {1},
     year = {2015},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/SJVM_2015_18_1_a3/}
}
TY  - JOUR
AU  - L. R. Gervich
AU  - E. N. Kravchenko
AU  - B. Y. Steinberg
AU  - M. V. Yurushkin
TI  - Automatic program parallelization with block data distribution
JO  - Sibirskij žurnal vyčislitelʹnoj matematiki
PY  - 2015
SP  - 41
EP  - 53
VL  - 18
IS  - 1
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/SJVM_2015_18_1_a3/
LA  - ru
ID  - SJVM_2015_18_1_a3
ER  - 
%0 Journal Article
%A L. R. Gervich
%A E. N. Kravchenko
%A B. Y. Steinberg
%A M. V. Yurushkin
%T Automatic program parallelization with block data distribution
%J Sibirskij žurnal vyčislitelʹnoj matematiki
%D 2015
%P 41-53
%V 18
%N 1
%I mathdoc
%U http://geodesic.mathdoc.fr/item/SJVM_2015_18_1_a3/
%G ru
%F SJVM_2015_18_1_a3
L. R. Gervich; E. N. Kravchenko; B. Y. Steinberg; M. V. Yurushkin. Automatic program parallelization with block data distribution. Sibirskij žurnal vyčislitelʹnoj matematiki, Volume 18 (2015) no. 1, pp. 41-53. http://geodesic.mathdoc.fr/item/SJVM_2015_18_1_a3/
