Automatic program parallelization with block data distribution
Sibirskij žurnal vyčislitelʹnoj matematiki, Volume 18 (2015) no. 1, pp. 41-53. This article was harvested from the Math-Net.Ru source.


This paper discusses several automated methods for accelerating programs. Acceleration is achieved through parallelization and optimization of memory access. Accesses to RAM are optimized by switching to block computations and block placement of arrays. For distributed memory, automated array distribution and array distribution with overlapping are employed. Automation is implemented for the C language with pragmas in the Open Parallelizing System. The paper presents numerical results for problems of linear algebra and mathematical physics. Some features of this demonstration converter are accessible remotely over the Internet.
@article{SJVM_2015_18_1_a3,
     author = {L. R. Gervich and E. N. Kravchenko and B. Y. Steinberg and M. V. Yurushkin},
     title = {Automatic program parallelization with block data distribution},
     journal = {Sibirskij \v{z}urnal vy\v{c}islitelʹnoj matematiki},
     pages = {41--53},
     year = {2015},
     volume = {18},
     number = {1},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/SJVM_2015_18_1_a3/}
}
