Conditions for privatizing the elements of arrays by computing threads
Journal of the Belarusian State University. Mathematics and Informatics, Tome 3 (2018), pp. 59-67.

Voir la notice de l'article provenant de la source Math-Net.Ru

The set of operations of the parallel algorithm for implementation on the GPU must be split into computation threads. The threads must be grouped into computation units that run atomically on stream processors, also called multiprocessors. For good GPU performance, it is important that as much data as possible can fit into fast register and shared memory, otherwise slow global and local memory are used. The degree of memory usage with fast access reflects the computational property of the algorithm, called locality. When implementing algorithms on multiprocessor computing devices, the use of locality plays a crucial role in achieving high performance. In this paper, necessary conditions and sufficient conditions have been formulated and proved, the use of which allows receiving threads with privatized data, i. e. it allows to receive such computation threads that the array element is used only by one thread and therefore it is advisable to place it in the register
Keywords: parallel computations; GPU; tiling; array privatization; registers.
@article{BGUMI_2018_3_a6,
     author = {N. A. Likhoded and M. A. Paliashchuk},
     title = {Conditions for privatizing the elements of arrays by computing threads},
     journal = {Journal of the Belarusian State University. Mathematics and Informatics},
     pages = {59--67},
     publisher = {mathdoc},
     volume = {3},
     year = {2018},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/BGUMI_2018_3_a6/}
}
TY  - JOUR
AU  - N. A. Likhoded
AU  - M. A. Paliashchuk
TI  - Conditions for privatizing the elements of arrays by computing threads
JO  - Journal of the Belarusian State University. Mathematics and Informatics
PY  - 2018
SP  - 59
EP  - 67
VL  - 3
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/BGUMI_2018_3_a6/
LA  - ru
ID  - BGUMI_2018_3_a6
ER  - 
%0 Journal Article
%A N. A. Likhoded
%A M. A. Paliashchuk
%T Conditions for privatizing the elements of arrays by computing threads
%J Journal of the Belarusian State University. Mathematics and Informatics
%D 2018
%P 59-67
%V 3
%I mathdoc
%U http://geodesic.mathdoc.fr/item/BGUMI_2018_3_a6/
%G ru
%F BGUMI_2018_3_a6
N. A. Likhoded; M. A. Paliashchuk. Conditions for privatizing the elements of arrays by computing threads. Journal of the Belarusian State University. Mathematics and Informatics, Tome 3 (2018), pp. 59-67. http://geodesic.mathdoc.fr/item/BGUMI_2018_3_a6/

[1] Vl.\.,V. Voevodin, Vad. V. Voevodin, “Spasitelnaya lokalnost superkompyuterov”, Otkrytye sistemy, 9 (2013), 12–15

[2] A. Buluc, J. R. Gilberta, C. Budak, “Solving path problems on the GPU”, Parallel Computing, 36(5–6) (2010), 241–253 | DOI | Zbl

[3] N. A. Likhoded, M. A. Poleschuk, “Otsenka lokalnosti parallelnykh algoritmov, realizuemykh na graficheskikh protsessorakh”, Vestnik Yuzhno-Uralskogo gosudarstvennogo universiteta. Vychislitelnaya matematika i informatika, 5(3) (2016), 96–111 | DOI

[4] N. A. Likhoded, M. A. Poleschuk, “Metod ranzhirovaniya parametrov razmera blokov vychislenii parallelnogo algoritma”, Doklady Natsionalnoi akademii nauk Belarusi, 59(4) (2015), 25–33 | MR | Zbl

[5] M. Kandemir, J. Ramanujam, M. Irwin, V. Narayanan, I. Kadayif, A. Parikh, “A compiler based approach for dynamically managing scratch-pad memories in embedded systems”, IEEE Transactions on Computer-Aided Design, 23(2) (2004), 243–260 | DOI

[6] M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, P. Sadayappan, “Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories”, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming (Salt Lake City, USA), 2008, 1–10, New York: ACM | DOI

[7] M. A. Poleschuk, N. A. Likhoded, “Privatizatsiya elementov massivov potokami vychislenii”, SIST’2016. Mezhdunarodnyi kongress po informatike: informatsionnye sistemy i tekhnologii (Belarus), 2016, 883–888, Minsk: BGU

[8] M. Baskaran, J. Ramanujam, P. Sadayappan, “Automatic C-to-CUDA code generation for affine programs”, Compiler Construction. 19th International Conference. Part of the Joint European Conferences on Theory and Practice of Software (Paphos, Cyprus), 2010, Berlin, Heidelberg: Springer-Verlag | DOI

[9] G. J. Katz, J. Kider, “All-pairs shortest-paths for large graphs on the GPU”, Proceedings of the 23rd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware (Sarajevo, Bosnia and Herzegovina), 2008, 47–55, Geneve: Eurographics Association

[10] B. D. Lund, . Smith, “A Multi-Stage {CUDA} Kernel for Floyd-Warshall”, CoRR [Internet], 2010, arXiv: http://arxiv.org/abs/1001.4108

[11] E. V. Adutskevich, N. A. Likhoded, A. O. Sikorskii, “K rasparallelivaniyu posledovatelnykh programm: raspredelenie massivov mezhdu protsessorami i strukturizatsiya kommunikatsii”, Kibernetika i sistemnyi analiz, 48(1) (2012), 144–163 | MR | Zbl

[12] V. V. Voevodin, “Informatsionnaya struktura algoritmov”, Moskva: MGU, 1997 | Zbl

[13] N. A. Likhoded, A. A. Tolstikov, “Formalizatsiya kommunikatsionnykh operatsii mnogomernykh tsiklov”, Izvestiya Natsionalnoi akademii nauk Belarusi. Seriya fiziko-matematicheskikh nauk, 3 (2010), 109–114