Optimizing Data Locality by Executor Allocation in Spark Computing Environment
Computer Science and Information Systems, Vol. 14 (2017), no. 3.

See the article record from the source: Computer Science and Information Systems website

Data locality is an important concept in big data processing. Most existing research optimizes data locality through task scheduling. However, executors are the containers in which tasks run, so the choice of nodes on which executors are started directly affects the locality level that tasks can achieve. This paper improves data locality through executor allocation for the reduce stage in the Spark computing environment. First, we calculate the network distance matrix of executors and formulate an optimal executor allocation problem that minimizes the total communication distance. Then, for the case where the network distance between executors satisfies the triangle inequality, we propose an approximation algorithm; for the case where it does not, we propose a greedy algorithm. Finally, we evaluate the performance of our algorithms on a practical Spark cluster using several representative micro-benchmarks (Sort and Join) and macro-benchmarks (PageRank and LDA). Experimental results show that the proposed algorithms decrease the execution time of tasks by reducing data communication.
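The abstract does not spell out the algorithms, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a symmetric distance matrix `dist` between candidate nodes and a simple greedy rule (start from the closest pair, then repeatedly add the node with the smallest summed distance to those already chosen). The function names `satisfies_triangle_inequality`, `total_distance`, and `greedy_allocate` are hypothetical, and this greedy rule is not guaranteed to find the optimal allocation.

```python
from itertools import combinations


def satisfies_triangle_inequality(dist):
    """Return True if dist[i][j] <= dist[i][k] + dist[k][j] for all i, j, k."""
    n = len(dist)
    return all(
        dist[i][j] <= dist[i][k] + dist[k][j]
        for i in range(n)
        for j in range(n)
        for k in range(n)
    )


def total_distance(dist, chosen):
    """Sum of pairwise distances between the chosen nodes."""
    return sum(dist[i][j] for i, j in combinations(chosen, 2))


def greedy_allocate(dist, k):
    """Pick k nodes for executors with a simple greedy heuristic (assumed rule).

    Starts from the closest pair of nodes, then repeatedly adds the node
    whose summed distance to the already chosen nodes is smallest.
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of nodes")
    if k == 1:
        return [0]  # any single node has total distance 0; choose the first
    closest_pair = min(combinations(range(n), 2), key=lambda p: dist[p[0]][p[1]])
    chosen = list(closest_pair)
    while len(chosen) < k:
        remaining = [c for c in range(n) if c not in chosen]
        best = min(remaining, key=lambda c: sum(dist[c][s] for s in chosen))
        chosen.append(best)
    return sorted(chosen)


# Hypothetical 4-node cluster: nodes 0 and 1 are close, nodes 2 and 3 are close,
# and the two groups are far apart.
example_dist = [
    [0, 1, 4, 4],
    [1, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]
allocation = greedy_allocate(example_dist, 3)
```

In this sketch, a check such as `satisfies_triangle_inequality(example_dist)` would decide which of the two algorithm families applies, and `total_distance(example_dist, allocation)` gives the objective value of the chosen allocation.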
Keywords: communication distance, data locality, executor allocation, Spark framework
@article{CSIS_2017_14_3_a3,
     author = {Zhongming Fu and Mengsi He and Zhuo Tang and Yang Zhang},
     title = {Optimizing {Data} {Locality} by {Executor} {Allocation} in {Spark} {Computing} {Environment}},
     journal = {Computer Science and Information Systems},
     publisher = {mathdoc},
     volume = {14},
     number = {3},
     year = {2017},
     url = {http://geodesic.mathdoc.fr/item/CSIS_2017_14_3_a3/}
}
Zhongming Fu; Mengsi He; Zhuo Tang; Yang Zhang. Optimizing Data Locality by Executor Allocation in Spark Computing Environment. Computer Science and Information Systems, Vol. 14 (2017), no. 3. http://geodesic.mathdoc.fr/item/CSIS_2017_14_3_a3/