Delay-Aware Resource-Efficient Interleaved Task Scheduling Strategy in Spark
Computer Science and Information Systems, Volume 22 (2025) no. 3

See the article record from the source Computer Science and Information Systems website

To address the low CPU and network resource utilization in the task scheduling process of the Spark and Flink computing frameworks, this paper proposes a Delay-Aware Resource-Efficient Interleaved Task Scheduling Strategy (RPTS). The algorithm schedules parallel tasks in a pipelined fashion, improving system resource utilization and shortening job completion times. First, based on historical task completion times, we stagger the execution of tasks within the stage that has the longest completion time, which improves resource utilization and keeps the whole pipelined job running smoothly. Second, task execution is divided into a CPU-intensive phase and a non-CPU-intensive phase, the latter comprising network I/O and disk I/O operations. During the non-CPU-intensive phase, in which tasks fetch data, parallel tasks are launched at suitable intervals to mitigate resource contention and minimize job completion time. Finally, we implement RPTS on Spark 2.4.0 and evaluate its performance experimentally. The results show that, compared to DelayStage, RPTS reduces job execution time by 3.18% to 6.48% and improves the cluster's CPU and network utilization by 6.33% and 7.02%, respectively.
Keywords: Job execution time, delay-aware, Spark, task scheduler
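
The interleaving idea in the abstract can be illustrated with a toy model. The sketch below is not the RPTS algorithm itself, which the abstract does not specify in detail. It assumes a single shared network link and a single CPU, treats each task as a fetch phase followed by a compute phase, and compares a staggered launch against launching all tasks at once. The names `Task`, `staggered_start_times`, `pipelined_makespan`, and `simultaneous_makespan` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    fetch: float    # seconds of I/O if the task has the link to itself (assumed known from history)
    compute: float  # seconds of CPU work if the task has the CPU to itself


def staggered_start_times(tasks: List[Task]) -> List[float]:
    """Launch each task when the previous task's fetch phase ends."""
    starts = []
    t = 0.0
    for task in tasks:
        starts.append(t)
        t += task.fetch
    return starts


def pipelined_makespan(tasks: List[Task]) -> float:
    """Two-stage pipeline: one fetch at a time, one compute at a time."""
    link_free = 0.0  # time at which the network link becomes free
    cpu_free = 0.0   # time at which the CPU becomes free
    for task in tasks:
        fetch_done = link_free + task.fetch
        link_free = fetch_done
        # Compute starts once the data has arrived and the CPU is idle.
        cpu_free = max(cpu_free, fetch_done) + task.compute
    return cpu_free


def simultaneous_makespan(tasks: List[Task]) -> float:
    """Baseline: all tasks start together and share each resource fairly.

    Assumes identical tasks, so every fetch ends at the same moment and
    the I/O and CPU phases never overlap.
    """
    return sum(t.fetch for t in tasks) + sum(t.compute for t in tasks)


tasks = [Task(fetch=2.0, compute=3.0) for _ in range(4)]
print(staggered_start_times(tasks))   # [0.0, 2.0, 4.0, 6.0]
print(pipelined_makespan(tasks))      # 14.0
print(simultaneous_makespan(tasks))   # 20.0
```

In this toy setting the staggered launch finishes earlier because one task's fetch overlaps another task's computation. A real scheduler would also have to account for multiple executors, stragglers, and inaccurate duration estimates.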
Yanhao Zhang; Congyang Wang; Xin He; Junyang Yu; Rui Zhai; Yalin Song. Delay-Aware Resource-Efficient Interleaved Task Scheduling Strategy in Spark. Computer Science and Information Systems, Volume 22 (2025) no. 3. http://geodesic.mathdoc.fr/item/CSIS_2025_22_3_a8/
@article{CSIS_2025_22_3_a8,
     author = {Yanhao Zhang and Congyang Wang and Xin He and Junyang Yu and Rui Zhai and Yalin Song},
     title = {Delay-Aware {Resource-Efficient} {Interleaved} {Task} {Scheduling} {Strategy} in {Spark}},
     journal = {Computer Science and Information Systems},
     year = {2025},
     volume = {22},
     number = {3},
     url = {http://geodesic.mathdoc.fr/item/CSIS_2025_22_3_a8/}
}
TY  - JOUR
AU  - Yanhao Zhang
AU  - Congyang Wang
AU  - Xin He
AU  - Junyang Yu
AU  - Rui Zhai
AU  - Yalin Song
TI  - Delay-Aware Resource-Efficient Interleaved Task Scheduling Strategy in Spark
JO  - Computer Science and Information Systems
PY  - 2025
VL  - 22
IS  - 3
UR  - http://geodesic.mathdoc.fr/item/CSIS_2025_22_3_a8/
ID  - CSIS_2025_22_3_a8
ER  - 
%0 Journal Article
%A Yanhao Zhang
%A Congyang Wang
%A Xin He
%A Junyang Yu
%A Rui Zhai
%A Yalin Song
%T Delay-Aware Resource-Efficient Interleaved Task Scheduling Strategy in Spark
%J Computer Science and Information Systems
%D 2025
%V 22
%N 3
%U http://geodesic.mathdoc.fr/item/CSIS_2025_22_3_a8/
%F CSIS_2025_22_3_a8