The streaming processing of SAR data in distributed environment with Apache Spark
Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ, Volume 13 (2017) no. 2, pp. 168-181
This article was harvested from the source Math-Net.Ru

This article presents a modern approach to building a distributed software system based on massively parallel technology for pre- and postprocessing of SAR images. The distinctive features of the system are its ability to work in real time with huge amounts of streaming data and its ability to apply existing algorithms, which were not designed for distributed processing, across multiple nodes without changing their implementation. Distributed processing technologies are compared, and Apache Spark is selected on the basis of that comparison. The article demonstrates how automatic processing of input SAR images can be organised as a sequence of operations executed according to defined conditions. Processing results are stored in the system as fault-tolerant distributed collections of data (RDDs, Resilient Distributed Datasets), which allows intermediate results to be retrieved and saved in the distributed file system HDFS as new space images become available and are processed by the sequence of algorithms. An implementation of specific SAR data processing tasks based on the suggested approach is described (phase estimation, coregistration, interferogram creation, and phase unwrapping with the region-growing method). A scheme of the phase unwrapping algorithm that can use GPUs and NVIDIA CUDA technology is presented, and an adaptation of the algorithm for massively parallel systems is shown. The algorithm implementation focuses on processing a pair of SAR images on one node. Performance growth is achieved by simultaneously processing multiple images, whose number equals the number of cluster nodes. An example implementation of methods for working with streaming binary data (BinaryRecordStream) is shown; these methods monitor new SAR data in the distributed file system HDFS and read and write this data as binary files with a fixed record size in bytes. A directory and the size of one record are used as the input parameters.
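The fixed-length binary record handling described in the abstract can be illustrated with a minimal, Spark-free sketch in plain Python. It takes the same two inputs the abstract names, a directory and the size of one record. The function names `split_fixed_records` and `poll_new_records` are hypothetical and are not taken from the article; a local directory stands in for HDFS. For comparison, Spark Streaming itself provides `StreamingContext.binaryRecordsStream(directory, recordLength)`, which accepts the same pair of parameters, though the abstract does not state how the authors' BinaryRecordStream methods relate to it.

```python
import os
from typing import Iterator, List, Set


def split_fixed_records(data: bytes, record_length: int) -> List[bytes]:
    """Split a byte buffer into records of exactly record_length bytes."""
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    if len(data) % record_length != 0:
        raise ValueError("buffer size is not a multiple of record_length")
    return [data[i:i + record_length] for i in range(0, len(data), record_length)]


def poll_new_records(directory: str, record_length: int, seen: Set[str]) -> Iterator[bytes]:
    """Yield records from files that have appeared since the previous poll.

    `seen` holds the paths already processed; it is updated in place so that
    repeated polls only return data from newly arrived files.
    (Hypothetical helper: a local directory is assumed instead of HDFS.)
    """
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if path in seen or not os.path.isfile(path):
            continue
        seen.add(path)
        with open(path, "rb") as fh:
            yield from split_fixed_records(fh.read(), record_length)
```

Calling `poll_new_records` repeatedly with the same `seen` set mimics one monitoring cycle per call: files already read are skipped, and only newly arrived files contribute records.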
The results of testing the developed algorithms on a demonstration cluster are presented. The tests show that processing can be up to eight times faster on an eight-node cluster than sequential processing of the same number of images on one node. The test results also show that the performance of the presented algorithms can be improved without any changes in implementation, which in turn justifies applying a distributed approach to SAR data processing. Refs 26. Figs 4. Tables 3.
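The "up to eight times" figure is consistent with a simple idealised model in which each node processes one image pair at a time and all pairs take equally long. The sketch below is an assumption-laden illustration, not the authors' measurement method: the function names and the timing values are hypothetical, and scheduling, data transfer, and I/O overheads are ignored.

```python
import math


def ideal_makespan(num_pairs: int, num_nodes: int, seconds_per_pair: float) -> float:
    """Wall-clock time if every node handles one image pair at a time."""
    rounds = math.ceil(num_pairs / num_nodes)  # full passes over the cluster
    return rounds * seconds_per_pair


def ideal_speedup(num_pairs: int, num_nodes: int, seconds_per_pair: float) -> float:
    """Ratio of sequential single-node time to the idealised cluster time."""
    sequential = num_pairs * seconds_per_pair
    return sequential / ideal_makespan(num_pairs, num_nodes, seconds_per_pair)
```

Under this model, eight pairs on eight nodes give a speedup of 8, while nine pairs on eight nodes need a second round and drop to 4.5, which is why the abstract ties the gain to the number of images matching the number of nodes.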
Keywords: Apache Spark, Apache Hadoop, distributed information systems, SAR interferometry, processing algorithms.
@article{VSPUI_2017_13_2_a3,
     author = {V. P. Potapov and M. A. Kostylev and S. E. Popov},
     title = {The streaming processing of {SAR} data in distributed environment with {Apache} {Spark}},
     journal = {Vestnik Sankt-Peterburgskogo universiteta. Prikladna\^a matematika, informatika, processy upravleni\^a},
     pages = {168--181},
     year = {2017},
     volume = {13},
     number = {2},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/VSPUI_2017_13_2_a3/}
}
TY  - JOUR
AU  - V. P. Potapov
AU  - M. A. Kostylev
AU  - S. E. Popov
TI  - The streaming processing of SAR data in distributed environment with Apache Spark
JO  - Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ
PY  - 2017
SP  - 168
EP  - 181
VL  - 13
IS  - 2
UR  - http://geodesic.mathdoc.fr/item/VSPUI_2017_13_2_a3/
LA  - ru
ID  - VSPUI_2017_13_2_a3
ER  - 
%0 Journal Article
%A V. P. Potapov
%A M. A. Kostylev
%A S. E. Popov
%T The streaming processing of SAR data in distributed environment with Apache Spark
%J Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ
%D 2017
%P 168-181
%V 13
%N 2
%U http://geodesic.mathdoc.fr/item/VSPUI_2017_13_2_a3/
%G ru
%F VSPUI_2017_13_2_a3
V. P. Potapov; M. A. Kostylev; S. E. Popov. The streaming processing of SAR data in distributed environment with Apache Spark. Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ, Volume 13 (2017) no. 2, pp. 168-181. http://geodesic.mathdoc.fr/item/VSPUI_2017_13_2_a3/

[1] Elizavetin I. V., Shuvalov R. I., Bush V. A., “Principles and methods of SAR Interferometry for the purpose of forming a digital elevation model”, Geodesy and cartography, 2009, no. 1, 39–45 (In Russian)

[2] Ferretti A., Monti-Guarnieri A., Prati C. et al., InSAR Principles: Guidelines for SAR interferometry processing and interpretation, (accessed: 02.08.2016) http://www.esa.int/esapub/tm/tm19/TM-19_ptA.pdf

[3] Zhengxiao Li, Bethel J., “Image coregistration in SAR interferometry”, The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Beijing, XXXVII:B1 (2008), 433–438

[4] Massonnet D., Feigl K. L., “Radar interferometry and its application to changes in the earth's surface”, Reviews of Geophysics, 36:4 (1998), 441–500

[5] Costantini M., Farina A., Zirilli F., “A fast phase unwrapping algorithm for SAR interferometry”, IEEE Trans. GARS, 37:1 (1999), 452–460

[6] Mistry P., Braganza S., Kaeli D., Leeser M., “Accelerating phase unwrapping and affine transformations for optical quadrature microscopy using CUDA”, Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units. GPGPU: Conference, ACM, Washington, D.C., USA, 2009, 28–37

[7] Karasev P. A., Campbell D. P., Richards M. A., “Obtaining a 35x Speedup in 2D phase unwrapping using commodity graphics processors”, Radar Conference, IEEE, 2007, 574–578

[8] Wu Z., Ma W., Long G., Li Y., Tang Q., Wang Z., “High performance two-dimensional phase unwrapping on GPUs”, Proceedings of the 11th ACM Conference on Computing Frontiers — CF'14, ACM, New York, NY, USA, 2014, 35:1–35:10

[9] Xin-Liang S., Xiao-Chun X., “GPU acceleration of range alignment based on minimum entropy criterion”, Radar Conference, IET International (14–16 April 2013), 1–4

[10] Guerriero A., Anelli V. W., Pagliara A., Nutricato R., Nitti D. O., “High performance GPU implementation of InSAR time-consuming algorithm kernels”, Proceedings of the 1st WORKSHOP on the State of the art and Challenges of Research Efforts at POLIBA, Politecnico di Bari, Bari, Italy, 2014, 383

[11] Zhang F., Wang B., Xiang M., “Accelerating InSAR raw data simulation on GPU using CUDA”, Geoscience and Remote Sensing Symposium (IGARSS), IEEE International (25–30 July 2010), Politecnico di Bari, Bari, Italy, 2932–2935

[12] Marinkovic P. S., Hanssen R. F., Kampes B. M., “Utilization of parallelization algorithms in InSAR/PS-InSAR processing”, Proceedings of the 2004 Envisat ERS Symposium (ESA SP-572) (6–10 September 2004), ESA, Salzburg, Austria, 1–7

[13] Sheng G., Qi-Ming Z., Jian J., Cun-Ren L., Qing-xi T., “Parallel processing of InSAR interferogram filtering with CUDA programming”, Zhongguo Cehui Kexue Yanjiuyan, China, 40:1 (2015), 67–88

[14] Verba V. S., Neronskij L. B., Osipov I. G., Turuk V. Je., Space-based radio location systems of Earth Observation, Radiotechnica Publ., M., 2010, 675 pp. (In Russian)

[15] Gabriel E., Fagg G. E., Bosilca G. et al., Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, (accessed: 30.06.2016) https://www.open-mpi.org/papers/euro-pvmmpi-2004-overview/euro-pvmmpi-2004-overview.pdf

[16] Kampes B., Hanssen R., Perski Z., Radar Interferometry with Public Domain Tools presentation, (accessed: 30.06.2016) http://doris.tudelft.nl/Literature/kampes_fringe03.pdf

[17] Frigo M., Johnson S. G., “FFTW: An Adaptive Software Architecture for the FFT”, ICASSP conference proceedings, v. 3, IEEE, Seattle, Washington, USA, 15 May 1998, 1381–1384

[18] Larkin J., Fast GPU Development with CUDA Libraries, (accessed: 30.06.2016) https://www.olcf.ornl.gov/wp-content/uploads/2013/02/GPU_libraries-JL.pdf

[19] Demmel J., Dongarra J., ST-HEC: Reliable and scalable software for linear algebra computations on High End Computers, (accessed: 30.06.2016) https://people.eecs.berkeley.edu/~demmel/Sca-LAPACK-Proposal.pdf

[20] Feoktistov A. A., Zaharov A. I., Gusev M. A., Denisov P. V., “Investigation of the possibilities of the method of small baselines technique on the example of the SBaS module of the software package SARScape and the data of the RSA ASAR/ENVISat and PALSAR/ALOS. Pt 1. Key points of the method”, Journal of Radioelectronics, 2015, no. 9, 1–26 (In Russian)

[21] Reyes-Ortiz J. L., Oneto L., Anguita D., “Big Data analytics in the cloud: Spark on Hadoop vs MPI/OpenMP on Beowulf”, INNS Conference on Big Data 2015 Program (San Francisco, USA, 8–10 August 2015), 121–130

[22] Kannan P., Beyond Hadoop MapReduce Apache Tez and Apache Spark, (accessed: 02.08.2016) http://www.sjsu.edu/people/robert.chun/courses/CS259Fall2013/s3/F.pdf

[23] Nathan P., Real-Time analytics with Spark Streaming, (accessed: 02.08.2016) http://viva-lab.ece.virginia.edu/foswiki/pub/InSAR/RitaEducation/InSAR Technology Literature Search.pdf

[24] Nagler E., Introduction to Oozie, Apache Oozie Documentation, (accessed: 02.08.2016) http://www.cse.buffalo.edu/bina/cse487/fall2011/Oozie.pdf

[25] Jhajj R., Apache Hadoop Hue Tutorial, (accessed: 02.08.2016) https://examples.javacodegeeks.com/enterprise-java/apache-hadoop/apache-hadoop-hue-tutorial/

[26] Potapov V. P., Popov S. E., “High-Performance Region-Growing Algorithm for InSAR Phase Unwrapping Based on CUDA”, Software engineering, 2016, no. 2, 61–74 (In Russian)