Generation of a social network graph by using Apache Spark
Modelirovanie i analiz informacionnyh sistem, Vol. 23 (2016) no. 6, pp. 777-783.

See the article record from the source Math-Net.Ru

We plan to develop a method for clustering a social network graph. Testing such a method requires a generated graph whose structure resembles that of existing social networks. The article presents an algorithm for the distributed generation of such a graph. It takes into account basic properties of social networks, such as the power-law distribution of the number of communities per user and the dense intersections of communities, among others. The algorithm also addresses problems found in similar works by other authors, for example the appearance of multiple edges during generation. A distinctive feature of the algorithm is that it is parameterized by the number of communities rather than by the number of connected users, as is done in other works; this choice reflects the way the structure of existing social networks evolves. The paper lists the properties of the social network graph and gives a table of the variables the algorithm needs, a step-by-step description of the generation procedure, and the calculation of the corresponding mathematical parameters. Generation is performed in a distributed way with the Apache Spark framework, and the division of work into tasks within this framework is described in detail. The algorithm uses the Erdős–Rényi random graph model, chosen as the most suitable and the easiest to implement. The main advantages of the method are its low resource consumption compared with similar generators and its execution speed. The speed comes from distributed execution and from the fact that at any moment network users have unique numbers and are ordered by them, so no sorting is needed. Beyond supporting the development of an efficient clustering method, the algorithm may be useful in other areas, for example search engines for social networks.
Keywords: social network, generation.
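The generation scheme summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is plain Python standing in for Apache Spark, and the function names, the parameters `alpha`, `p` and `seed`, and the per-community seeding scheme are all assumptions made for illustration. It shows the three ideas named above: a power-law number of communities per user, Erdős–Rényi edges inside each community, and removal of multiple edges for pairs of users that share several communities.

```python
import random
from itertools import combinations


def power_law_sample(rng, alpha, k_max):
    """Draw k in 1..k_max with probability proportional to k ** -alpha."""
    weights = [k ** -alpha for k in range(1, k_max + 1)]
    return rng.choices(range(1, k_max + 1), weights=weights)[0]


def assign_communities(n_users, n_communities, alpha, seed):
    """Give every user a power-law number of communities (assumed scheme)."""
    rng = random.Random(seed)
    members = [[] for _ in range(n_communities)]
    # Users are visited in order of their unique numbers, so every
    # member list is already sorted and never needs an explicit sort.
    for user in range(n_users):
        k = power_law_sample(rng, alpha, n_communities)
        for c in rng.sample(range(n_communities), k):
            members[c].append(user)
    return members


def community_edges(community_id, users, p, seed):
    """Erdős–Rényi G(n, p) inside one community, independent of the others."""
    # Hypothetical per-task seed so each community can be processed alone.
    rng = random.Random(seed * 1_000_003 + community_id)
    return [(u, v) for u, v in combinations(users, 2) if rng.random() < p]


def generate_graph(n_users, n_communities, alpha=2.5, p=0.3, seed=0):
    members = assign_communities(n_users, n_communities, alpha, seed)
    # A set drops pairs that were generated in more than one community,
    # which is one simple way to avoid multiple edges.
    edges = set()
    for cid, users in enumerate(members):  # independent tasks, one per community
        edges.update(community_edges(cid, users, p, seed))
    return members, edges


if __name__ == "__main__":
    members, edges = generate_graph(n_users=200, n_communities=20, seed=1)
    print(len(edges), "distinct edges in", len(members), "communities")
```

Because each call to `community_edges` depends only on its own community, the loop in `generate_graph` is the part that a distributed framework could run in parallel; in PySpark this might be expressed with `flatMap` over the communities followed by `distinct()`, though the paper's actual task division may differ.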
@article{MAIS_2016_23_6_a8,
     author = {Yu. A. Belov and S. I. Vovchok},
     title = {Generation of a social network graph by using {Apache} {Spark}},
     journal = {Modelirovanie i analiz informacionnyh sistem},
     pages = {777--783},
     publisher = {mathdoc},
     volume = {23},
     number = {6},
     year = {2016},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MAIS_2016_23_6_a8/}
}
TY  - JOUR
AU  - Yu. A. Belov
AU  - S. I. Vovchok
TI  - Generation of a social network graph by using Apache Spark
JO  - Modelirovanie i analiz informacionnyh sistem
PY  - 2016
SP  - 777
EP  - 783
VL  - 23
IS  - 6
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/MAIS_2016_23_6_a8/
LA  - ru
ID  - MAIS_2016_23_6_a8
ER  - 
%0 Journal Article
%A Yu. A. Belov
%A S. I. Vovchok
%T Generation of a social network graph by using Apache Spark
%J Modelirovanie i analiz informacionnyh sistem
%D 2016
%P 777-783
%V 23
%N 6
%I mathdoc
%U http://geodesic.mathdoc.fr/item/MAIS_2016_23_6_a8/
%G ru
%F MAIS_2016_23_6_a8
Yu. A. Belov; S. I. Vovchok. Generation of a social network graph by using Apache Spark. Modelirovanie i analiz informacionnyh sistem, Vol. 23 (2016) no. 6, pp. 777-783. http://geodesic.mathdoc.fr/item/MAIS_2016_23_6_a8/

[1] Chikhradze K. K. et al., “On a model of social network with user communities for distributed generation of random social graphs”, Machine Learning and Data Analysis, 1:8 (2014), 1027–1047 (in Russian)

[2] Yang J., Leskovec J., “Community-affiliation graph model for overlapping network community detection”, IEEE 12th International Conference on Data Mining, 2012

[3] Raigorodskii A., “Models of Random Graphs and Their Applications to the Web-Graphs Analysis”, RUSSIR-2015 (Lomonosov Moscow University, Moscow Institute of Physics and Technology, Yandex, Moscow, Russia, 25–29 August, 2015)

[4] Erdős P., Rényi A., “On the evolution of random graphs”, Bull. Inst. Int. Statist. Tokyo, 38 (1961), 343–347

[5] Aiello W., Chung F., Lu L., “A random graph model for power law graphs”, http://people.math.sc.edu/lu/papers/power.pdf

[6] Karau H. et al., Learning Spark: Lightning-Fast Data Analysis, DMK Press, M., 2015, 304 pp. (in Russian)

[7] Vovchok S. I., “Sozdanie metoda klasterizatsii grafa sotsialnoy seti”, Novye informatsionnye tekhnologii v nauke, Sbornik statey Mezhdunarodnoy nauchno-prakticheskoy konferentsii MTsII OMEGA SAYNS, v. 2, 2016, 34–36 (in Russian)