On efficient randomized algorithms for finding the PageRank vector

A. V. Gasnikov; D. Yu. Dmitriev

A. V. Gasnikov ; D. Yu. Dmitriev

Žurnal vyčislitelʹnoj matematiki i matematičeskoj fiziki, Tome 55 (2015) no. 3, pp. 355-371

Cet article a éte moissonné depuis la source Math-Net.Ru

Voir la notice de l'article

Résumé

Two randomized methods are considered for finding the PageRank vector; in other words, the solution of the system $\mathbf{p}^{\mathrm{T}}=\mathbf{p}^{\mathrm{T}}P$ with a stochastic $n\times n$ matrix $P$, where $n\sim 10^7$–$10^9$, is sought (in the class of probability distributions) with accuracy $\varepsilon:\varepsilon\gg n^{-1}$. Thus, the possibility of brute-force multiplication of $P$ by the column is ruled out in the case of dense objects. The first method is based on the idea of Markov chain Monte Carlo algorithms. This approach is efficient when the iterative process $\mathbf{p}_{t+1}^{\mathrm{T}}=\mathbf{p}_t^{\mathrm{T}}P$ quickly reaches a steady state. Additionally, it takes into account another specific feature of $P$, namely, the nonzero off-diagonal elements of $P$ are equal in rows (this property is used to organize a random walk over the graph with the matrix $P$). Based on modern concentration-of-measure inequalities, new bounds for the running time of this method are presented that take into account the specific features of $P$. In the second method, the search for a ranking vector is reduced to finding the equilibrium in the antagonistic matrix game $$ \min_{\mathbf{p}\in S_n(1)}\max_{\mathbf{u}\in S_n(1)}\langle \mathbf{u}, (P^{\mathrm{T}}-I)\mathbf{p}\rangle, $$ where $S_n(1)$ is a unit simplex in $\mathbb{R}^n$ and $I$ is the identity matrix. The arising problem is solved by applying a slightly modified Grigoriadis–Khachiyan algorithm (1995). This technique, like the Nazin–Polyak method (2009), is a randomized version of Nemirovski’s mirror descent method. The difference is that randomization in the Grigoriadis–Khachiyan algorithm is used when the gradient is projected onto the simplex rather than when the stochastic gradient is computed. For sparse matrices $P$, the method proposed yields noticeably better results.

Export
Comment citer

@article{ZVMMF_2015_55_3_a0,
     author = {A. V. Gasnikov and D. Yu. Dmitriev},
     title = {On efficient randomized algorithms for finding the {PageRank} vector},
     journal = {\v{Z}urnal vy\v{c}islitelʹnoj matematiki i matemati\v{c}eskoj fiziki},
     pages = {355--371},
     year = {2015},
     volume = {55},
     number = {3},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/ZVMMF_2015_55_3_a0/}
}

TY  - JOUR
AU  - A. V. Gasnikov
AU  - D. Yu. Dmitriev
TI  - On efficient randomized algorithms for finding the PageRank vector
JO  - Žurnal vyčislitelʹnoj matematiki i matematičeskoj fiziki
PY  - 2015
SP  - 355
EP  - 371
VL  - 55
IS  - 3
UR  - http://geodesic.mathdoc.fr/item/ZVMMF_2015_55_3_a0/
LA  - ru
ID  - ZVMMF_2015_55_3_a0
ER  -

%0 Journal Article
%A A. V. Gasnikov
%A D. Yu. Dmitriev
%T On efficient randomized algorithms for finding the PageRank vector
%J Žurnal vyčislitelʹnoj matematiki i matematičeskoj fiziki
%D 2015
%P 355-371
%V 55
%N 3
%U http://geodesic.mathdoc.fr/item/ZVMMF_2015_55_3_a0/
%G ru
%F ZVMMF_2015_55_3_a0

A. V. Gasnikov; D. Yu. Dmitriev. On efficient randomized algorithms for finding the PageRank vector. Žurnal vyčislitelʹnoj matematiki i matematičeskoj fiziki, Tome 55 (2015) no. 3, pp. 355-371. http://geodesic.mathdoc.fr/item/ZVMMF_2015_55_3_a0/

Bibliographie
Cité par

[1] Brin S., Page L., “The anatomy of a large-scale hypertextual web search engine”, Comput. Network ISDN Syst., 30:1–7 (1998), 107–117

[2] Langville A. N., Meyer C. D., Google's PageRank and beyond: the science of search engine rankings, Princeton University Press, New York, 2006 | MR | Zbl

[3] Baldi P., Frasconi P., Smyth P., Modeling the Internet and the Web: probabilistic methods and algorithms, John Wiley Sons, 2003 http://ibook.ics.uci.edu/

[4] Franceschet M., “PageRank: standing on the shoulders of giant”, Commun. of ACM, 54:6 (2011), 92–101, arXiv: 1002.2858v3

[5] Gasnikov A. V., Gasnikova E. V., Fedko O. S., “O vozmozhnoi dinamike v modeli ranzhirovaniya web-stranits PageRank i modernizirovannoi modeli rascheta matritsy korrespondentsii”, Tr. MFTI, 4:2(14) (2012), 101–120

[6] Agaev R. P., Chebotarev P. Yu., “Skhodimost i ustoichivost v zadachakh soglasovaniya kharakteristik (obzor bazovykh rezultatov)”, Upravlenie bolshimi sistemami, 30:1 (2010), 470–505 | MR

[7] Levin D. A., Peres Y., Wilmer E. L., Markov chain and mixing times, AMS, 2009 http://pages.uoregon.edu/dlevin/MARKOV/markovmixing.pdf | MR

[8] Nazin A. V., Polyak B. T., “Randomizirovannyi algoritm nakhozhdeniya sobstvennogo vektora stokhasticheskoi matritsy s primeneniem k zadache PageRank”, Avtomatika i telemekhan., 2011, no. 2, 131–141 | MR | Zbl

[9] Nesterov Y. E., Subgradient methods for huge-scale optimization problems, CORE Discussion Paper, 2012/2, , 2012 http://dial.academielouvain.be/handle/boreal:107876

[10] Nesterov Y. E., “Efficiency of coordinate descent methods on large scale optimization problem”, SIAM J. Optimizat., 22:2 (2012), 341–362 http://dial.academielouvain.be/handle/boreal:121612 | MR | Zbl

[11] Juditsky A., Lan G., Nemirovski A., Shapiro A., “Stochastic approximation approach to stochastic programming”, SIAM J. Optimizat., 19:4 (2009), 1574–1609 | MR | Zbl

[12] Khachiyan L. G., Izbrannye trudy, Sost. S. P. Tarasov, MTsNMO, M., 2009, 38–48

[13] Nesterov Y., Nemirovski A., Finding the stationary states of Markov chains by iterative methods, CORE Discussion Paper, 2012/58, , 2012 http://dial.academielouvain.be/handle/boreal:122163 | MR

[14] Polyak B. T., Tremba A. A., “Reshenie zadachi PageRank dlya bolshikh matrits s pomoschyu regulyarizatsii”, Avtomatika i telemekhan., 2012, no. 11, 144–166 | MR | Zbl

[15] Spielman D., Lecture No 7, , 2009 http://www.cse.cuhk.edu.hk/ chi/csc5160/notes/L07.pdf

[16] Kashin B. S., “Poperechniki nekotorykh konechnomernykh mnozhestv i klassov gladkikh funktsii”, Izv. AN SSSR. Ser. Matem., 41:2 (1977), 334–351 | MR

[17] Pandurangan G., Raghavan P., Upfal E., “Using PageRank to characterize Web structure”, Internet Math., 3:1 (2006), 1–20 | MR | Zbl

[18] Bakhvalov N. S., Zhidkov N. P., Kobelkov G. M., Chislennye metody, Fizmatlit. BINOM. Laboratoriya bazovykh znanii, M.; Nevskii dialekt, SPb., 2002

[19] Nikaido Kh., Vypuklye struktury i matematicheskaya ekonomika, Mir, M., 1972

[20] Ryabenkii V. S., Vvedenie v vychislitelnuyu matematiku, Fizmatlit, M., 2000 | MR

[21] Chebotarev P., Agaev R., “Forest matrices around the Laplacian matrix”, Linear Algebra and Applicat., 356 (2002), 253–274 | MR | Zbl

[22] Faddeev D. K., Faddeeva V. N., Vychislitelnye metody lineinoi algebry, Nauka, M., 1963 | MR

[23] Krasnoselskii M. A., Krein S. G., “Zamechanie o raspredelenii oshibok pri reshenii sistemy lineinykh uravnenii pri pomoschi iteratsionnogo protsessa”, UMN, 7:4(50) (1952), 157–161 | MR | Zbl

[24] Khorn R., Dzhonson Ch., Matrichnyi analiz, Mir, M., 1989 | MR

[25] Ermakov S. M., Metod Monte-Karlo v vychislitelnoi matematike. Vvodnyi kurs, BINOM. Laboratoriya bazovykh znanii, M.; Nevskii dialekt, SPb., 2009

[26] Raigorodskii A. M., Modeli interneta, Izdatelskii dom “Intellekt”, Dolgoprudnyi, 2013

[27] Knut D., Yao E., “Slozhnost modelirovaniya neravnomernykh raspredelenii”, Kiberneticheskii sb. Novaya seriya, 19, Mir, M., 1983, 97–158

[28] Sinclair A., Jerrum M., “Approximate counting, uniform generation and rapidly mixing Markov chains”, Information and Comput., 82:1 (1989), 93–133 | MR | Zbl

[29] Dyer M., Frieze A., Kannan R., “A random polynomial-time algorithm for approximating of the volume of convex bodies”, J. ACM, 38:1 (1991), 1–17 | MR | Zbl

[30] Jerrum M., Sinclair A., “The Markov chain Monte Carlo method: an approach to approximate counting and integration”, Approximat. Algorithms for NP-hard Problems, ed. D. S. Hochbaum, PWS Publishing, Boston, 1996, 482–520

[31] Lezaud P., “Chernoff-type bound for finite Markov chains”, The Annals of Applied Probability, 8:3 (1998), 849–867 | MR | Zbl

[32] Aldous D., Fill J., Reversible Markov chains and random walks on graphs, , 2002 http://www.stat.berkeley.edu/ãldous/RWG/book.html

[33] Fan Chung, “Laplacians and the Cheeger inequality for directed graphs”, Annals of Combinatorics, 2005, no. 9, 1–19 http://math.ucsd.edu/f̃an/ | MR | Zbl

[34] Diaconis P., “The Markov chain Monte Carlo revolution”, Bulletin (New Series) of the AMS, 49:2 (2009), 179–205 http://math.uchicago.edu/s̃hmuel/Network-course-readings/MCMCRev.pdf | MR

[35] Spielman D., Lecture No 2, , 2009 http://www.cse.cuhk.edu.hk/c̃hi/csc5160/notes/L02.pdf

[36] Joulin A., Ollivier Y., “Curvature, concentration and error estimates for Markov chain Monte Carlo”, Ann. Prob., 38:6 (2010), 2418–2442 | MR | Zbl

[37] Montenegro R., Tetali P., Mathematical aspects of mixing times in Markov chains, , 2006 http://people.math.gatech.edu/t̃etali/PUBLIS/survey.pdf | MR

[38] Boyd S., Vandenberghe L., Convex optimization, Cambridge University Press, 2004 | MR | Zbl

[39] Meyn S. P., Tweedie R. L., Markov chains and stochastic stability, Springer-Verlag, London, 2005 http://probability.ca/MT/ | MR

[40] Paulin D., Concentration inequalities for Markov chains by Marton couplings, 2013, arXiv: 1212.2015v2

[41] Ledoux M., Concentration of measure phenomenon, Math. Surveys Monogr., 89, Amer. Math. Soc., Providence, RI, 2001 | MR | Zbl

[42] Gasnikov A. V., Gasnikova E. V., “Ob entropiino-podobnykh funktsionalakh, voznikayuschikh v stokhasticheskoi khimicheskoi kinetike pri kontsentratsii invariantnoi mery i v kachestve funktsii Lyapunova dinamiki kvazisrednikh”, Matem. zametki, 93:6 (2013), 816–824

[43] Krasnoselskii M. A., Lifshits E. A., Sobolev A. V., Pozitivnye lineinye sistemy. Metod polozhitelnykh operatorov, Nauka, M., 1985 | MR

[44] Kelbert M. Ya., Sukhov Yu. M., Veroyatnost i statistika v primerakh i zadachakh, v. 2, MTsNMO, M., 2010

[45] Klochkov E. Yu., Parametricheskoe otsenivanie v transportnykh zadachakh, Diplom bakalavra, FUPM MFTI, Dolgoprudnyi, 2013

[46] Spokoiny V., “Parametric estimation. Finite sample theory”, The Annals of Statistics, 40:6 (2012), 2877–2909, arXiv: 1111.3029v5 | MR | Zbl

[47] Sanov I. N., “O veroyatnosti bolshikh otklonenii sluchainykh velichin”, Matem. sb., 42(84):1 (1957), 11–44 | MR | Zbl

[48] Boucheron S., Lugosi G., Massart P., Concentration inequalities: A nonasymptotic theory of independence, Oxford University Press, London, 2013 | MR | Zbl

[49] Romaschenko A., Ekspandery: postroenie i nekotorye prilozheniya, MGU, M.-SPb.; Computer Science Club, 2009 http://www.mccme.ru/ãnromash/courses/expanders2009.pdf

[50] Nemirovskii A. S., Yudin D. B., Slozhnost zadach i effektivnost metodov optimizatsii, Nauka, M., 1979 http://www2.isye.gatech.edu/ñemirovs/Lect_EMCO.pdf | MR

[51] Paulin D., Non-asymptotic confidence interval for MCMC, 2013, arXiv: 1212.2016v4

[52] Gasnikov A. V., Nesterov Yu. E., Spokoinyi V. G., “Ob effektivnosti odnogo metoda randomizatsii zerkalnogo spuska v zadachakh onlain optimizatsii”, Zh. vychisl. matem. i matem. fiz., 2014 (to appear)

[53] Mansour Y., Algorithmic game theory and machine learning, , 2011 http://www.tau.ac.il/m̃ansour/advanced-agt+ml/

[54] Bubeck S., Introduction to online optimization, Lecture Notes, Princeton University, New York, 2011 http://www.princeton.edu/s̃bubeck/BubeckLectureNotes.pdf

[55] Nesterov Y., “Smooth minimization of non-smooth function”, Math. Program. Ser. A, 103:1 (2005), 127–152 | MR | Zbl

[56] Nesterov Yu. E., Vvedenie v vypukluyu optimizatsiyu, MTsNMO, M., 2010

[57] http://www2.isye.gatech.edu/ñemirovs/

[58] Nesterov Y., “Primal-dual subgradient methods for convex problems”, Math. Program. Ser. B, 120 (2009), 221–259 | MR | Zbl

[59] Magaril-Ilyaev G. G., Tikhomirov V. M., Vypuklyi analiz i ego prilozheniya, URSS, M., 2011

[60] Vyugin V. V., Matematicheskie osnovy teorii mashinnogo obucheniya i prognozirovaniya, MTsNMO, M., 2013 http://www.iitp.ru/upload/publications/6256/vyugin1.pdf

[61] Lugosi G., Cesa-Bianchi N., Prediction, learning and games, Cambridge University Press, New York, 2006 | MR | Zbl

[62] Yuditskii A. B., Nazin A. V., Tsybakov A. B., Vayatis N., “Rekurrentnoe agregirovanie otsenok metodom zerkalnogo spuska s usredneniem”, Probl. peredachi informatsii, 41:4 (2005), 78–96 | MR

[63] Juditsky A., Nazin A. V., Tsybakov A. B., Vayatis N., “Gap-free bounds for stochastic multi-armed bandit”, Proc. of the 17$^\mathrm{th}$ World Congress IFAC (Seoul, Korea, July 6–11, 2008), 11560–11563

[64] Andersen S. P., de Palma A., Thisse J.-F., Discrete choice theory of product differentiation, MIT Press, New York–Cambridge, 1992 | MR

[65] Juditsky A., Polyak B., “Robust eigenvector of stochastic matrix with application to PageRang”, Proc. of the 51st IEEE Conference on Decision and Control (December 10–13, 2012, Maui, Hawaii, USA), 3171–3176, arXiv: 1206.4897

[66] Tremba A., Nazin A., “Extension of a saddle point mirror descent algorithm with application to robust PageRank”, Proc. of the 52th IEEE Conference on Decision and Control (December 10–13, 2013, Florence, Italy), 3691–3696

[67] Tempo R., http://staff.polito.it/roberto.tempo/

Parcourir par

Geodesic

Parcourir par