Application of webometrics methods for analysis and enhancement of academic site structure based on page value criterion
Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ, Vol. 15 (2019), no. 3, pp. 337-352. This article was harvested from the source Math-Net.Ru.

This paper describes a formalized procedure for exploring a site using webometrics methods. The procedure involves gathering data on the site's structure, constructing and exploring the resulting webgraph, defining a correctness criterion, identifying control actions that would improve the structure under that criterion, testing the criterion on real-world examples, and developing recommendations for improving the structure. PageRank is used as a criterion to evaluate the value of web pages. A page's value is determined by the presence or absence of a link pointing to it from the site's homepage. Under the correctness criterion, the valuable pages of a site should have the highest PageRank among all pages of that site. The control action consists of removing non-valuable directories whose root page has a high PageRank and transforming them into independent sites. Experiments are conducted on three faculty sites of major universities in the USA, Russia and Nigeria. The approach is shown to be applicable and reasonable in all three cases.
Keywords: website, graph, PageRank, universities, data mining, website structure, web harvesting, web mining, URL.
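
The procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy graph `SITE`, the function names, the damping factor of 0.85, and the reading of the correctness criterion (a non-valuable page is flagged when its PageRank exceeds the lowest PageRank of any valuable page) are all assumptions made here for illustration. The homepage itself is also assumed to count as valuable.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # page URL -> URLs it links to (internal links only)

# Hypothetical toy site: "/lab/" is a directory that the homepage does not link to.
SITE: Graph = {
    "/": ["/about", "/research"],
    "/about": ["/"],
    "/research": ["/", "/lab/"],
    "/lab/": ["/lab/a", "/lab/b"],
    "/lab/a": ["/lab/"],
    "/lab/b": ["/lab/"],
}


def pagerank(graph: Graph, damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Power-iteration PageRank; dangling pages spread their rank uniformly."""
    nodes = sorted(set(graph) | {v for targets in graph.values() for v in targets})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            targets = graph.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def valuable_pages(graph: Graph, home: str) -> Set[str]:
    """Pages linked from the homepage (plus the homepage itself, by assumption)."""
    return {home} | set(graph.get(home, []))


def criterion_violations(graph: Graph, home: str) -> List[str]:
    """Non-valuable pages that outrank at least one valuable page."""
    rank = pagerank(graph)
    valuable = valuable_pages(graph, home)
    lowest_valuable = min(rank[p] for p in valuable)
    return sorted(p for p in rank if p not in valuable and rank[p] > lowest_valuable)


def remove_directory(graph: Graph, prefix: str) -> Graph:
    """Control action: drop every page under `prefix` and all links into it."""
    return {
        u: [v for v in targets if not v.startswith(prefix)]
        for u, targets in graph.items()
        if not u.startswith(prefix)
    }


if __name__ == "__main__":
    print("violations before:", criterion_violations(SITE, "/"))
    print("violations after: ", criterion_violations(remove_directory(SITE, "/lab/"), "/"))
```

In this toy graph the `/lab/` directory links only to its own pages, so rank accumulates there and its pages outrank the pages linked from the homepage. Removing the directory, which stands in for moving it to an independent site, leaves a graph in which no page is flagged.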
@article{VSPUI_2019_15_3_a3,
     author = {A. M. Nwohiri and A. A. Pechnikov},
     title = {Application of webometrics methods for analysis and enhancement of academic site structure based on page value criterion},
     journal = {Vestnik Sankt-Peterburgskogo universiteta. Prikladna\^a matematika, informatika, processy upravleni\^a},
     pages = {337--352},
     year = {2019},
     volume = {15},
     number = {3},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/VSPUI_2019_15_3_a3/}
}
A. M. Nwohiri; A. A. Pechnikov. Application of webometrics methods for analysis and enhancement of academic site structure based on page value criterion. Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ, Vol. 15 (2019), no. 3, pp. 337-352. http://geodesic.mathdoc.fr/item/VSPUI_2019_15_3_a3/
