Vector space model of knowledge representation based on semantic relatedness
Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika, Volume 6 (2017) no. 3, pp. 73-83. This article was harvested from the Math-Net.Ru source.


Most text mining algorithms use the vector space model of knowledge representation. The vector space model uses the frequency (weight) of a term to determine its importance in a document. Terms can be semantically similar but lexicographically different, so classification based on term frequency alone does not give the desired result. Analysis of low-quality results shows that errors occur because of features of natural language that were not taken into account. Neglecting these features, namely synonymy and polysemy, increases the dimension of the semantic space, which determines the performance of the final software product built on the algorithm. Furthermore, the results of many complex algorithms require a domain expert to prepare the training sample, which in turn also affects the quality of the algorithm. We propose a model that, in addition to the weight of a term in a document, also uses the semantic weight of the term. The semantic weight of terms is higher the closer the terms are to each other semantically. To compute the semantic relatedness of terms we propose to use an adaptation of the extended Lesk algorithm. The method of computing semantic relatedness is as follows: for each sense of the word in question, we count the number of words that occur both in the dictionary definition of that sense (assuming that the dictionary entry describes several senses of the word) and in the immediate context of the word in question. The sense with the largest such overlap is selected as the most probable meaning of the word. A vector space model based on the semantic relatedness of terms resolves the ambiguity introduced by synonyms.
Keywords: text-mining, vector space model, semantic relatedness.
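To make the overlap-counting step in the abstract concrete, the following is a minimal Python sketch of the described Lesk-style disambiguation: for each candidate sense of a word, the words shared by the sense's dictionary definition and the word's immediate context are counted, and the sense with the largest overlap is chosen. The function names and the tiny glosses dictionary are illustrative assumptions, not the author's implementation.

def tokenize(text):
    # Lower-case and split a string into a set of word tokens.
    return set(text.lower().split())

def gloss_overlap(definition, context_tokens):
    # Number of words shared by a sense definition and the context.
    return len(tokenize(definition) & context_tokens)

def most_probable_sense(word, context, glosses):
    # Pick the sense of `word` whose dictionary definition overlaps the
    # surrounding context the most; `glosses` maps a word to a list of
    # (sense_id, definition) pairs, e.g. taken from a machine-readable
    # dictionary such as WordNet.
    context_tokens = tokenize(context)
    best_sense, best_overlap = None, -1
    for sense_id, definition in glosses.get(word, []):
        overlap = gloss_overlap(definition, context_tokens)
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense

# Illustrative usage with made-up definitions for "bank":
glosses = {
    "bank": [
        ("bank#1", "a financial institution that accepts deposits and lends money"),
        ("bank#2", "sloping land beside a body of water such as a river"),
    ]
}
print(most_probable_sense("bank", "she sat on the grassy bank of the river", glosses))
# prints "bank#2": its definition shares "of" and "river" with the context

In the proposed model, such overlaps (or a semantic relatedness score derived from them) would supplement the usual term-frequency weights of the vector space model.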
@article{VYURV_2017_6_3_a4,
     author = {D. V. Bondarchuk},
     title = {Vector space model of knowledge representation based on semantic relatedness},
     journal = {Vestnik \^U\v{z}no-Uralʹskogo gosudarstvennogo universiteta. Seri\^a Vy\v{c}islitelʹna\^a matematika i informatika},
     pages = {73--83},
     year = {2017},
     volume = {6},
     number = {3},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/VYURV_2017_6_3_a4/}
}
D. V. Bondarchuk. Vector space model of knowledge representation based on semantic relatedness. Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika, Volume 6 (2017) no. 3, pp. 73-83. http://geodesic.mathdoc.fr/item/VYURV_2017_6_3_a4/

[1] A. Budanitsky, G. Hirst, “Evaluating WordNet-based Measures of Lexical Semantic Relatedness”, Computational Linguistics, 32 (2006), 13–47

[2] A. Hotho, S. Staab, G. Stumme, “WordNet Improves Text Document Clustering”, SIGIR 2003 Semantic Web Workshop (Toronto, Canada, July 28 – August 1, 2003), 2003, 541–544 | DOI

[3] J. Sedding, D. Kazakov, “WordNet-based Text Document Clustering”, COLING 2004, 3rd Workshop on Robust Methods in Analysis of Natural Language Data (Geneva, Switzerland, August 23 – 27, 2004), 2004, 104–113 | DOI

[4] M. Lesk, “Automatic Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone”, SIGDOC’86. Proceedings of the 5th Annual International Conference on Systems Documentation (Toronto, Canada, June 8 – 11, 1986), 1986, 24–26 | DOI

[5] C. Loupy, M. El-Beze, P. F. Marteau, “Word Sense Disambiguation Using HMM Tagger”, Proceedings of the 1st International Conference on Language Resources and Evaluation, LREC, 1998, 1255–1258 | DOI

[6] G. Jeh, J. Widom, “SimRank: a Measure of Structural-Context Similarity”, Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Edmonton, Canada, July 23 – 25, 2002), 2002, 271–279 | DOI

[7] K. E. Kechedzhy, O. Usatenko, V. A. Yampolskii, “Rank Distributions of Words in Additive Many-step Markov Chains and the Zipf Law”, Physical Review E: Statistical, Nonlinear, Biological, and Soft Matter Physics, 72 (2005), 381–386

[8] R. Mihalcea, “Using Wikipedia for Automatic Word Sense Disambiguation”, Proceedings of Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (New York, USA, April 22 – 27, 2007), 2007, 196–203 | DOI

[9] P. Willett, “The Porter Stemming Algorithm: Then and Now”, Program: Electronic Library and Information Systems, 40:3 (2006), 219–223

[10] D. V. Bondarchuk, “Choosing the Best Method of Data Mining for the Selection of Vacancies”, Information Technology Modeling and Management, 2013, no. 6, 504–513

[11] G. Salton, “Improving Retrieval Performance by Relevance Feedback”, Readings in Information Retrieval, 24 (1997), 1–5

[12] P. N. Tan, M. Steinbach, V. Kumar, “Top 10 Algorithms in Data Mining”, Knowledge and Information Systems, 14:1 (2008), 1–37 | DOI

[13] S. Banerjee, T. Pedersen, “An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet”, Lecture Notes In Computer Science (Canberra, Australia, February 11 – 22, 2002), v. 2276, 2002, 136–145 | DOI

[14] Thesaurus WordNet, https://wordnet.princeton.edu/

[15] D. V. Bondarchuk, “Intelligent Method of Selection of Personal Recommendations, Guarantees a Non-empty Result”, Information Technology Modeling and Management, 2015, no. 2, 130–138