Methodological aspects of semantic relationship extraction for automatic thesaurus generation

N. S. Lagutina; K. V. Lagutina; E. I. Mamedov; I. V. Paramonov


Modelirovanie i analiz informacionnyh sistem, Volume 23 (2016) no. 6, pp. 826-840. This article was harvested from the source Math-Net.Ru


Abstract

The paper analyzes methods for automatic generation of a specialized thesaurus. The generation algorithm consists of three stages: selection and preprocessing of a text corpus, recognition of thesaurus terms, and extraction of relations among terms. Our work focuses on methods for semantic relation extraction. We developed a test bench that allows testing well-known algorithms for extraction of synonyms and hypernyms. These algorithms are based on different relation extraction techniques: lexico-syntactic patterns, morpho-syntactic rules, measurement of term information quantity, the general-purpose thesaurus WordNet, and Levenshtein distance. To analyze the resulting thesaurus, we propose a complex assessment that includes the following metrics: precision of extracted terms, precision and recall of hierarchical and synonym relations, and characteristics of the thesaurus graph (the number of extracted terms and semantic relationships of different types, the number of connected components, and the number of vertices in the largest component). The proposed set of metrics makes it possible to evaluate the quality of the thesaurus as a whole, reveal some drawbacks of standard relation extraction methods, and create more efficient hybrid methods that generate thesauri with better characteristics than those generated by the individual methods. To illustrate this, one such hybrid method is considered in the paper. It combines the best standard algorithms for hypernym and synonym extraction and generates a specialized medical thesaurus. The hybrid method keeps the thesaurus quality at the same level while finding more relations between terms than the well-known algorithms.

Keywords: thesaurus, semantic relations, hybrid method, complex assessment, test bench.
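
The complex assessment named in the abstract (precision and recall of relations plus characteristics of the thesaurus graph) can be illustrated with a minimal Python sketch. This is not the authors' test bench: the function names `precision_recall` and `graph_characteristics`, the representation of relations as pairs of terms, and the toy medical terms are all assumptions made for illustration only.

```python
from collections import defaultdict


def precision_recall(extracted, reference):
    """Precision and recall of extracted relations against a reference set.

    Both arguments are iterables of (term, term) pairs. Symmetric relations
    such as synonymy are assumed to be stored in one canonical order.
    """
    extracted, reference = set(extracted), set(reference)
    correct = extracted & reference
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


def graph_characteristics(terms, relations):
    """Counts for the thesaurus graph, with every relation treated as an
    undirected edge between two terms (an assumption of this sketch)."""
    nodes = set(terms)
    adjacency = defaultdict(set)
    for a, b in relations:
        nodes.update((a, b))
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    component_sizes = []
    for start in nodes:
        if start in seen:
            continue
        # Iterative depth-first search over one connected component.
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        component_sizes.append(size)

    return {
        "terms": len(nodes),
        "relations": len(set(relations)),
        "components": len(component_sizes),
        "largest_component": max(component_sizes, default=0),
    }


# Hypothetical toy data, not taken from the paper.
terms = {"disease", "infection", "flu", "influenza", "aspirin"}
extracted_hypernyms = {("infection", "disease"), ("flu", "infection")}
reference_hypernyms = extracted_hypernyms | {("influenza", "infection")}
extracted_synonyms = {("flu", "influenza")}

hyper_precision, hyper_recall = precision_recall(
    extracted_hypernyms, reference_hypernyms
)
stats = graph_characteristics(terms, extracted_hypernyms | extracted_synonyms)
```

In this toy case an isolated term such as "aspirin" forms its own component, which is the kind of drawback the graph characteristics are meant to expose alongside precision and recall.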

@article{MAIS_2016_23_6_a11,
     author = {N. S. Lagutina and K. V. Lagutina and E. I. Mamedov and I. V. Paramonov},
     title = {Methodological aspects of semantic relationship extraction for automatic thesaurus generation},
     journal = {Modelirovanie i analiz informacionnyh sistem},
     pages = {826--840},
     year = {2016},
     volume = {23},
     number = {6},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MAIS_2016_23_6_a11/}
}


N. S. Lagutina; K. V. Lagutina; E. I. Mamedov; I. V. Paramonov. Methodological aspects of semantic relationship extraction for automatic thesaurus generation. Modelirovanie i analiz informacionnyh sistem, Volume 23 (2016) no. 6, pp. 826-840. http://geodesic.mathdoc.fr/item/MAIS_2016_23_6_a11/

