Methods of speech and text databases development for QA-systems
Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ, Matematika, mehanika, fizika, Volume 10 (2018) no. 3, pp. 59-66. This article was harvested from the source Math-Net.Ru.

The paper is devoted to the problems of developing question-answer systems (QA-systems). The subject of the study is approaches to automatically filling the database of a QA-system by analyzing unstructured text sources that are publicly available on the Internet. The analysis shows that QA-systems are implemented in the following ways: by inference over ontologies, by rules and syntax, and with artificial neural networks. Two methods for automatically finding question-answer pairs have been developed and tested in this research: one based on the structure of sentences and one based on associative-ontological analysis. The method based on sentence structure is effective for texts such as lists of frequently asked questions (FAQ) and for literary texts containing dialogs and direct speech; it relies on preliminary processing of the text, expressed in the form of a heuristic rule. The method based on associative-ontological analysis is aimed at reference and dictionary texts and rests on the assumption that a descriptive text contains a sentence (or a group of sentences) carrying the main idea of the text. In this case, the title of the text can be considered the question, and this sentence (or group of sentences) the answer. The selection of these meaning-bearing sentences, which amounts to a semantic reduction of the text, needs to be automated. For this purpose, self-referencing algorithms based on the associative-ontological approach to natural language text processing are applied. To verify experimentally that an open QA-system can be built on the automatic collection of question-answer pairs from the Internet, a prototype of a collection module for the QA-system database has been developed.
Keywords: question-answer pair, associative-ontological analysis, text, automatic text processing, natural language, speech recognition.
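The two collection ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the heuristic rule or the associative-ontological algorithm, so the function names (`split_sentences`, `faq_pairs`, `title_answer_pair`) are hypothetical, the FAQ rule is assumed to be "a sentence ending in a question mark, followed by non-question sentences", and a crude word-frequency score stands in for the associative-ontological selection of the main-idea sentence.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def faq_pairs(text):
    """Assumed heuristic rule: a sentence ending in '?' is a question, and the
    non-question sentences that follow it (up to the next question) are its answer."""
    pairs = []
    question, answer = None, []
    for sentence in split_sentences(text):
        if sentence.endswith("?"):
            # Close the previous pair; a question with no answer is dropped.
            if question and answer:
                pairs.append((question, " ".join(answer)))
            question, answer = sentence, []
        elif question:
            answer.append(sentence)
    if question and answer:
        pairs.append((question, " ".join(answer)))
    return pairs


def title_answer_pair(title, text):
    """Treat the title as the question and pick one sentence as the answer.

    Stand-in scoring (NOT associative-ontological analysis): the sentence whose
    words are, on average, most frequent in the whole text wins.
    """
    sentences = split_sentences(text)
    if not sentences:
        return None
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        words = re.findall(r"\w+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    return title, max(sentences, key=score)


if __name__ == "__main__":
    faq = "What is a QA-system? It answers questions. How is it filled? Automatically."
    for q, a in faq_pairs(faq):
        print(q, "->", a)
    print(title_answer_pair("Cat", "Dogs bark. A cat is a small cat."))
```

The frequency score favors sentences made of common words, so a practical version would at least filter out function words; it is used here only to show the title-as-question pairing.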
@article{VYURM_2018_10_3_a6,
     author = {A. L. Ronzhin and A. A. Zaytseva and S. V. Kuleshov and K. V. Nenausnikov},
     title = {Methods of speech and text databases development for {QA-systems}},
     journal = {Vestnik \^U\v{z}no-Uralʹskogo gosudarstvennogo universiteta. Seri\^a, Matematika, mehanika, fizika},
     pages = {59--66},
     year = {2018},
     volume = {10},
     number = {3},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/VYURM_2018_10_3_a6/}
}
A. L. Ronzhin; A. A. Zaytseva; S. V. Kuleshov; K. V. Nenausnikov. Methods of speech and text databases development for QA-systems. Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ, Matematika, mehanika, fizika, Volume 10 (2018) no. 3, pp. 59-66. http://geodesic.mathdoc.fr/item/VYURM_2018_10_3_a6/
