A method for creating structural models of text documents using neural networks
Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika, Vol. 12 (2023) no. 1, pp. 28-45

See the article record from the source Math-Net.Ru

The article describes modern BERT-based neural network models and considers their application to Natural Language Processing (NLP) tasks such as question answering and named entity recognition. It presents a method for automatically creating structural models of text documents. The proposed method is hybrid: it combines several NLP models. It builds a structural model of a document by extracting sentences that correspond to various aspects of the document. Information is extracted with the BERT Question Answering model, using questions prepared separately for each aspect. The answers are filtered with the BERT Named Entity Recognition model and used to generate the contents of each field of the structural model. The article proposes two algorithms for field content generation: the Exclusive answer choosing algorithm, used for short fields, and the Generalizing answer forming algorithm, used for voluminous fields. The article also describes the software implementation of the proposed method and discusses the results of experiments conducted to evaluate its quality.
Keywords: neural network, named entity recognition, question-answering system, information extraction.
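
The pipeline outlined in the abstract (per-aspect questions, answer extraction, entity-based filtering, then one of two field-generation strategies) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify how the two algorithms work internally, so "exclusive choice" is read as keeping the single highest-scoring answer and "generalizing" as joining the distinct answers. The names `Answer`, `collect_answers`, `build_structural_model`, `toy_qa`, and `toy_ner` are hypothetical, and the QA and NER models are replaced by plain callables with canned toy data rather than real BERT models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Answer:
    text: str
    score: float


# Hypothetical interfaces standing in for the BERT QA and NER models.
QAModel = Callable[[str, str], Answer]   # (question, document) -> answer
NERModel = Callable[[str], List[str]]    # text -> entity labels found in it


def collect_answers(document: str, questions: List[str], qa: QAModel,
                    ner: NERModel, entity: Optional[str]) -> List[Answer]:
    """Ask every question, drop empty answers, then filter by entity type."""
    answers = [qa(q, document) for q in questions]
    answers = [a for a in answers if a.text]
    if entity is not None:
        answers = [a for a in answers if entity in ner(a.text)]
    return answers


def exclusive_choice(answers: List[Answer]) -> str:
    """Assumed reading for short fields: keep the highest-scoring answer."""
    return max(answers, key=lambda a: a.score).text if answers else ""


def generalizing_answer(answers: List[Answer]) -> str:
    """Assumed reading for long fields: join distinct answers by score."""
    seen: List[str] = []
    for a in sorted(answers, key=lambda a: a.score, reverse=True):
        if a.text not in seen:
            seen.append(a.text)
    return " ".join(seen)


def build_structural_model(document: str, aspects: Dict[str, dict],
                           qa: QAModel, ner: NERModel) -> Dict[str, str]:
    """Fill one field per aspect using the strategy chosen for that field."""
    model = {}
    for name, spec in aspects.items():
        answers = collect_answers(document, spec["questions"], qa, ner,
                                  spec.get("entity"))
        combine = exclusive_choice if spec["short"] else generalizing_answer
        model[name] = combine(answers)
    return model


# Toy stand-ins with canned data (not real model output).
def toy_qa(question: str, document: str) -> Answer:
    canned = {
        "Who is the author?": Answer("Ada Lovelace", 0.9),
        "Who wrote it?": Answer("the report", 0.4),
        "What is proposed?": Answer("A hybrid method.", 0.8),
        "What does it do?": Answer("It builds a structural model.", 0.7),
    }
    return canned.get(question, Answer("", 0.0))


def toy_ner(text: str) -> List[str]:
    return ["PER"] if text == "Ada Lovelace" else []


ASPECTS = {
    "author": {"questions": ["Who is the author?", "Who wrote it?"],
               "entity": "PER", "short": True},
    "summary": {"questions": ["What is proposed?", "What does it do?"],
                "entity": None, "short": False},
}
```

Calling `build_structural_model("toy document", ASPECTS, toy_qa, toy_ner)` fills the `author` field with the one answer that passes the entity filter and the `summary` field with the joined answers. In a real setting the two callables would wrap trained question-answering and named entity recognition models.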
@article{VYURV_2023_12_1_a1,
     author = {D. V. Berezkin and I. A. Kozlov and P. A. Martynyuk and A. M. Panfilkin},
     title = {A method for creating structural models of text documents using neural networks},
     journal = {Vestnik \^U\v{z}no-Uralʹskogo gosudarstvennogo universiteta. Seri\^a Vy\v{c}islitelʹna\^a matematika i informatika},
     pages = {28--45},
     publisher = {mathdoc},
     volume = {12},
     number = {1},
     year = {2023},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/VYURV_2023_12_1_a1/}
}
TY  - JOUR
AU  - D. V. Berezkin
AU  - I. A. Kozlov
AU  - P. A. Martynyuk
AU  - A. M. Panfilkin
TI  - A method for creating structural models of text documents using neural networks
JO  - Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika
PY  - 2023
SP  - 28
EP  - 45
VL  - 12
IS  - 1
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/VYURV_2023_12_1_a1/
LA  - en
ID  - VYURV_2023_12_1_a1
ER  - 
%0 Journal Article
%A D. V. Berezkin
%A I. A. Kozlov
%A P. A. Martynyuk
%A A. M. Panfilkin
%T A method for creating structural models of text documents using neural networks
%J Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika
%D 2023
%P 28-45
%V 12
%N 1
%I mathdoc
%U http://geodesic.mathdoc.fr/item/VYURV_2023_12_1_a1/
%G en
%F VYURV_2023_12_1_a1
D. V. Berezkin; I. A. Kozlov; P. A. Martynyuk; A. M. Panfilkin. A method for creating structural models of text documents using neural networks. Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika, Vol. 12 (2023) no. 1, pp. 28-45. http://geodesic.mathdoc.fr/item/VYURV_2023_12_1_a1/