A method for creating structural models of text documents using neural networks
Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika, Vol. 12 (2023), no. 1, pp. 28-45
See the article's record from the Math-Net.Ru source
The article describes modern BERT-based neural network models and considers their application to Natural Language Processing (NLP) tasks such as question answering and named entity recognition. It presents a method for automatically creating structural models of text documents. The proposed method is hybrid: it jointly uses several NLP models. The method builds a structural model of a document by extracting the sentences that correspond to various aspects of the document. Information is extracted with the BERT Question Answering model, using questions prepared separately for each aspect. The answers are filtered with the BERT Named Entity Recognition model and then used to generate the contents of each field of the structural model. The article proposes two algorithms for generating field contents, the Exclusive answer choosing algorithm and the Generalizing answer forming algorithm, which are used for short and voluminous fields, respectively. The article also describes the software implementation of the proposed method and discusses the results of experiments conducted to evaluate its quality.
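The pipeline summarized above (per-aspect questions, QA answers, NER filtering, then field generation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the two algorithms select or combine answers, so the rules in the hypothetical functions `choose_exclusive` (keep the single best-scored answer) and `form_generalizing` (merge distinct answers above a hypothetical `threshold`) are guesses, and `toy_qa` / `toy_ner` are toy stand-ins for the BERT Question Answering and Named Entity Recognition models.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Answer:
    """A text span returned by a QA model, with the model's confidence score."""
    text: str
    score: float


# Assumed interfaces (hypothetical): a QA model maps (question, context) to
# scored spans; an NER model maps a text to the entity labels it contains.
QAModel = Callable[[str, str], list]
NERModel = Callable[[str], list]


def extract_answers(context: str, questions: list, qa: QAModel, ner: NERModel,
                    required_label: Optional[str]) -> list:
    """Ask every question prepared for one aspect; keep answers containing the required entity type."""
    kept = []
    for question in questions:
        for answer in qa(question, context):
            if required_label is None or required_label in ner(answer.text):
                kept.append(answer)
    return kept


def choose_exclusive(answers: list) -> str:
    """Short field (assumed rule): keep only the highest-scoring answer."""
    return max(answers, key=lambda a: a.score).text if answers else ""


def form_generalizing(answers: list, threshold: float = 0.5) -> str:
    """Voluminous field (assumed rule): merge distinct answers above a threshold, best first."""
    seen, parts = set(), []
    for answer in sorted(answers, key=lambda a: a.score, reverse=True):
        if answer.score >= threshold and answer.text not in seen:
            seen.add(answer.text)
            parts.append(answer.text)
    return "; ".join(parts)


# Toy stand-ins for the BERT models, for illustration only.
TOY_ANSWERS = {
    "Who signed the contract?": [Answer("Acme Corp", 0.9), Answer("the parties", 0.4)],
    "Which company is a party?": [Answer("Acme Corp", 0.8), Answer("Beta LLC", 0.7)],
}


def toy_qa(question: str, context: str) -> list:
    """Toy QA: look up canned answers instead of running a model."""
    return TOY_ANSWERS.get(question, [])


def toy_ner(text: str) -> list:
    """Toy NER: treat names ending in 'Corp' or 'LLC' as organizations."""
    return ["ORG"] if text.endswith(("Corp", "LLC")) else []


answers = extract_answers("(document text)", list(TOY_ANSWERS), toy_qa, toy_ner, "ORG")
print(choose_exclusive(answers))   # Acme Corp
print(form_generalizing(answers))  # Acme Corp; Beta LLC
```

In this toy run the NER filter drops "the parties" because it carries no organization label, the exclusive rule returns one value suited to a short field, and the generalizing rule merges the deduplicated answers into a longer field.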
Keywords:
neural network, named entity recognition, question-answering system, information extraction.
@article{VYURV_2023_12_1_a1,
author = {D. V. Berezkin and I. A. Kozlov and P. A. Martynyuk and A. M. Panfilkin},
title = {A method for creating structural models of text documents using neural networks},
journal = {Vestnik \^U\v{z}no-Uralʹskogo gosudarstvennogo universiteta. Seri\^a Vy\v{c}islitelʹna\^a matematika i informatika},
pages = {28--45},
publisher = {mathdoc},
volume = {12},
number = {1},
year = {2023},
language = {en},
url = {http://geodesic.mathdoc.fr/item/VYURV_2023_12_1_a1/}
}
TY  - JOUR
AU  - D. V. Berezkin
AU  - I. A. Kozlov
AU  - P. A. Martynyuk
AU  - A. M. Panfilkin
TI  - A method for creating structural models of text documents using neural networks
JO  - Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika
PY  - 2023
SP  - 28
EP  - 45
VL  - 12
IS  - 1
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/VYURV_2023_12_1_a1/
LA  - en
ID  - VYURV_2023_12_1_a1
ER  -
%0 Journal Article
%A D. V. Berezkin
%A I. A. Kozlov
%A P. A. Martynyuk
%A A. M. Panfilkin
%T A method for creating structural models of text documents using neural networks
%J Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika
%D 2023
%P 28-45
%V 12
%N 1
%I mathdoc
%U http://geodesic.mathdoc.fr/item/VYURV_2023_12_1_a1/
%G en
%F VYURV_2023_12_1_a1
D. V. Berezkin; I. A. Kozlov; P. A. Martynyuk; A. M. Panfilkin. A method for creating structural models of text documents using neural networks. Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika, Vol. 12 (2023), no. 1, pp. 28-45. http://geodesic.mathdoc.fr/item/VYURV_2023_12_1_a1/