Extracting named entities from Russian-language documents with different expressiveness of structure

M. D. Averina; O. A. Levanova

Modelirovanie i analiz informacionnyh sistem, Volume 30 (2023), no. 4, pp. 382-393

See the article record from the source Math-Net.Ru

Abstract

This work addresses named entity recognition for Russian-language texts using a CRF (conditional random field) model. Two datasets were considered: refinancing documents with a well-defined structure, and semi-structured texts of court records. The model was tested with various sets of text features and CRF parameters (optimization algorithms). Averaged over all entities, the best F-measure was 0.99 for the structured documents and 0.86 for the semi-structured ones.
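To make the phrase "sets of text features" concrete, the following is a minimal sketch of how per-token features for a CRF tagger are commonly built. The feature names, the choice of features, and the sample sentence are illustrative assumptions; the record does not list the features the authors actually used.

```python
from typing import Dict, List, Union

Feature = Union[str, bool]


def token_features(tokens: List[str], i: int) -> Dict[str, Feature]:
    """Build a feature dict for tokens[i].

    The feature set is hypothetical: surface form, a 3-character suffix,
    capitalization flags, and the neighbouring tokens.
    """
    tok = tokens[i]
    feats: Dict[str, Feature] = {
        "lower": tok.lower(),
        "suffix3": tok[-3:].lower(),
        "is_title": tok.istitle(),
        "is_upper": tok.isupper(),
        "is_digit": tok.isdigit(),
    }
    # Context features: previous and next token, or sentence-boundary flags.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, Feature]]:
    """Return one feature dict per token of a tokenized sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Made-up example sentence in the style of a court record.
tokens = ["Истец", "Иванов", "обратился", "в", "суд"]
features = sentence_features(tokens)
```

Lists of feature dicts of this shape are the input format accepted by CRF toolkits such as sklearn-crfsuite, whose `CRF` class exposes an `algorithm` parameter (for example `lbfgs` or `l2sgd`). Whether the authors used that toolkit is not stated in this record.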

Keywords: named entity extraction, CRF.

@article{MAIS_2023_30_4_a5,
     author = {M. D. Averina and O. A. Levanova},
     title = {Extracting named entities from russian-language documents with different expressiveness of structure},
     journal = {Modelirovanie i analiz informacionnyh sistem},
     pages = {382--393},
     publisher = {mathdoc},
     volume = {30},
     number = {4},
     year = {2023},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MAIS_2023_30_4_a5/}
}

TY  - JOUR
AU  - M. D. Averina
AU  - O. A. Levanova
TI  - Extracting named entities from russian-language documents with different expressiveness of structure
JO  - Modelirovanie i analiz informacionnyh sistem
PY  - 2023
SP  - 382
EP  - 393
VL  - 30
IS  - 4
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/MAIS_2023_30_4_a5/
LA  - ru
ID  - MAIS_2023_30_4_a5
ER  -

%0 Journal Article
%A M. D. Averina
%A O. A. Levanova
%T Extracting named entities from russian-language documents with different expressiveness of structure
%J Modelirovanie i analiz informacionnyh sistem
%D 2023
%P 382-393
%V 30
%N 4
%I mathdoc
%U http://geodesic.mathdoc.fr/item/MAIS_2023_30_4_a5/
%G ru
%F MAIS_2023_30_4_a5

M. D. Averina; O. A. Levanova. Extracting named entities from russian-language documents with different expressiveness of structure. Modelirovanie i analiz informacionnyh sistem, Volume 30 (2023), no. 4, pp. 382-393. http://geodesic.mathdoc.fr/item/MAIS_2023_30_4_a5/