Automation of morphological tagging of archival documents
Matematičeskaâ fizika i kompʹûternoe modelirovanie, Volume 22 (2019) no. 4, pp. 53-63.

See the article record from the source Math-Net.Ru

The paper describes an add-on to MyStem, the stemming tool by I. Segalovich. We designed the application to give MyStem a convenient graphical interface that is intuitive and easy to learn for users who do not specialize in information technology. It turned out that MyStem correctly processes outdated vocabulary if it is passed to the program in modern Cyrillic. In addition to the interface, our program can therefore work with the outdated Cyrillic alphabet: letters such as ksi and omega are first replaced by “ks” and “o” respectively, the text is then passed to MyStem for analysis, and the original characters are restored in the processed document. The add-on thus intercepts the output of MyStem, then reformats and analyzes it. The application also lets the user remove homonyms manually when the automatic tagging of a word's morphological characteristics is incorrect. The main purpose of the application is to prepare the morphological tagging of documents from the archival fund “Mikhailovsky Stanichny Ataman” in order to create a linguistic corpus. In the course of this work we solved the problem of correctly processing texts that contain outdated Cyrillic characters. The functional and user-friendly graphical interface is implemented on the JavaFX platform (OpenJFX).
Keywords: automation of linguistic analysis, automation of morphological analysis, MyStem tool, graphical interface, software shell, corpus-based linguistics.
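The replace, analyze, restore cycle described in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity and is not the authors' JavaFX code. The mapping covers only ksi (Ѯ, whose modern spelling is “кс”) and omega; the table actually used by the add-on is not given in the abstract. The names `LEGACY_TO_MODERN`, `modernize`, and `tag_text` are hypothetical, and `analyze` is a stand-in for a real call to MyStem. Instead of replacing characters back in the output, the sketch keeps each original word form next to its analysis, which avoids turning every modern “кс” into ksi by mistake.

```python
import re

# Assumed mapping from outdated Cyrillic letters to modern spellings.
# Only the ksi and omega replacements are suggested by the abstract;
# a real table would cover more letters.
LEGACY_TO_MODERN = {
    "\u046f": "кс",  # ѯ, small ksi
    "\u046e": "Кс",  # Ѯ, capital ksi
    "\u0461": "о",   # ѡ, small omega
    "\u0460": "О",   # Ѡ, capital omega
}

# Runs of letters (no digits, no underscore); matches legacy letters too.
WORD_RE = re.compile(r"[^\W\d_]+")


def modernize(word):
    """Rewrite one word in modern Cyrillic, letter by letter."""
    return "".join(LEGACY_TO_MODERN.get(ch, ch) for ch in word)


def tag_text(text, analyze):
    """Pair every original word form with the analysis of its modern spelling.

    `analyze` is a hypothetical callable standing in for MyStem: it receives
    a word in modern Cyrillic and returns whatever analysis it produces.
    """
    tagged = []
    for match in WORD_RE.finditer(text):
        original = match.group()
        tagged.append((original, analyze(modernize(original))))
    return tagged


if __name__ == "__main__":
    # Stub analyzer that just reports which spelling it was given.
    for original, analysis in tag_text("але\u046fандръ и \u0461тецъ",
                                       lambda w: {"analyzed_as": w}):
        print(original, analysis)
```

Because the original spelling is carried alongside the result, the processed document can show the outdated characters without any reverse substitution step.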
@article{VVGUM_2019_22_4_a3,
     author = {A. S. Komendantov and A. G. Matveev and A. V. Svetlov},
     title = {Automation of morphological tagging of archival documents},
     journal = {Matemati\v{c}eska\^a fizika i kompʹ\^uternoe modelirovanie},
     pages = {53--63},
     publisher = {mathdoc},
     volume = {22},
     number = {4},
     year = {2019},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/VVGUM_2019_22_4_a3/}
}
TY  - JOUR
AU  - A. S. Komendantov
AU  - A. G. Matveev
AU  - A. V. Svetlov
TI  - Automation of morphological tagging of archival documents
JO  - Matematičeskaâ fizika i kompʹûternoe modelirovanie
PY  - 2019
SP  - 53
EP  - 63
VL  - 22
IS  - 4
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/VVGUM_2019_22_4_a3/
LA  - ru
ID  - VVGUM_2019_22_4_a3
ER  - 
%0 Journal Article
%A A. S. Komendantov
%A A. G. Matveev
%A A. V. Svetlov
%T Automation of morphological tagging of archival documents
%J Matematičeskaâ fizika i kompʹûternoe modelirovanie
%D 2019
%P 53-63
%V 22
%N 4
%I mathdoc
%U http://geodesic.mathdoc.fr/item/VVGUM_2019_22_4_a3/
%G ru
%F VVGUM_2019_22_4_a3
A. S. Komendantov; A. G. Matveev; A. V. Svetlov. Automation of morphological tagging of archival documents. Matematičeskaâ fizika i kompʹûternoe modelirovanie, Volume 22 (2019) no. 4, pp. 53-63. http://geodesic.mathdoc.fr/item/VVGUM_2019_22_4_a3/