The construction and analysis of the Russian language models for a cryptographic algorithm research
Čebyševskij sbornik, Vol. 23 (2022) no. 2, pp. 151-160
See the article record from the source Math-Net.Ru
The article presents a statistical analysis of lexical and $n$-gram models of the Russian language built from a corpus of news texts. A specialized corpus of recent political news articles was created, reflecting a narrow area of language use. Token and $n$-gram dictionaries are compiled, and their coverage and entropy values are computed. The original corpus is lemmatized, and the dictionary sizes are extrapolated.
Keywords:
$n$-gram dictionaries, $n$-gram entropy, meaningful texts.
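The quantities named in the abstract (an $n$-gram dictionary, its coverage, and its entropy) can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes character-level overlapping $n$-grams, empirical Shannon entropy in bits, and coverage defined as the share of all occurrences accounted for by the most frequent dictionary entries. The function names `ngram_counts`, `entropy_bits`, and `coverage` are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Build an n-gram dictionary: counts of overlapping character n-grams.

    Assumption: character-level n-grams; the article may tokenize differently.
    """
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def entropy_bits(counts: Counter) -> float:
    """Shannon entropy (bits per n-gram) of the empirical distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def coverage(counts: Counter, top_k: int) -> float:
    """Share of all n-gram occurrences covered by the top_k most frequent entries."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(top_k)) / total


# Toy input only; a real study would read a large text corpus instead.
sample = "abab"
unigrams = ngram_counts(sample, 1)
bigrams = ngram_counts(sample, 2)
```

On the toy string, `bigrams` holds two entries ("ab" twice, "ba" once), so the most frequent bigram alone covers two thirds of all occurrences. Dividing the $n$-gram entropy by $n$ gives a per-character estimate, which is one common way to compare values across different $n$.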
@article{CHEB_2022_23_2_a8,
author = {A. G. Malashina and A. B. Los},
title = {The construction and analysis of the {Russian} language models for a cryptographic algorithm research},
journal = {\v{C}eby\v{s}evskij sbornik},
pages = {151--160},
publisher = {mathdoc},
volume = {23},
number = {2},
year = {2022},
language = {ru},
url = {http://geodesic.mathdoc.fr/item/CHEB_2022_23_2_a8/}
}
TY - JOUR
AU - A. G. Malashina
AU - A. B. Los
TI - The construction and analysis of the Russian language models for a cryptographic algorithm research
JO - Čebyševskij sbornik
PY - 2022
SP - 151
EP - 160
VL - 23
IS - 2
PB - mathdoc
UR - http://geodesic.mathdoc.fr/item/CHEB_2022_23_2_a8/
LA - ru
ID - CHEB_2022_23_2_a8
ER -
A. G. Malashina; A. B. Los. The construction and analysis of the Russian language models for a cryptographic algorithm research. Čebyševskij sbornik, Vol. 23 (2022) no. 2, pp. 151-160. http://geodesic.mathdoc.fr/item/CHEB_2022_23_2_a8/