The construction and analysis of the Russian language models for a cryptographic algorithm research
Čebyševskij sbornik, Volume 23 (2022) no. 2, pp. 151-160.

See the article record from the source Math-Net.Ru

The article presents a statistical analysis of lexical and $n$-gram models of the Russian language built from a news text corpus. A specialized corpus of recent political news articles was created, reflecting a narrow domain of language use. Token and $n$-gram dictionaries are compiled, and their coverage and entropy values are computed. The original corpus is also lemmatized, and the dictionary sizes are extrapolated.
Keywords: $n$-gram dictionaries, $n$-gram entropy, meaningful texts.
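
The quantities named in the abstract (an $n$-gram dictionary, its coverage, and its entropy) can be illustrated with a minimal Python sketch. It is not the authors' procedure: it assumes overlapping character $n$-grams, the plain empirical Shannon entropy divided by $n$, and coverage defined as the share of occurrences accounted for by the $k$ most frequent entries. The function names and the toy string are hypothetical.

```python
from collections import Counter
from math import log2


def ngram_counts(text, n):
    """Build an n-gram dictionary: counts of overlapping character n-grams (assumed unit)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def entropy_per_symbol(counts, n):
    """Empirical Shannon entropy of the n-gram distribution, in bits per symbol."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # text shorter than n: no n-grams to measure
    h = -sum(c / total * log2(c / total) for c in counts.values())
    return h / n


def coverage(counts, k):
    """Share of all n-gram occurrences covered by the k most frequent n-grams."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(k)) / total


# Toy input only; a real study would read a large text corpus instead.
sample = "abab"
bigrams = ngram_counts(sample, 2)
print(dict(bigrams))
print(entropy_per_symbol(bigrams, 2))
print(coverage(bigrams, 1))
```

On a real corpus the same functions would be applied for several values of $n$ and $k$; the lemmatization and dictionary-size extrapolation mentioned in the abstract are outside this sketch.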
@article{CHEB_2022_23_2_a8,
     author = {A. G. Malashina and A. B. Los},
     title = {The construction and analysis of the {Russian} language models for a cryptographic algorithm research},
     journal = {\v{C}eby\v{s}evskij sbornik},
     pages = {151--160},
     publisher = {mathdoc},
     volume = {23},
     number = {2},
     year = {2022},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/CHEB_2022_23_2_a8/}
}
A. G. Malashina; A. B. Los. The construction and analysis of the Russian language models for a cryptographic algorithm research. Čebyševskij sbornik, Volume 23 (2022) no. 2, pp. 151-160. http://geodesic.mathdoc.fr/item/CHEB_2022_23_2_a8/
