Research of features of Dostoevsky's publicistic style by using $n$-grams based on the materials of the “Time” and “Epoch” magazines
Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ, Tome 17 (2021) no. 4, pp. 389-396
This article was harvested from the Math-Net.Ru source.


The paper studies the publicistic style of F. M. Dostoevsky on the basis of publications in the journals “Time” and “Epoch” (1861–1865). For this purpose, text fragments of 500, 700, and 1000 words were selected (including texts by other authors: M. M. Dostoevsky, N. N. Strakhov, A. A. Golovachev, etc.), and the occurrences of bigrams and trigrams (encoded sequences of parts of speech) were counted in them. Decision trees were built from these counts, and the accuracy of text recognition was analyzed. For classification at the first level of the tree (fragment size 1000), the accuracy was on average 87% … resulting decision trees.
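A minimal sketch of this kind of pipeline, assuming the texts are already part-of-speech tagged (the tagging step and the paper's actual encoding of parts of speech are not shown; the tag names `ADJ`, `NOUN`, `VERB`, `ADV` and all function names below are hypothetical). It splits a tag sequence into fixed-size fragments, represents each fragment by the relative frequencies of its POS bigrams and trigrams, and fits only the first level of a decision tree: a single threshold split chosen by Gini impurity. It is an illustration, not the authors' implementation.

```python
from collections import Counter


def fragments(tags, size):
    """Split a POS-tag sequence into consecutive, non-overlapping fragments
    of `size` words; an incomplete tail is dropped."""
    return [tags[i:i + size] for i in range(0, len(tags) - size + 1, size)]


def ngram_frequencies(tags, n):
    """Relative frequency of every POS n-gram occurring in one fragment."""
    grams = [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).items()}


def features(fragment):
    """Feature vector of a fragment: bigram and trigram frequencies together."""
    return {**ngram_frequencies(fragment, 2), **ngram_frequencies(fragment, 3)}


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent label (None for an empty list)."""
    return Counter(labels).most_common(1)[0][0] if labels else None


def fit_stump(samples, labels):
    """Choose the (feature, threshold) split with the lowest weighted Gini
    impurity, i.e. the first level of a decision tree.
    An n-gram absent from a fragment counts as frequency 0."""
    best = None
    for f in sorted({key for s in samples for key in s}):
        values = sorted({s.get(f, 0.0) for s in samples})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for s, y in zip(samples, labels) if s.get(f, 0.0) <= threshold]
            right = [y for s, y in zip(samples, labels) if s.get(f, 0.0) > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best["impurity"]:
                best = {"impurity": score, "feature": f, "threshold": threshold,
                        "left": majority(left), "right": majority(right)}
    return best


def predict(stump, sample):
    """Route a fragment to the left or right leaf and return that leaf's label."""
    side = "left" if sample.get(stump["feature"], 0.0) <= stump["threshold"] else "right"
    return stump[side]


# Toy data: two hypothetical "authors" with different POS habits.
text_a = ["ADJ", "NOUN", "VERB"] * 20
text_b = ["NOUN", "VERB", "ADV"] * 20
samples, labels = [], []
for label, text in (("A", text_a), ("B", text_b)):
    for frag in fragments(text, 10):  # toy size; the paper used 500, 700 and 1000 words
        samples.append(features(frag))
        labels.append(label)

stump = fit_stump(samples, labels)
accuracy = sum(predict(stump, s) == y for s, y in zip(samples, labels)) / len(labels)
print(stump["feature"], accuracy)  # prints: ('ADJ', 'NOUN') 1.0
```

Here the accuracy is measured on the same toy fragments the split was fitted on, so it says nothing about the accuracy reported above; an actual experiment would score held-out fragments and grow the tree beyond its first level.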
Keywords: publicistic style, decision tree, $n$-gram, F. M. Dostoevsky, information system “Statistical methods for analyzing literary texts”, tree matching, text attribution.
@article{VSPUI_2021_17_4_a6,
     author = {R. V. Abramov and K. A. Kulakov and A. A. Lebedev and N. D. Moskin and A. A. Rogov},
     title = {Research of features of {Dostoevsky's} publicistic style by using $n$-grams based on the materials of the {{\textquotedblleft}Time{\textquotedblright}} and {{\textquotedblleft}Epoch{\textquotedblright}} magazines},
     journal = {Vestnik Sankt-Peterburgskogo universiteta. Prikladna\^a matematika, informatika, processy upravleni\^a},
     pages = {389--396},
     year = {2021},
     volume = {17},
     number = {4},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/VSPUI_2021_17_4_a6/}
}
TY  - JOUR
AU  - R. V. Abramov
AU  - K. A. Kulakov
AU  - A. A. Lebedev
AU  - N. D. Moskin
AU  - A. A. Rogov
TI  - Research of features of Dostoevsky's publicistic style by using $n$-grams based on the materials of the “Time” and “Epoch” magazines
JO  - Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ
PY  - 2021
SP  - 389
EP  - 396
VL  - 17
IS  - 4
UR  - http://geodesic.mathdoc.fr/item/VSPUI_2021_17_4_a6/
LA  - en
ID  - VSPUI_2021_17_4_a6
ER  - 
%0 Journal Article
%A R. V. Abramov
%A K. A. Kulakov
%A A. A. Lebedev
%A N. D. Moskin
%A A. A. Rogov
%T Research of features of Dostoevsky's publicistic style by using $n$-grams based on the materials of the “Time” and “Epoch” magazines
%J Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ
%D 2021
%P 389-396
%V 17
%N 4
%U http://geodesic.mathdoc.fr/item/VSPUI_2021_17_4_a6/
%G en
%F VSPUI_2021_17_4_a6
R. V. Abramov; K. A. Kulakov; A. A. Lebedev; N. D. Moskin; A. A. Rogov. Research of features of Dostoevsky's publicistic style by using $n$-grams based on the materials of the “Time” and “Epoch” magazines. Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ, Tome 17 (2021) no. 4, pp. 389-396. http://geodesic.mathdoc.fr/item/VSPUI_2021_17_4_a6/