Analysis of the impact of the stylometric characteristics of different levels for the verification of authors of the prose
Modelirovanie i analiz informacionnyh sistem, Vol. 28 (2021), no. 3, pp. 260–279.

See the article record from the Math-Net.Ru source.

This article analyzes how combinations of stylometric characteristics of different levels affect the quality of authorship verification for Russian, English and French prose. The study covers both low-level stylometric characteristics based on words and characters and higher-level structural characteristics. All stylometric characteristics were computed automatically with the ProseRhythmDetector program, which made it possible to analyze large works by many writers at once. Each text was assigned vectors of stylometric characteristics at the character, word and structure levels. In the experiments, the feature sets of these three levels were combined with one another in all possible ways. The resulting feature vectors were fed to various classifiers to perform verification and to identify the classifier best suited to the task. The best results were obtained with the AdaBoost classifier, with an average F-score above 92% across all languages. Detailed verification quality scores are given and analyzed for each author. High-level stylometric characteristics, in particular the frequencies of POS-tag N-grams, open the way to a more detailed analysis of an individual author's style. The experiments show that combining structure-level characteristics with word-level and/or character-level characteristics yields the most accurate authorship verification for literary texts in Russian, English and French. The authors also conclude that stylometric characteristics affect verification quality to a different degree in different languages.
Keywords: stylometry, stylometric characteristics, authorship verification, natural language processing.
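The abstract describes combining character-, word- and structure-level feature vectors in all possible ways, using POS-tag N-gram frequencies as a high-level feature, and scoring verification with the F-score. The sketch below illustrates only that bookkeeping in plain Python. The level names, the helper functions (`level_combinations`, `combine_vectors`, `pos_ngram_frequencies`, `f_score`) and the choice of bigrams are hypothetical and chosen for illustration; this is not the authors' ProseRhythmDetector code. The classifier itself (AdaBoost in the article) is left out: any binary classifier could consume the combined vectors.

```python
from collections import Counter
from itertools import combinations

# Hypothetical names for the three feature levels mentioned in the abstract.
LEVELS = ("character", "word", "structure")


def level_combinations(levels=LEVELS):
    """Return every non-empty combination of feature levels (2**n - 1 of them)."""
    return [combo
            for r in range(1, len(levels) + 1)
            for combo in combinations(levels, r)]


def combine_vectors(features, combo):
    """Concatenate the per-level feature vectors of one text, in combo order."""
    vector = []
    for level in combo:
        vector.extend(features[level])
    return vector


def pos_ngram_frequencies(tags, n=2):
    """Relative frequencies of POS-tag n-grams in one text's tag sequence."""
    grams = [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    return {gram: count / total for gram, count in Counter(grams).items()}


def f_score(y_true, y_pred, positive=1):
    """Binary F1 score: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With three levels, `level_combinations()` returns seven combinations, from each single level up to all three together; each combination yields one feature vector per text via `combine_vectors`.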
@article{MAIS_2021_28_3_a4,
     author = {A. M. Manakhova and N. S. Lagutina},
     title = {Analysis of the impact of the stylometric characteristics of different levels for the verification of authors of the prose},
     journal = {Modelirovanie i analiz informacionnyh sistem},
     pages = {260--279},
     publisher = {mathdoc},
     volume = {28},
     number = {3},
     year = {2021},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MAIS_2021_28_3_a4/}
}
A. M. Manakhova, N. S. Lagutina. Analysis of the impact of the stylometric characteristics of different levels for the verification of authors of the prose. Modelirovanie i analiz informacionnyh sistem, Vol. 28 (2021), no. 3, pp. 260–279. http://geodesic.mathdoc.fr/item/MAIS_2021_28_3_a4/