Annotation of text corpora by sentiment and presence of irony within a project of citizen science
Modelirovanie i analiz informacionnyh sistem, Vol. 30 (2023), no. 1, pp. 86-100

See the article record from the source Math-Net.Ru.

The paper is devoted to the construction of three corpora: a corpus of sentences annotated by general sentiment into four classes (positive, negative, neutral, and mixed), a corpus of phrasemes annotated by sentiment into three classes (positive, negative, and neutral), and a corpus of sentences annotated by the presence or absence of irony. The annotation was performed by volunteers within the project “Prepare texts for algorithms” on the portal “People of science”. Guidelines for annotators were developed on the basis of existing domain knowledge for each task. A technique was also developed for the statistical analysis of the annotation results, based on the distributions of the annotations and on measures of agreement between the annotations made by different annotators. For the annotation of sentences by irony and of phrasemes by sentiment, agreement was fairly high (full agreement rate of 0.60–0.99), whereas for the annotation of sentences by general sentiment it was low (full agreement rate of 0.40), presumably because that task is more complex. It was also shown that the results of automatic algorithms for detecting sentence sentiment improved by 12–13% when using a corpus on which all annotators (from 3 to 5) agreed, compared with a corpus annotated by only one volunteer.
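The abstract reports agreement as a "full agreement rate" and compares models built on a unanimously annotated corpus with a single-annotator one. The following is a minimal sketch, assuming the full agreement rate is the share of items on which every annotator chose the same label; the paper's exact definition and its other agreement measures are not reproduced here, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def full_agreement_rate(annotations: Sequence[Sequence[Hashable]]) -> float:
    """Share of items whose annotators all gave the same label.

    Assumed definition (hypothetical): annotations[i] holds the labels given
    to item i; the number of annotators may vary per item (e.g. 3 to 5).
    Items with no labels count as not agreed.
    """
    if not annotations:
        raise ValueError("no annotated items")
    agreed = sum(1 for labels in annotations if len(set(labels)) == 1)
    return agreed / len(annotations)


def label_distribution(annotations: Sequence[Sequence[Hashable]]) -> Dict[Hashable, float]:
    """Relative frequency of each label over all individual annotations."""
    counts = Counter(label for labels in annotations for label in labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def unanimous_subset(
    texts: Sequence[str], annotations: Sequence[Sequence[Hashable]]
) -> List[Tuple[str, Hashable]]:
    """Keep only (text, label) pairs on which all annotators agreed."""
    return [
        (text, labels[0])
        for text, labels in zip(texts, annotations)
        if len(set(labels)) == 1
    ]


if __name__ == "__main__":
    # Toy data (hypothetical), using the four sentence classes from the abstract.
    texts = ["s1", "s2", "s3", "s4"]
    labels = [
        ["positive", "positive", "positive"],
        ["negative", "neutral", "negative"],
        ["neutral", "neutral", "neutral", "neutral"],
        ["mixed", "positive", "negative"],
    ]
    print(full_agreement_rate(labels))        # 0.5
    print(unanimous_subset(texts, labels))    # [('s1', 'positive'), ('s3', 'neutral')]
```

Filtering to the unanimous subset mirrors, under the stated assumption, the comparison in the abstract between a corpus on which all annotators agreed and one annotated by a single volunteer.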
Keywords: sentiment analysis, text corpus, statistical analysis, agreement measures, citizen science.
@article{MAIS_2023_30_1_a5,
     author = {I. V. Paramonov and A. Yu. Poletaev},
     title = {Annotation of text corpora by sentiment and presence of irony within a project of citizen science},
     journal = {Modelirovanie i analiz informacionnyh sistem},
     pages = {86--100},
     publisher = {mathdoc},
     volume = {30},
     number = {1},
     year = {2023},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MAIS_2023_30_1_a5/}
}
TY  - JOUR
AU  - I. V. Paramonov
AU  - A. Yu. Poletaev
TI  - Annotation of text corpora by sentiment and presence of irony within a project of citizen science
JO  - Modelirovanie i analiz informacionnyh sistem
PY  - 2023
SP  - 86
EP  - 100
VL  - 30
IS  - 1
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/MAIS_2023_30_1_a5/
LA  - ru
ID  - MAIS_2023_30_1_a5
ER  - 
%0 Journal Article
%A I. V. Paramonov
%A A. Yu. Poletaev
%T Annotation of text corpora by sentiment and presence of irony within a project of citizen science
%J Modelirovanie i analiz informacionnyh sistem
%D 2023
%P 86-100
%V 30
%N 1
%I mathdoc
%U http://geodesic.mathdoc.fr/item/MAIS_2023_30_1_a5/
%G ru
%F MAIS_2023_30_1_a5
I. V. Paramonov; A. Yu. Poletaev. Annotation of text corpora by sentiment and presence of irony within a project of citizen science. Modelirovanie i analiz informacionnyh sistem, Vol. 30 (2023), no. 1, pp. 86-100. http://geodesic.mathdoc.fr/item/MAIS_2023_30_1_a5/