Word-based Russian text augmentation for character-level models
Zapiski Nauchnykh Seminarov POMI, Investigations on applied mathematics and informatics. Part I, Volume 499 (2021), pp. 206-221

See the article record from the source Math-Net.Ru

Large-scale deep learning models, including models for natural language processing, require large training datasets that may be unavailable for low-resource languages or specialized domains. We address the small size and low variability of the training data available for NLP models by augmenting the data with synonyms. We design a novel augmentation scheme that replaces words with synonyms and reshuffles the words, apply it to the Russian language, and report improved results on the sentiment analysis task.
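
The two operations named in the abstract, synonym replacement and word reshuffling, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record gives no details of the scheme, so the toy synonym dictionary `SYNONYMS`, the function name `augment`, and the parameters `replace_prob` and `shuffle` are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy synonym dictionary (an assumption for illustration only);
# a real setup would draw synonyms from a Russian thesaurus or similar resource.
SYNONYMS = {
    "хороший": ["отличный", "прекрасный"],
    "фильм": ["кино"],
}


def augment(text, synonyms, replace_prob=0.5, shuffle=True, rng=None):
    """Return an augmented copy of `text`.

    Each word that has an entry in `synonyms` is replaced by a randomly
    chosen synonym with probability `replace_prob`; afterwards the words
    are optionally reshuffled. Both steps work on whitespace-separated
    words, so the result can still be fed to a character-level model.
    """
    if rng is None:
        rng = random.Random()
    augmented = []
    for word in text.split():
        candidates = synonyms.get(word.lower())
        if candidates and rng.random() < replace_prob:
            augmented.append(rng.choice(candidates))
        else:
            augmented.append(word)
    if shuffle:
        rng.shuffle(augmented)
    return " ".join(augmented)


if __name__ == "__main__":
    # Output varies with the random seed, so no fixed result is shown here.
    print(augment("хороший фильм про любовь", SYNONYMS, rng=random.Random(0)))
```

Each call yields one augmented variant of the input; calling it several times per training example with different seeds would be one way to enlarge a small dataset under these assumptions.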
@article{ZNSL_2021_499_a10,
     author = {R. B. Galinsky and A. M. Alekseev and S. I. Nikolenko},
     title = {Word-based Russian text augmentation for character-level models},
     journal = {Zapiski Nauchnykh Seminarov POMI},
     pages = {206--221},
     publisher = {mathdoc},
     volume = {499},
     year = {2021},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/ZNSL_2021_499_a10/}
}
TY  - JOUR
AU  - R. B. Galinsky
AU  - A. M. Alekseev
AU  - S. I. Nikolenko
TI  - Word-based Russian text augmentation for character-level models
JO  - Zapiski Nauchnykh Seminarov POMI
PY  - 2021
SP  - 206
EP  - 221
VL  - 499
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/ZNSL_2021_499_a10/
LA  - en
ID  - ZNSL_2021_499_a10
ER  - 
%0 Journal Article
%A R. B. Galinsky
%A A. M. Alekseev
%A S. I. Nikolenko
%T Word-based Russian text augmentation for character-level models
%J Zapiski Nauchnykh Seminarov POMI
%D 2021
%P 206-221
%V 499
%I mathdoc
%U http://geodesic.mathdoc.fr/item/ZNSL_2021_499_a10/
%G en
%F ZNSL_2021_499_a10
R. B. Galinsky; A. M. Alekseev; S. I. Nikolenko. Word-based Russian text augmentation for character-level models. Zapiski Nauchnykh Seminarov POMI, Investigations on applied mathematics and informatics. Part I, Volume 499 (2021), pp. 206-221. http://geodesic.mathdoc.fr/item/ZNSL_2021_499_a10/