Word-based Russian text augmentation for character-level models
Zapiski Nauchnykh Seminarov POMI, Investigations on applied mathematics and informatics. Part I, Tome 499 (2021), pp. 206-221
See the article record from the source Math-Net.Ru
Large-scale deep learning models, including models for natural language processing (NLP), require large training datasets, which may be unavailable for low-resource languages or for specialized domains. We address the problem of small, low-variability training data for NLP models by augmenting the data with synonyms. We design a novel augmentation scheme that replaces words with synonyms and reshuffles the words, apply it to the Russian language, and report improved results on the sentiment analysis task.
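The two operations named in the abstract, synonym replacement and word reshuffling, can be illustrated with a minimal sketch. This is not the authors' scheme, whose details are not given here: the synonym dictionary `SYNONYMS`, the function `augment`, and the parameters `replace_prob` and `shuffle` are all hypothetical names chosen for illustration, and a real setup would need a proper Russian thesaurus and handling of inflected word forms.

```python
import random

# Hypothetical toy synonym dictionary (an assumption for illustration only).
SYNONYMS = {
    "хороший": ["отличный", "прекрасный"],
    "фильм": ["кинофильм"],
}


def augment(text, synonyms, replace_prob=0.5, shuffle=True, rng=None):
    """Return an augmented copy of `text`.

    Each whitespace-separated word that has an entry in `synonyms` is
    replaced by a randomly chosen synonym with probability `replace_prob`;
    if `shuffle` is true, the resulting words are then randomly reordered.
    """
    if rng is None:
        rng = random.Random()
    words = []
    for word in text.split():
        candidates = synonyms.get(word.lower())
        if candidates and rng.random() < replace_prob:
            words.append(rng.choice(candidates))
        else:
            words.append(word)
    if shuffle:
        rng.shuffle(words)
    return " ".join(words)


# Example: generate a few augmented variants of one short review.
rng = random.Random(0)
variants = [augment("очень хороший фильм", SYNONYMS, rng=rng) for _ in range(3)]
```

Because the output is still a plain string, such variants could be fed to a character-level model without changing its input pipeline; whether reshuffling preserves the label is a task-dependent assumption.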
@article{ZNSL_2021_499_a10,
author = {R. B. Galinsky and A. M. Alekseev and S. I. Nikolenko},
title = {Word-based {R}ussian text augmentation for character-level models},
journal = {Zapiski Nauchnykh Seminarov POMI},
pages = {206--221},
publisher = {mathdoc},
volume = {499},
year = {2021},
language = {en},
url = {http://geodesic.mathdoc.fr/item/ZNSL_2021_499_a10/}
}
TY - JOUR
AU - R. B. Galinsky
AU - A. M. Alekseev
AU - S. I. Nikolenko
TI - Word-based Russian text augmentation for character-level models
JO - Zapiski Nauchnykh Seminarov POMI
PY - 2021
SP - 206
EP - 221
VL - 499
PB - mathdoc
UR - http://geodesic.mathdoc.fr/item/ZNSL_2021_499_a10/
LA - en
ID - ZNSL_2021_499_a10
ER -
R. B. Galinsky; A. M. Alekseev; S. I. Nikolenko. Word-based Russian text augmentation for character-level models. Zapiski Nauchnykh Seminarov POMI, Investigations on applied mathematics and informatics. Part I, Tome 499 (2021), pp. 206-221. http://geodesic.mathdoc.fr/item/ZNSL_2021_499_a10/