Adversarial attacks on language models: WordPiece filtration and ChatGPT synonyms
Zapiski Nauchnykh Seminarov POMI, Investigations on applied mathematics and informatics. Part II–2, Volume 530 (2023), pp. 80-95
See the article record from the source Math-Net.Ru
Adversarial attacks on text have gained significant attention in recent years because they can undermine the reliability of NLP models. We present novel black-box character-level and word-level approaches to adversarial example generation that are applicable to BERT-based models. The character-level approach adds natural typos to a word according to its WordPiece tokenization. For the word level, we present three techniques that use synonymous substitute words generated by ChatGPT and post-corrected into the grammatical form appropriate for the given context. Additionally, we try to minimize the perturbation rate by taking into account the damage that each perturbation does to the model. By combining the character-level approaches, the word-level approaches, and the perturbation rate minimization technique, we achieve a state-of-the-art attack rate. Our best approach works 30-65% faster than the previous best method, Tampers, and has a comparable perturbation rate. At the same time, the proposed perturbations retain the semantic similarity between the original and adversarial examples and achieve a relatively low Levenshtein distance.
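The character-level idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the toy vocabulary, the typo types (adjacent swaps and single deletions), and the filtering rule (keep a typo only if it splits into more WordPiece pieces than the original word) are all assumptions made for illustration. The tokenizer is the standard greedy longest-match-first WordPiece procedure, and the Levenshtein distance is the usual dynamic-programming definition.

```python
from typing import List, Set, Tuple

# Toy vocabulary (assumption): a real BERT vocabulary has roughly 30k entries.
LETTERS = "movie"
TOY_VOCAB: Set[str] = {"movie", "mov", "##ie"} | set(LETTERS) | {"##" + c for c in LETTERS}


def wordpiece(word: str, vocab: Set[str]) -> List[str]:
    """Greedy longest-match-first WordPiece tokenization of one word."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            cand = word[start:end]
            if start > 0:
                cand = "##" + cand  # continuation pieces carry the ## prefix
            if cand in vocab:
                piece = cand
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece matches: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


def typo_candidates(word: str) -> List[str]:
    """Hypothetical typo generator: adjacent swaps and single deletions."""
    out = set()
    for i in range(len(word) - 1):
        out.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    for i in range(len(word)):
        out.add(word[:i] + word[i + 1:])
    out.discard(word)
    return sorted(out)


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def filter_by_fragmentation(word: str, vocab: Set[str]) -> List[Tuple[str, List[str], int]]:
    """Assumed filter: keep typos that fragment into more pieces than the original."""
    base = len(wordpiece(word, vocab))
    kept = []
    for typo in typo_candidates(word):
        pieces = wordpiece(typo, vocab)
        if pieces != ["[UNK]"] and len(pieces) > base:
            kept.append((typo, pieces, levenshtein(word, typo)))
    return kept


if __name__ == "__main__":
    for typo, pieces, dist in filter_by_fragmentation("movie", TOY_VOCAB):
        print(typo, pieces, dist)
```

In a real attack, each surviving candidate would still have to be scored against the victim model; the sketch only shows how a tokenization-aware filter could narrow the candidate set before any model queries are spent.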
@article{ZNSL_2023_530_a6,
author = {T. Ter-Hovhannisyan and H. Aleksanyan and K. Avetisyan},
title = {Adversarial attacks on language models: {WordPiece} filtration and {ChatGPT} synonyms},
journal = {Zapiski Nauchnykh Seminarov POMI},
pages = {80--95},
publisher = {mathdoc},
volume = {530},
year = {2023},
language = {en},
url = {http://geodesic.mathdoc.fr/item/ZNSL_2023_530_a6/}
}
TY  - JOUR
AU  - T. Ter-Hovhannisyan
AU  - H. Aleksanyan
AU  - K. Avetisyan
TI  - Adversarial attacks on language models: WordPiece filtration and ChatGPT synonyms
JO  - Zapiski Nauchnykh Seminarov POMI
PY  - 2023
SP  - 80
EP  - 95
VL  - 530
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/ZNSL_2023_530_a6/
LA  - en
ID  - ZNSL_2023_530_a6
ER  -
%0 Journal Article
%A T. Ter-Hovhannisyan
%A H. Aleksanyan
%A K. Avetisyan
%T Adversarial attacks on language models: WordPiece filtration and ChatGPT synonyms
%J Zapiski Nauchnykh Seminarov POMI
%D 2023
%P 80-95
%V 530
%I mathdoc
%U http://geodesic.mathdoc.fr/item/ZNSL_2023_530_a6/
%G en
%F ZNSL_2023_530_a6
T. Ter-Hovhannisyan; H. Aleksanyan; K. Avetisyan. Adversarial attacks on language models: WordPiece filtration and ChatGPT synonyms. Zapiski Nauchnykh Seminarov POMI, Investigations on applied mathematics and informatics. Part II–2, Volume 530 (2023), pp. 80-95. http://geodesic.mathdoc.fr/item/ZNSL_2023_530_a6/