Adversarial Training with Fast Gradient Projection Method against Synonym Substitution Based Text Attacks

Synonym (taxonomy) Substitution (logic) Robustness
DOI: 10.1609/aaai.v35i16.17648 Publication Date: 2022-09-08T20:13:22Z
ABSTRACT
Adversarial training is the most empirically successful approach to improving the robustness of deep neural networks for image classification. For text classification, however, existing synonym substitution based adversarial attacks are effective but not efficient enough to be incorporated into practical text adversarial training. Gradient-based attacks, which are very efficient for images, are hard to implement for synonym substitution based text attacks due to the lexical, grammatical and semantic constraints and the discrete text input space. We therefore propose a fast text adversarial attack method called Fast Gradient Projection Method (FGPM) based on synonym substitution, which is about 20 times faster than existing text attack methods and could achieve similar attack performance. We then incorporate FGPM with adversarial training and propose a text defense method called Adversarial Training with FGPM enhanced by Logit pairing (ATFL). Experiments show that ATFL could significantly improve model robustness and block the transferability of adversarial examples.
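The abstract does not spell out how a gradient is used to choose a substitution, so the following is only a minimal sketch of one plausible reading: score each candidate synonym by projecting the embedding shift it would cause onto the loss gradient at that position, then take the highest-scoring swap. The function name `best_substitution`, the synonym dictionary format, and the assumption that per-word gradients are already computed are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np

def best_substitution(tokens, grads, emb, synonyms):
    """Pick the single synonym swap with the largest first-order loss increase.

    tokens:   list of word ids in the input sentence
    grads:    array (len(tokens), dim), loss gradient w.r.t. each word embedding
              (assumed to be computed elsewhere by the victim model)
    emb:      array (vocab, dim), embedding table
    synonyms: dict mapping word id -> list of candidate synonym ids
              (hypothetical format)
    Returns (position, synonym id, score), or (None, None, 0.0) if no swap
    has a positive score.
    """
    best = (None, None, 0.0)
    for i, w in enumerate(tokens):
        for s in synonyms.get(w, []):
            # First-order estimate of the loss change: project the embedding
            # shift (synonym minus original word) onto the gradient.
            score = float((emb[s] - emb[w]) @ grads[i])
            if score > best[2]:
                best = (i, s, score)
    return best

# Toy example with a 4-word vocabulary and 2-dimensional embeddings.
emb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
tokens = [0, 1]
grads = np.array([[1.0, 0.0], [0.0, 2.0]])
synonyms = {0: [1, 3], 1: [2]}
print(best_substitution(tokens, grads, emb, synonyms))
```

Under these assumptions, one backward pass yields scores for every candidate at every position, whereas a query-based attack would need a forward pass per candidate. An iterative attack would apply the chosen swap, recompute the gradients, and repeat up to some substitution budget.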