Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples
FOS: Computer and information sciences
Computer Science - Computation and Language
0202 electrical engineering, electronic engineering, information engineering
02 engineering and technology
Computation and Language (cs.CL)
DOI: 10.18653/v1/2021.eacl-main.13
Publication Date: 2021-10-20T07:16:31Z
AUTHORS (4)
ABSTRACT
Recent efforts have shown that neural text processing models are vulnerable to adversarial examples, but the nature of these examples is poorly understood. In this work, we show that adversarial attacks against CNN-, LSTM- and Transformer-based classification models perform word substitutions that are identifiable through frequency differences between the replaced words and their corresponding substitutions. Based on these findings, we propose frequency-guided word substitutions (FGWS), a simple algorithm that exploits these frequency properties to detect adversarial examples. FGWS achieves strong performance by accurately detecting adversarial examples on the SST-2 and IMDb sentiment datasets, with F1 detection scores of up to 91.4% against RoBERTa-based models. We compare our approach against a recently proposed perturbation discrimination framework and show that we outperform it by 13.0% F1.
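The detection idea summarized above can be illustrated with a minimal Python sketch. It assumes a word-frequency table computed on training data, a table of candidate synonyms per word, and a classifier that returns class probabilities. These inputs, the helper names `fgws_transform` and `fgws_detect`, and the thresholds `delta` (frequency cutoff) and `gamma` (confidence-drop cutoff) are hypothetical placeholders, not details given in this abstract. The sketch replaces infrequent words with their most frequent candidate synonym and flags the input as adversarial if the model's confidence in its original prediction drops by more than `gamma`. It is an illustration of the general idea, not the authors' implementation.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical classifier interface: token list -> class probabilities.
Classifier = Callable[[List[str]], Sequence[float]]


def fgws_transform(
    tokens: List[str],
    freq: Dict[str, int],
    synonyms: Dict[str, List[str]],
    delta: int,
) -> List[str]:
    """Swap each word rarer than `delta` for its most frequent synonym,
    but only if that synonym is more frequent than the word itself."""
    out: List[str] = []
    for tok in tokens:
        tok_freq = freq.get(tok, 0)  # unseen words count as frequency 0
        if tok_freq < delta:
            candidates = synonyms.get(tok, [])
            best = max(candidates, key=lambda s: freq.get(s, 0), default=None)
            if best is not None and freq.get(best, 0) > tok_freq:
                out.append(best)
                continue
        out.append(tok)
    return out


def fgws_detect(
    tokens: List[str],
    classify: Classifier,
    freq: Dict[str, int],
    synonyms: Dict[str, List[str]],
    delta: int,
    gamma: float,
) -> bool:
    """Flag `tokens` as adversarial if the confidence in the originally
    predicted class drops by more than `gamma` after substitution."""
    probs = classify(tokens)
    pred = max(range(len(probs)), key=lambda i: probs[i])
    new_probs = classify(fgws_transform(tokens, freq, synonyms, delta))
    return probs[pred] - new_probs[pred] > gamma


# Toy usage with hypothetical data (classes: 0 = positive, 1 = negative).
freq = {"good": 100, "movie": 90, "film": 80, "decent": 2}
synonyms = {"decent": ["good", "fine"]}


def toy_classify(tokens: List[str]) -> List[float]:
    # Stand-in model that only recognises "good" as a positive cue.
    return [0.9, 0.1] if "good" in tokens else [0.3, 0.7]


print(fgws_detect(["a", "decent", "film"], toy_classify, freq, synonyms, delta=10, gamma=0.5))  # True
print(fgws_detect(["a", "good", "film"], toy_classify, freq, synonyms, delta=10, gamma=0.5))  # False
```

In the toy run, the rare word "decent" is mapped back to the frequent "good", which restores the positive prediction, so only the perturbed sentence shows a large confidence drop and is flagged. With a real model, `delta` and `gamma` would presumably have to be chosen on held-out data.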