Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples

FOS: Computer and information sciences; Computer Science - Computation and Language (cs.CL); 0202 Electrical engineering, electronic engineering, information engineering; 02 Engineering and technology
DOI: 10.18653/v1/2021.eacl-main.13
Publication Date: 2021-10-20T07:16:31Z
ABSTRACT
Recent efforts have shown that neural text processing models are vulnerable to adversarial examples, but the nature of these examples is poorly understood. In this work, we show that adversarial attacks against CNN, LSTM and Transformer-based classification models perform word substitutions that are identifiable through frequency differences between the replaced words and their corresponding substitutions. Based on these findings, we propose frequency-guided word substitutions (FGWS), a simple algorithm that exploits these frequency properties to detect adversarial examples. FGWS achieves strong performance, accurately detecting adversarial examples on the SST-2 and IMDb sentiment datasets with F1 scores of up to 91.4% against RoBERTa-based models. We compare our approach with a recently proposed perturbation discrimination framework and show that FGWS outperforms it by 13.0% F1.
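To make the detection idea concrete, here is a minimal sketch in Python, assuming FGWS works roughly as follows: every word whose frequency falls below a threshold `delta` is replaced by its most frequent synonym (when that synonym is more frequent), and the input is flagged as adversarial if the model's confidence in its originally predicted class drops by more than a threshold `gamma` after the replacement. The frequency table `FREQ`, the synonym lists `SYNONYMS`, the threshold values and the classifier `toy_predict_proba` are all hypothetical placeholders; in practice the frequencies would come from the classifier's training data and the synonyms from a lexical resource or embedding neighbourhood. This is an illustration of the principle, not the authors' implementation.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical word counts (stand-in for training-set frequencies).
FREQ: Dict[str, int] = {"a": 1000, "good": 100, "fine": 50, "movie": 80, "decent": 3}
# Hypothetical synonym candidates per word.
SYNONYMS: Dict[str, List[str]] = {"decent": ["good", "fine"]}


def fgws_transform(tokens: Sequence[str], freq: Dict[str, int],
                   synonyms: Dict[str, List[str]], delta: int) -> List[str]:
    """Replace each infrequent word (freq < delta) with its most frequent,
    strictly more frequent synonym; leave all other words unchanged."""
    out = []
    for word in tokens:
        f = freq.get(word, 0)
        if f < delta:
            candidates = [s for s in synonyms.get(word, []) if freq.get(s, 0) > f]
            if candidates:
                word = max(candidates, key=lambda s: freq.get(s, 0))
        out.append(word)
    return out


def fgws_detect(tokens: Sequence[str],
                predict_proba: Callable[[Sequence[str]], List[float]],
                freq: Dict[str, int], synonyms: Dict[str, List[str]],
                delta: int, gamma: float) -> bool:
    """Flag the input as adversarial if the confidence in the originally
    predicted class drops by more than gamma after the transformation."""
    probs = predict_proba(tokens)
    predicted = max(range(len(probs)), key=probs.__getitem__)
    transformed = fgws_transform(tokens, freq, synonyms, delta)
    drop = probs[predicted] - predict_proba(transformed)[predicted]
    return drop > gamma


def toy_predict_proba(tokens: Sequence[str]) -> List[float]:
    """Hypothetical sentiment model: returns [p_negative, p_positive]."""
    p_pos = 0.9 if "good" in tokens else 0.1
    return [1 - p_pos, p_pos]


if __name__ == "__main__":
    adversarial = ["a", "decent", "movie"]  # "good" swapped for a rare word
    clean = ["a", "good", "movie"]
    print(fgws_transform(adversarial, FREQ, SYNONYMS, delta=10))
    print(fgws_detect(adversarial, toy_predict_proba, FREQ, SYNONYMS, 10, 0.5))
    print(fgws_detect(clean, toy_predict_proba, FREQ, SYNONYMS, 10, 0.5))
```

In this toy run, the rare substitution "decent" is mapped back to "good", the predicted class flips, and the confidence drop exceeds `gamma`, so the adversarial input is flagged while the clean sentence passes through unchanged. Both thresholds would normally be tuned on held-out data.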