Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples

FOS: Computer and information sciences; Computer Science - Computation and Language; 0202 electrical engineering, electronic engineering, information engineering; 02 engineering and technology; Computation and Language (cs.CL)

DOI: 10.18653/v1/2021.eacl-main.13

Publication Date: 2021-10-20T07:16:31Z

AUTHORS (4)

Maximilian Mozes

Pontus Stenetorp

Bennett Kleinberg

Lewis Griffin

ABSTRACT

Recent efforts have shown that neural text processing models are vulnerable to adversarial examples, but the nature of these examples is poorly understood. In this work, we show that adversarial attacks against CNN, LSTM and Transformer-based classification models perform word substitutions that are identifiable through frequency differences between replaced words and their corresponding substitutions. Based on these findings, we propose frequency-guided word substitutions (FGWS), a simple algorithm exploiting the frequency properties of adversarial word substitutions for the detection of adversarial examples. FGWS achieves strong performance by accurately detecting adversarial examples on the SST-2 and IMDb sentiment datasets, with F1 detection scores of up to 91.4% against RoBERTa-based classification models. We compare our approach against a recently proposed perturbation discrimination framework and show that we outperform it by up to 13.0% F1.
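
The abstract describes the frequency-based idea only at a high level, so the following Python sketch is one plausible reading of it, not the authors' implementation. It assumes that rare words (corpus frequency below a hypothetical threshold `delta`) are swapped for their most frequent synonym, and that an input is flagged when the classifier's confidence in its original prediction drops by more than a hypothetical margin `gamma`. The names `fgws_transform`, `is_adversarial`, `toy_model`, and the toy frequency and synonym tables are all illustrative assumptions.

```python
from typing import Callable, Dict, List, Sequence


def fgws_transform(tokens: Sequence[str], freq: Dict[str, int],
                   synonyms: Dict[str, List[str]], delta: int) -> List[str]:
    """Replace each rare word with its most frequent synonym, if one is more frequent."""
    out = []
    for tok in tokens:
        if freq.get(tok, 0) < delta:  # rare word: candidate for substitution
            best = max(synonyms.get(tok, []),
                       key=lambda w: freq.get(w, 0), default=tok)
            if freq.get(best, 0) > freq.get(tok, 0):
                tok = best
        out.append(tok)
    return out


def is_adversarial(tokens: Sequence[str],
                   predict_proba: Callable[[Sequence[str]], List[float]],
                   freq: Dict[str, int], synonyms: Dict[str, List[str]],
                   delta: int, gamma: float) -> bool:
    """Flag the input if the transformation lowers confidence in the
    originally predicted class by more than gamma."""
    probs = predict_proba(tokens)
    label = max(range(len(probs)), key=probs.__getitem__)
    new_probs = predict_proba(fgws_transform(tokens, freq, synonyms, delta))
    return probs[label] - new_probs[label] > gamma


# Toy data (assumed, for illustration only).
FREQ = {"the": 5000, "film": 1200, "was": 4000, "good": 900,
        "bad": 900, "substandard": 3}
SYNONYMS = {"substandard": ["bad"]}


def toy_model(tokens: Sequence[str]) -> List[float]:
    """Hypothetical classifier returning [p_negative, p_positive].
    It only recognises the frequent word "bad" as negative."""
    p_neg = 0.9 if "bad" in tokens else 0.2
    return [p_neg, 1.0 - p_neg]


clean = ["the", "film", "was", "good"]
perturbed = ["the", "film", "was", "substandard"]  # rare word fools toy_model
print(is_adversarial(clean, toy_model, FREQ, SYNONYMS, delta=10, gamma=0.5))
print(is_adversarial(perturbed, toy_model, FREQ, SYNONYMS, delta=10, gamma=0.5))
```

In this toy setup the rare word "substandard" is mapped back to the frequent word "bad", which changes the classifier's confidence sharply, while the clean sentence contains no rare words and is left untouched. In practice the frequency table, synonym source, and both thresholds would need to be derived from training and validation data.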

CITATIONS (26)
