HotFlip: White-Box Adversarial Examples for Text Classification

White box
DOI: 10.18653/v1/p18-2006 Publication Date: 2019-06-29T15:52:10Z
ABSTRACT
We propose an efficient method to generate white-box adversarial examples to trick a character-level neural classifier. We find that only a few manipulations are needed to greatly decrease the accuracy. Our method relies on an atomic flip operation, which swaps one token for another, based on the gradients of the one-hot input vectors. Due to the efficiency of our method, we can perform adversarial training, which makes the model more robust to attacks at test time. With the use of a few semantics-preserving constraints, we demonstrate that HotFlip can be adapted to attack a word-level classifier as well.
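The flip operation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy linear scorer over one-hot character inputs with a logistic loss, chosen so that the input gradient can be written by hand. The names `loss_and_input_grad`, `best_flip`, and the `weights` table are hypothetical. The sketch scores each candidate flip by a first-order estimate of the loss change: the gradient at the new character minus the gradient at the current one.

```python
import math


def loss_and_input_grad(weights, text_ids, label):
    """Logistic loss of a toy linear scorer and its gradient w.r.t. the one-hot input.

    weights[i][c] is the (hypothetical) score contribution of character c at
    position i; text_ids[i] is the character index at position i; label is +1 or -1.
    """
    # Score z = sum over positions of the weight selected by the one-hot input.
    z = sum(weights[i][c] for i, c in enumerate(text_ids))
    margin = label * z
    loss = math.log1p(math.exp(-margin))
    # d(loss)/dz for the logistic loss.
    dloss_dz = -label / (1.0 + math.exp(margin))
    # Because z is linear in the one-hot entries, d(loss)/dx[i][c] = dloss_dz * weights[i][c].
    grad = [[dloss_dz * w for w in row] for row in weights]
    return loss, grad


def best_flip(grad, text_ids):
    """Return (estimated_gain, position, new_char) for the single best flip.

    The gain of flipping position i from character a to b is estimated to
    first order as grad[i][b] - grad[i][a].
    """
    best = None
    for i, a in enumerate(text_ids):
        for b, g in enumerate(grad[i]):
            if b == a:
                continue  # a flip must change the character
            gain = g - grad[i][a]
            if best is None or gain > best[0]:
                best = (gain, i, b)
    return best


# Toy example: 2 positions, alphabet of 3 characters (all values are made up).
weights = [[1.0, -2.0, 0.5], [0.0, 0.3, -1.0]]
text_ids = [0, 1]
loss_before, grad = loss_and_input_grad(weights, text_ids, label=1)
gain, pos, new_char = best_flip(grad, text_ids)

flipped = list(text_ids)
flipped[pos] = new_char
loss_after, _ = loss_and_input_grad(weights, flipped, label=1)
print(pos, new_char, loss_before < loss_after)
```

In a real neural classifier the gradient would come from backpropagation rather than a closed form, and a single backward pass would still give estimates for every candidate flip at once, which is consistent with the efficiency the abstract claims.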