AutoZOOM: Autoencoder-Based Zeroth Order Optimization Method for Attacking Black-Box Neural Networks

MNIST database; Autoencoder; Robustness; Black box; Speedup; Deep Neural Networks
DOI: 10.1609/aaai.v33i01.3301742 Publication Date: 2019-09-13T22:04:31Z
ABSTRACT
Recent studies have shown that adversarial examples in state-of-the-art image classifiers trained by deep neural networks (DNN) can be easily generated when the target model is transparent to an attacker, known as the white-box setting. However, when attacking a deployed machine learning service, one can only acquire the input-output correspondences of the target model; this is the so-called black-box attack setting. The major drawback of existing black-box attacks is the need for excessive model queries, which may give a false sense of model robustness due to inefficient query designs. To bridge this gap, we propose a generic framework for query-efficient black-box attacks. Our framework, AutoZOOM, which is short for Autoencoder-based Zeroth Order Optimization Method, has two novel building blocks towards efficient black-box attacks: (i) an adaptive random gradient estimation strategy to balance query counts and distortion, and (ii) an autoencoder that is either trained offline with unlabeled data or a bilinear resizing operation for attack acceleration. Experimental results suggest that, by applying AutoZOOM to a state-of-the-art black-box attack (ZOO), a significant reduction in model queries can be achieved without sacrificing the attack success rate and the visual quality of the resulting adversarial examples. In particular, when compared to the standard ZOO method, AutoZOOM can consistently reduce the mean query counts in finding successful adversarial examples (or reaching the same distortion level) by at least 93% on MNIST, CIFAR-10 and ImageNet datasets, leading to novel insights on adversarial robustness.
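The random gradient estimation in building block (i) can be illustrated with a generic zeroth-order estimator. The sketch below is not taken from the paper: the function name `random_gradient_estimate`, the smoothing parameter `beta`, the scaling by the dimension `d`, and the use of unit-norm Gaussian directions are all assumptions chosen for illustration. It shows only the general idea that a gradient can be approximated from input-output queries alone, and that averaging over more random directions `q` trades extra queries for a less noisy estimate.

```python
import numpy as np

def random_gradient_estimate(f, x, q=1, beta=1e-3, rng=None):
    """Estimate the gradient of a black-box scalar function f at x.

    Uses q random unit directions and forward finite differences,
    so it costs q + 1 queries to f. All parameter names are
    hypothetical; this is a generic sketch, not the paper's code.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    fx = f(x)  # one query at the current point, reused for every direction
    grad = np.zeros_like(x, dtype=float)
    for _ in range(q):
        u = rng.standard_normal(x.shape)
        u /= np.linalg.norm(u)  # random direction on the unit sphere
        # directional finite difference, projected back onto u
        grad += (f(x + beta * u) - fx) / beta * u
    # scaling by d compensates for E[u u^T] = I / d
    return d * grad / q
```

With `q=1` each estimate costs only two queries but is very noisy; a larger `q` reduces the noise at a proportional query cost. To mimic building block (ii), one could, as a further assumption, apply the same estimator to a low-dimensional variable `z` by querying `f(x0 + decode(z))`, where `decode` is a hypothetical decoder or bilinear upscaling function.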