Adversarial Counterfactual Visual Explanations

Robustification
DOI: 10.48550/arxiv.2303.09962
Publication Date: 2023-01-01
ABSTRACT
Counterfactual explanations and adversarial attacks have a related goal: flipping output labels with minimal perturbations, regardless of their characteristics. Yet, adversarial attacks cannot be used directly in a counterfactual explanation perspective, as such perturbations are perceived as noise and not as actionable and understandable image modifications. Building on the robust learning literature, this paper proposes an elegant method to turn adversarial attacks into semantically meaningful perturbations, without modifying the classifiers to explain. The proposed approach hypothesizes that Denoising Diffusion Probabilistic Models are excellent regularizers for avoiding high-frequency and out-of-distribution perturbations when generating adversarial attacks. The paper's key idea is to build attacks through a diffusion model to polish them. This allows studying the target model regardless of its robustification level. Extensive experimentation shows the advantages of our counterfactual explanation approach over current State-of-the-Art methods in multiple testbeds.
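
The key idea stated in the abstract, attacking the classifier through a diffusion model so the DDPM polishes the perturbation into an in-distribution image edit, can be sketched roughly as follows. This is a minimal illustrative sketch and not the paper's implementation: the `diffusion` object and its `q_sample` / `denoise_from` helpers are hypothetical stand-ins for a DDPM's forward noising and reverse denoising steps, and the optimization loop is a generic targeted attack.

    import torch
    import torch.nn.functional as F

    def ddpm_polish(x, diffusion, t):
        """Noise the image up to timestep t, then denoise it with the DDPM.
        Acts as a regularizer that strips high-frequency, out-of-distribution
        adversarial noise. (q_sample / denoise_from are assumed APIs.)"""
        noisy = diffusion.q_sample(x, t)          # forward noising to step t
        return diffusion.denoise_from(noisy, t)   # reverse process back to t = 0

    def adversarial_counterfactual(x, target_label, classifier, diffusion,
                                   t=100, steps=50, lr=0.01):
        """Optimize a perturbation so that the DDPM-polished image flips the
        (unmodified) classifier to target_label. x has shape (1, C, H, W)."""
        delta = torch.zeros_like(x, requires_grad=True)
        opt = torch.optim.Adam([delta], lr=lr)
        target = torch.tensor([target_label])
        for _ in range(steps):
            polished = ddpm_polish((x + delta).clamp(0, 1), diffusion, t)
            loss = F.cross_entropy(classifier(polished), target)
            opt.zero_grad()
            loss.backward()   # gradients flow through the diffusion model
            opt.step()
        # the returned counterfactual is the polished image, not x + delta
        return ddpm_polish((x + delta).clamp(0, 1), diffusion, t).detach()

Because the classifier only ever sees the polished image, the attack is forced to spend its budget on semantically meaningful changes rather than imperceptible noise, which is what makes the result usable as a counterfactual explanation.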