SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization
Robustness
DOI:
10.48550/arXiv.2402.03317
Publication Date:
2024-01-02
AUTHORS (6)
ABSTRACT
Vision Transformers (ViTs) have gained prominence as a preferred choice for a wide range of computer vision tasks due to their exceptional performance. However, their widespread adoption has raised concerns about security in the face of malicious attacks. Most existing methods rely on empirical adjustments during the training process, lacking a clear theoretical foundation. In this study, we address this gap by introducing SpecFormer, specifically designed to enhance ViTs' resilience against adversarial attacks, with support from carefully derived guarantees. We establish local Lipschitz bounds for the self-attention layer and introduce a novel approach, Maximum Singular Value Penalization (MSVP), to attain precise control over these bounds. We seamlessly integrate MSVP into the attention layers, using the power iteration method for enhanced computational efficiency. The modified model, SpecFormer, effectively reduces the spectral norms of weight matrices, thereby enhancing network Lipschitzness. This, in turn, leads to improved efficiency and robustness. Extensive experiments on CIFAR and ImageNet datasets confirm SpecFormer's superior performance in defending against adversarial attacks.
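The abstract describes the core mechanism: estimate the largest singular value (spectral norm) of each attention projection matrix with power iteration, then add it to the training loss as a penalty so the network's Lipschitz constant stays controlled. The sketch below illustrates that idea in PyTorch under stated assumptions; it is not the authors' implementation, and the names power_iteration_sigma, msvp_penalty, and the weight lambda_msvp are hypothetical.

import torch
import torch.nn as nn

def power_iteration_sigma(W: torch.Tensor, n_iters: int = 5) -> torch.Tensor:
    """Estimate the largest singular value (spectral norm) of a 2-D
    weight matrix via power iteration, avoiding a full SVD."""
    v = torch.randn(W.shape[1], device=W.device)
    v = v / v.norm()
    u = W @ v
    for _ in range(n_iters):
        u = W @ v
        u = u / (u.norm() + 1e-12)
        v = W.t() @ u
        v = v / (v.norm() + 1e-12)
    # Once u, v approximate the top singular vectors, sigma_max ~ u^T W v.
    return torch.dot(u, W @ v)

def msvp_penalty(attn: nn.MultiheadAttention) -> torch.Tensor:
    """Hypothetical MSVP-style penalty: sum of estimated spectral norms
    of the query/key/value projection matrices of one attention layer."""
    # in_proj_weight stacks W_Q, W_K, W_V along the output dimension.
    w_q, w_k, w_v = attn.in_proj_weight.chunk(3, dim=0)
    return sum(power_iteration_sigma(w) for w in (w_q, w_k, w_v))

# Illustrative training step: regularize a single attention layer.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4)
x = torch.randn(10, 2, 64)    # (seq_len, batch, embed_dim)
out, _ = attn(x, x, x)
lambda_msvp = 1e-3            # assumed penalty weight, a tunable hyperparameter
loss = out.pow(2).mean() + lambda_msvp * msvp_penalty(attn)
loss.backward()

A few power-iteration steps per forward pass are far cheaper than a full SVD, which is why the abstract highlights the method's computational efficiency; in a full model the penalty would be summed over all attention layers.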