Activation Approximations Can Incur Safety Vulnerabilities Even in Aligned LLMs: Comprehensive Analysis and Defense

FOS: Computer and information sciences; Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
DOI: 10.48550/arxiv.2502.00840 Publication Date: 2025-02-02
ABSTRACT
Large Language Models (LLMs) have showcased remarkable capabilities across various domains. Accompanying the evolving capabilities and expanding deployment scenarios of LLMs, their deployment challenges escalate due to their sheer scale and the advanced yet complex activation designs prevalent in notable model series such as Llama, Gemma, and Mistral. These challenges become particularly pronounced in resource-constrained scenarios, where mitigating inference efficiency bottlenecks is imperative. Among recent efforts, activation approximation has emerged as a promising avenue for pursuing efficiency, and it is sometimes considered indispensable in applications such as private inference. Despite achieving substantial speedups with minimal impact on utility, and even appearing sound and practical for real-world deployment, the safety implications of activation approximations remain unclear. In this work, we fill this critical gap in LLM safety by conducting the first systematic safety evaluation of activation approximations. Our vetting spans seven state-of-the-art techniques across three popular categories, revealing consistent safety degradation across ten safety-aligned LLMs.
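To make "activation approximation" concrete, the sketch below applies two common forms found in the LLM-efficiency literature, activation sparsification (zeroing small-magnitude activations) and low-bit activation quantization, to the hidden activations of a toy gated MLP block. This is an illustrative assumption-laden example, not code from the paper: the ToyMLP module, the keep-ratio threshold rule, and the 4-bit fake-quantization scheme are all hypothetical choices made for brevity.

# Illustrative sketch (not the paper's code): two common activation
# approximations injected into a toy SwiGLU-style MLP block.
import torch
import torch.nn as nn

torch.manual_seed(0)

def sparsify(x: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Activation sparsification: zero the smallest-magnitude activations,
    keeping only the top `keep_ratio` fraction (hypothetical threshold rule)."""
    threshold = torch.quantile(x.abs(), 1.0 - keep_ratio)
    return torch.where(x.abs() >= threshold, x, torch.zeros_like(x))

def quantize(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Activation quantization: symmetric per-tensor fake quantization to
    `bits` bits (hypothetical scheme chosen for brevity)."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max() / qmax
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

class ToyMLP(nn.Module):
    """Gated MLP block roughly shaped like the FFN in Llama-style models."""
    def __init__(self, d_model: int = 64, d_hidden: int = 256):
        super().__init__()
        self.gate = nn.Linear(d_model, d_hidden, bias=False)
        self.up = nn.Linear(d_model, d_hidden, bias=False)
        self.down = nn.Linear(d_hidden, d_model, bias=False)
        self.act = nn.SiLU()

    def forward(self, x, approx=None):
        h = self.act(self.gate(x)) * self.up(x)   # hidden activations
        if approx is not None:
            h = approx(h)                          # inject the approximation
        return self.down(h)

x = torch.randn(1, 64)
mlp = ToyMLP()
exact = mlp(x)
sparse_out = mlp(x, approx=lambda h: sparsify(h, keep_ratio=0.5))
quant_out = mlp(x, approx=lambda h: quantize(h, bits=4))

# The resulting activation error is the kind of perturbation whose safety
# impact the paper evaluates across aligned models.
print("sparsification error:", (exact - sparse_out).norm().item())
print("quantization error:  ", (exact - quant_out).norm().item())

In practice such approximations are applied inside every transformer layer at inference time; the sketch isolates a single block only to show where the approximation enters the forward pass.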