Activation Approximations Can Incur Safety Vulnerabilities Even in Aligned LLMs: Comprehensive Analysis and Defense

FOS: Computer and information sciences; Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
DOI: 10.48550/arxiv.2502.00840 Publication Date: 2025-02-02
ABSTRACT
Large Language Models (LLMs) have showcased remarkable capabilities across various domains. Accompanying the evolving capabilities and expanding deployment scenarios of LLMs, their deployment challenges escalate due to their sheer scale and the advanced yet complex activation designs prevalent in notable model series such as Llama, Gemma, and Mistral. These challenges become particularly pronounced in resource-constrained scenarios, where mitigating inference efficiency bottlenecks is imperative. Among recent efforts, activation approximation has emerged as a promising avenue for pursuing efficiency, and it is sometimes considered indispensable in applications such as private inference. Despite achieving substantial speedups with minimal impact on utility, and even appearing sound and practical for real-world deployment, the safety implications of activation approximations remain unclear. In this work, we fill this critical gap in LLM safety by conducting the first systematic safety evaluation of activation approximations. Our vetting spans seven state-of-the-art techniques across three popular categories, revealing consistent safety degradation across ten safety-aligned LLMs.
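To make "activation approximation" concrete, the sketch below applies two common forms found in the LLM-efficiency literature, activation sparsification (zeroing small-magnitude activations) and low-bit activation quantization, to the hidden activations of a toy gated MLP block. This is an illustrative assumption-laden example, not code from the paper: the ToyMLP module, the keep-ratio threshold rule, and the 4-bit fake-quantization scheme are all hypothetical choices made for brevity.

# Illustrative sketch (not the paper's code): two common activation
# approximations injected into a toy SwiGLU-style MLP block.
import torch
import torch.nn as nn

torch.manual_seed(0)

def sparsify(x: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Activation sparsification: zero the smallest-magnitude activations,
    keeping only the top `keep_ratio` fraction (hypothetical threshold rule)."""
    threshold = torch.quantile(x.abs(), 1.0 - keep_ratio)
    return torch.where(x.abs() >= threshold, x, torch.zeros_like(x))

def quantize(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Activation quantization: symmetric per-tensor fake quantization to
    `bits` bits (hypothetical scheme chosen for brevity)."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max() / qmax
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

class ToyMLP(nn.Module):
    """Gated MLP block roughly shaped like the FFN in Llama-style models."""
    def __init__(self, d_model: int = 64, d_hidden: int = 256):
        super().__init__()
        self.gate = nn.Linear(d_model, d_hidden, bias=False)
        self.up = nn.Linear(d_model, d_hidden, bias=False)
        self.down = nn.Linear(d_hidden, d_model, bias=False)
        self.act = nn.SiLU()

    def forward(self, x, approx=None):
        h = self.act(self.gate(x)) * self.up(x)   # hidden activations
        if approx is not None:
            h = approx(h)                          # inject the approximation
        return self.down(h)

x = torch.randn(1, 64)
mlp = ToyMLP()
exact = mlp(x)
sparse_out = mlp(x, approx=lambda h: sparsify(h, keep_ratio=0.5))
quant_out = mlp(x, approx=lambda h: quantize(h, bits=4))

# The resulting activation error is the kind of perturbation whose safety
# impact the paper evaluates across aligned models.
print("sparsification error:", (exact - sparse_out).norm().item())
print("quantization error:  ", (exact - quant_out).norm().item())

In practice such approximations are applied inside every transformer layer at inference time; the sketch isolates a single block only to show where the approximation enters the forward pass.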