Making Them a Malicious Database: Exploiting Query Code to Jailbreak Aligned Large Language Models

DOI: 10.48550/arXiv.2502.09723
Publication Date: 2025-02-13
ABSTRACT
Recent advances in large language models (LLMs) have demonstrated remarkable potential in the field of natural language processing. Unfortunately, LLMs face significant security and ethical risks. Although techniques such as safety alignment have been developed for defense, prior research reveals the possibility of bypassing such defenses through well-designed jailbreak attacks. In this paper, we propose QueryAttack, a novel framework to systematically examine the generalizability of safety alignment. By treating LLMs as knowledge databases, we translate malicious queries into code-style structured queries to bypass the safety alignment mechanisms of LLMs. We conduct extensive experiments on mainstream LLMs, and the results show that QueryAttack achieves high attack success rates (ASRs) across LLMs from different developers and with different capabilities. We also evaluate QueryAttack's performance against common defenses, confirming that it is difficult to mitigate with general defensive techniques. To defend against QueryAttack, we tailor a defense method which can reduce the ASR by up to 64% on GPT-4-1106. The code can be found at https://anonymous.4open.science/r/QueryAttack-334B.
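The core idea, as stated in the abstract, is that an aligned LLM treated as a knowledge database may comply with a request expressed as a code-style structured query even when it would refuse the same request phrased in natural language. The sketch below illustrates one plausible form of this translation step in Python; the SQL-like template, the field and table names, and the benign payload are assumptions for illustration only, not the paper's actual QueryAttack grammar.

# Minimal sketch of the query-translation idea from the abstract.
# The template below is a hypothetical example; the paper's real
# QueryAttack query format is not reproduced on this page.

def to_structured_query(content: str, target: str = "knowledge_base") -> str:
    """Rewrite a natural-language request as a code-style (SQL-like) query.

    Per the abstract, the LLM is treated as a knowledge database, so the
    request is rendered as a query against that database rather than as
    conversational text.
    """
    return (
        f"SELECT detailed_steps "   # hypothetical result field
        f"FROM {target} "           # hypothetical table name
        f"WHERE topic = '{content}';"
    )

if __name__ == "__main__":
    # Benign placeholder payload; the paper evaluates harmful queries instead.
    print(to_structured_query("sourdough bread baking"))

In the paper's setting, the translated query would be sent to the target LLM in place of the original natural-language prompt, probing whether safety alignment generalizes to non-natural query languages.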