Making Them a Malicious Database: Exploiting Query Code to Jailbreak Aligned Large Language Models

DOI: 10.48550/arXiv.2502.09723
Publication Date: 2025-02-13
ABSTRACT
Recent advances in large language models (LLMs) have demonstrated remarkable potential in the field of natural language processing. Unfortunately, LLMs face significant security and ethical risks. Although techniques such as safety alignment have been developed for defense, prior research reveals the possibility of bypassing such defenses through well-designed jailbreak attacks. In this paper, we propose QueryAttack, a novel framework to systematically examine the generalizability of safety alignment. By treating LLMs as knowledge databases, we translate malicious queries into code-style structured queries to bypass the safety alignment mechanisms of LLMs. We conduct extensive experiments on mainstream LLMs, and the results show that QueryAttack achieves high attack success rates (ASRs) across LLMs from different developers and with different capabilities. We also evaluate QueryAttack's performance against common defenses, confirming that it is difficult to mitigate with general defensive techniques. To defend against QueryAttack, we tailor a defense method which can reduce the ASR by up to 64% on GPT-4-1106. The code can be found at https://anonymous.4open.science/r/QueryAttack-334B.
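The core idea, as stated in the abstract, is that an aligned LLM treated as a knowledge database may comply with a request expressed as a code-style structured query even when it would refuse the same request phrased in natural language. The sketch below illustrates one plausible form of this translation step in Python; the SQL-like template, the field and table names, and the benign payload are assumptions for illustration only, not the paper's actual QueryAttack grammar.

# Minimal sketch of the query-translation idea from the abstract.
# The template below is a hypothetical example; the paper's real
# QueryAttack query format is not reproduced on this page.

def to_structured_query(content: str, target: str = "knowledge_base") -> str:
    """Rewrite a natural-language request as a code-style (SQL-like) query.

    Per the abstract, the LLM is treated as a knowledge database, so the
    request is rendered as a query against that database rather than as
    conversational text.
    """
    return (
        f"SELECT detailed_steps "   # hypothetical result field
        f"FROM {target} "           # hypothetical table name
        f"WHERE topic = '{content}';"
    )

if __name__ == "__main__":
    # Benign placeholder payload; the paper evaluates harmful queries instead.
    print(to_structured_query("sourdough bread baking"))

In the paper's setting, the translated query would be sent to the target LLM in place of the original natural-language prompt, probing whether safety alignment generalizes to non-natural query languages.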