Lockpicking LLMs: A Logit-Based Jailbreak Using Token-level Manipulation
FOS: Computer and information sciences
Machine Learning (cs.LG)
Cryptography and Security (cs.CR)
Artificial Intelligence (cs.AI)
DOI:
10.48550/arXiv.2405.13068
Publication Date:
2024-05-20
AUTHORS (7)
ABSTRACT
Large language models (LLMs) have transformed the field of natural language processing, but they remain susceptible to jailbreaking attacks that exploit their capabilities to generate unintended and potentially harmful content. Existing token-level manipulation techniques, while effective, face scalability and efficiency challenges, especially as models undergo frequent updates and incorporate advanced defensive measures. In this paper, we introduce JailMine, an innovative token-level manipulation approach that addresses these limitations effectively. JailMine employs an automated "mining" process to elicit malicious responses from LLMs by strategically selecting affirmative outputs and iteratively reducing the likelihood of rejection. Through rigorous testing across multiple well-known datasets, we demonstrate JailMine's effectiveness and efficiency, achieving a significant average reduction of 86% in time consumed while maintaining high success rates averaging 95%, even as defensive strategies evolve. Our work contributes to the ongoing effort to assess and mitigate the vulnerability of LLMs to jailbreaking attacks, underscoring the importance of continued vigilance and proactive measures to enhance the security and reliability of these powerful models.
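The abstract describes JailMine only at a high level: it works at the logit level, steering decoding toward affirmative openers while reducing the likelihood of a refusal. As a minimal sketch of the logit-inspection primitive such an approach rests on (not the authors' actual pipeline, which the abstract does not specify), the snippet below loads a Hugging Face causal LM, reads the next-token logits for a prompt, and compares the probability mass on an affirmative opener versus a refusal-style opener. The model name, prompt, and token choices are illustrative assumptions.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model for illustration; any causal LM with a compatible
# tokenizer would work the same way.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "Explain how to pick a lock."  # benign stand-in prompt

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    # logits has shape [batch, seq_len, vocab]; the last position
    # holds the distribution over the next token.
    logits = model(**inputs).logits[0, -1]
probs = torch.softmax(logits, dim=-1)

# Compare an affirmative opener (" Sure") against a crude proxy for
# refusal-style openers (" I", as in "I cannot ..."). Token choices
# here are assumptions for illustration only.
affirmative_id = tokenizer.encode(" Sure")[0]
refusal_id = tokenizer.encode(" I")[0]
print(f"P(affirmative start) = {probs[affirmative_id]:.4f}")
print(f"P(refusal start)     = {probs[refusal_id]:.4f}")

A token-level attack in the spirit the abstract sketches would use such measurements to rank candidate continuations and iteratively drive the refusal probability down; the paper's 86% time reduction presumably comes from automating that search rather than re-optimizing from scratch per prompt.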