MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots
DOI: 10.48550/arxiv.2307.08715
Publication Date: 2023-01-01
AUTHORS (9)
ABSTRACT
Large Language Models (LLMs) have revolutionized Artificial Intelligence (AI) services due to their exceptional proficiency in understanding and generating human-like text. LLM chatbots, in particular, have seen widespread adoption, transforming human-machine interactions. However, these chatbots are susceptible to "jailbreak" attacks, where malicious users manipulate prompts to elicit inappropriate or sensitive responses, contravening service policies. Despite existing attempts to mitigate such threats, our research reveals a substantial gap in the understanding of these vulnerabilities, largely due to the undisclosed defensive measures implemented by service providers. In this paper, we present Jailbreaker, a comprehensive framework that offers an in-depth understanding of jailbreak attacks and countermeasures. Our work makes a dual contribution. First, we propose an innovative methodology, inspired by time-based SQL injection techniques, to reverse-engineer the defensive strategies of prominent services such as ChatGPT, Bard, and Bing Chat. This time-sensitive approach uncovers intricate details about these services' defenses, facilitating a proof-of-concept attack that successfully bypasses their mechanisms. Second, we introduce an automatic generation method for jailbreak prompts. Leveraging a fine-tuned LLM, we validate the potential of automated jailbreak generation across various commercial chatbots. Our method achieves a promising average success rate of 21.58%, significantly outperforming the effectiveness of existing techniques. We have responsibly disclosed our findings to the concerned providers, underscoring the urgent need for more robust defenses. Jailbreaker thus marks a significant step towards mitigating jailbreak threats in the realm of LLM chatbots.
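The abstract's time-based probing idea can be illustrated with a minimal sketch. The intuition, borrowed from time-based SQL injection, is that response latency leaks where a defense runs: if a refusal only arrives after a delay that scales with the length of the requested output, the filter most likely inspects the *generated* text rather than the input prompt. The code below is a hypothetical illustration, not the paper's implementation: `simulated_chatbot` is a stand-in for a real chatbot API, and its per-token latency and filtering behavior are assumptions chosen to make the timing signal visible.

```python
import time

def simulated_chatbot(prompt: str, tokens_requested: int) -> tuple[str, float]:
    """Hypothetical stand-in for a commercial chatbot endpoint.

    Simulates a service whose content filter runs AFTER generation,
    so the time to a refusal grows with the requested output length.
    Returns (answer, elapsed_seconds).
    """
    per_token_latency = 0.001  # assumed generation cost per token
    start = time.perf_counter()
    time.sleep(per_token_latency * tokens_requested)  # generation phase
    answer = "REFUSED" if "forbidden" in prompt else "ok"
    return answer, time.perf_counter() - start

def probe_filter_stage(chatbot, short: int = 50, long: int = 500) -> str:
    """Time-based probe in the spirit of time-based SQL injection.

    Sends the same disallowed prompt twice, requesting short and long
    outputs. If refusal latency scales with output length, the filter
    is inferred to act on the generated text (output side); otherwise
    it likely screens the prompt itself (input side).
    """
    _, t_short = chatbot("forbidden question", short)
    _, t_long = chatbot("forbidden question", long)
    return "output-side filter" if t_long > 2 * t_short else "input-side filter"

if __name__ == "__main__":
    print(probe_filter_stage(simulated_chatbot))  # prints "output-side filter"
```

Against a real service, the same probe would compare wall-clock latency of refused requests at different requested lengths, averaging over repeated queries to smooth out network jitter.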