MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots
DOI: 10.48550/arxiv.2307.08715
Publication Date: 2023-01-01
AUTHORS (9)
ABSTRACT
Large Language Models (LLMs) have revolutionized Artificial Intelligence (AI) services due to their exceptional proficiency in understanding and generating human-like text. LLM chatbots, in particular, have seen widespread adoption, transforming human-machine interactions. However, these chatbots are susceptible to "jailbreak" attacks, where malicious users manipulate prompts to elicit inappropriate or sensitive responses, contravening service policies. Despite existing attempts to mitigate such threats, our research reveals a substantial gap in the understanding of these vulnerabilities, largely due to the undisclosed defensive measures implemented by service providers. In this paper, we present Jailbreaker, a comprehensive framework that offers an in-depth understanding of jailbreak attacks and countermeasures. Our work makes a dual contribution. First, we propose an innovative methodology, inspired by time-based SQL injection techniques, to reverse-engineer the defensive strategies of prominent services such as ChatGPT, Bard, and Bing Chat. This time-sensitive approach uncovers intricate details about these services' defenses, facilitating a proof-of-concept attack that successfully bypasses their mechanisms. Second, we introduce an automatic generation method for jailbreak prompts. Leveraging a fine-tuned LLM, we validate the potential of automated jailbreak generation across various commercial chatbots. Our method achieves a promising average success rate of 21.58%, significantly outperforming the effectiveness of existing techniques. We have responsibly disclosed our findings to the concerned providers, underscoring the urgent need for more robust defenses. Jailbreaker thus marks a significant step towards mitigating jailbreak threats in the realm of LLM chatbots.
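The abstract's time-based probing idea can be illustrated with a minimal sketch. The intuition, borrowed from time-based SQL injection, is that response latency leaks where a defense runs: if a refusal only arrives after a delay that scales with the length of the requested output, the filter most likely inspects the *generated* text rather than the input prompt. The code below is a hypothetical illustration, not the paper's implementation: `simulated_chatbot` is a stand-in for a real chatbot API, and its per-token latency and filtering behavior are assumptions chosen to make the timing signal visible.

```python
import time

def simulated_chatbot(prompt: str, tokens_requested: int) -> tuple[str, float]:
    """Hypothetical stand-in for a commercial chatbot endpoint.

    Simulates a service whose content filter runs AFTER generation,
    so the time to a refusal grows with the requested output length.
    Returns (answer, elapsed_seconds).
    """
    per_token_latency = 0.001  # assumed generation cost per token
    start = time.perf_counter()
    time.sleep(per_token_latency * tokens_requested)  # generation phase
    answer = "REFUSED" if "forbidden" in prompt else "ok"
    return answer, time.perf_counter() - start

def probe_filter_stage(chatbot, short: int = 50, long: int = 500) -> str:
    """Time-based probe in the spirit of time-based SQL injection.

    Sends the same disallowed prompt twice, requesting short and long
    outputs. If refusal latency scales with output length, the filter
    is inferred to act on the generated text (output side); otherwise
    it likely screens the prompt itself (input side).
    """
    _, t_short = chatbot("forbidden question", short)
    _, t_long = chatbot("forbidden question", long)
    return "output-side filter" if t_long > 2 * t_short else "input-side filter"

if __name__ == "__main__":
    print(probe_filter_stage(simulated_chatbot))  # prints "output-side filter"
```

Against a real service, the same probe would compare wall-clock latency of refused requests at different requested lengths, averaging over repeated queries to smooth out network jitter.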