Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity

Viewpoints Robustness
DOI: 10.48550/arxiv.2301.12867 Publication Date: 2023-01-01
ABSTRACT
Recent breakthroughs in natural language processing (NLP) have permitted the synthesis and comprehension of coherent text an open-ended way, therefore translating theoretical algorithms into practical applications. The large models (LLMs) significantly impacted businesses such as report summarization software copywriters. Observations indicate, however, that LLMs may exhibit social prejudice toxicity, posing ethical societal dangers consequences resulting from irresponsibility. Large-scale benchmarks for accountable should consequently be developed. Although several empirical investigations reveal existence a few difficulties advanced LLMs, there is little systematic examination user study risks harmful behaviors current LLM usage. To further educate future efforts on constructing responsibly, we perform qualitative research method called ``red teaming'' OpenAI's ChatGPT\footnote{In this paper, ChatGPT refers to version released Dec 15th.} better understand features recent LLMs. We analyze comprehensively four perspectives: 1) \textit{Bias} 2) \textit{Reliability} 3) \textit{Robustness} 4) \textit{Toxicity}. In accordance with our stated viewpoints, empirically benchmark multiple sample datasets. find significant number cannot addressed by existing benchmarks, hence illustrate them via additional case studies. addition, examine implications findings AI ethics harmal ChatGPT, well problems design considerations responsible believe give light determine mitigate hazards posed machines
SUPPLEMENTAL MATERIAL
Coming soon ....
REFERENCES ()
CITATIONS ()
EXTERNAL LINKS
PlumX Metrics
RECOMMENDATIONS
FAIR ASSESSMENT
Coming soon ....
JUPYTER LAB
Coming soon ....