NFDI4DS | UHH-SEMS - Publication Details

Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity

Viewpoints Robustness

DOI: 10.48550/arxiv.2301.12867 Publication Date: 2023-01-01

Abstract Supplemental Material References Cited by

AUTHORS (4)

Terry Yue Zhuo

Yujin Huang

Chunyang Chen

Zhenchang Xing

ABSTRACT

Recent breakthroughs in natural language processing (NLP) have permitted the synthesis and comprehension of coherent text an open-ended way, therefore translating theoretical algorithms into practical applications. The large models (LLMs) significantly impacted businesses such as report summarization software copywriters. Observations indicate, however, that LLMs may exhibit social prejudice toxicity, posing ethical societal dangers consequences resulting from irresponsibility. Large-scale benchmarks for accountable should consequently be developed. Although several empirical investigations reveal existence a few difficulties advanced LLMs, there is little systematic examination user study risks harmful behaviors current LLM usage. To further educate future efforts on constructing responsibly, we perform qualitative research method called ``red teaming'' OpenAI's ChatGPT\footnote{In this paper, ChatGPT refers to version released Dec 15th.} better understand features recent LLMs. We analyze comprehensively four perspectives: 1) \textit{Bias} 2) \textit{Reliability} 3) \textit{Robustness} 4) \textit{Toxicity}. In accordance with our stated viewpoints, empirically benchmark multiple sample datasets. find significant number cannot addressed by existing benchmarks, hence illustrate them via additional case studies. addition, examine implications findings AI ethics harmal ChatGPT, well problems design considerations responsible believe give light determine mitigate hazards posed machines

SUPPLEMENTAL MATERIAL

Coming soon ....

REFERENCES ()

CITATIONS ()

EXTERNAL LINKS

OPENAIRE - Products OPENALEX - Publications

PlumX Metrics

Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity

RECOMMENDATIONS

FAIR ASSESSMENT

Coming soon ....

JUPYTER LAB

Coming soon ....