In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT

DOI: 10.48550/arxiv.2304.08979 Publication Date: 2023-01-01
ABSTRACT
The way users acquire information is undergoing a paradigm shift with the advent of ChatGPT. Unlike conventional search engines, ChatGPT retrieves knowledge from the model itself and generates answers for users. ChatGPT's impressive question-answering (QA) capability has attracted more than 100 million users within a short period of time, but it has also raised concerns regarding its reliability. In this paper, we perform the first large-scale measurement of ChatGPT's reliability in the generic QA scenario with a carefully curated set of 5,695 questions across ten datasets and eight domains. We find that ChatGPT's reliability varies across domains, especially underperforming on law and science questions. We also demonstrate that system roles, originally designed by OpenAI to allow users to steer ChatGPT's behavior, can impact its reliability in an imperceptible way. We further show that ChatGPT is vulnerable to adversarial examples, where even a single character change can negatively affect its reliability in certain cases. We believe our study provides valuable insights into ChatGPT's reliability and underscores the need to strengthen the reliability and security of large language models (LLMs).
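To make the two manipulation factors mentioned above concrete, the sketch below shows how a system role and a single-character perturbation can be applied when querying ChatGPT through the OpenAI chat completions API. This is a minimal illustration, not the authors' evaluation harness; the model name, system prompt, and example question are placeholders chosen for demonstration.

```python
# Minimal sketch (not the paper's actual setup): querying ChatGPT with and
# without a system role, and with a one-character adversarial perturbation.
# Assumes the `openai` Python package (>= 1.0) and an OPENAI_API_KEY env var.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(question: str, system_role: str | None = None,
        model: str = "gpt-3.5-turbo") -> str:
    """Send one QA prompt, optionally prefixed with a system-role message."""
    messages = []
    if system_role:
        # The system role steers the assistant's behavior for the whole chat.
        messages.append({"role": "system", "content": system_role})
    messages.append({"role": "user", "content": question})
    resp = client.chat.completions.create(
        model=model, messages=messages, temperature=0
    )
    return resp.choices[0].message.content


question = "Which planet in the solar system has the most moons?"  # placeholder
perturbed = question.replace("moons", "moosn")  # single-character-level change

baseline = ask(question)                                        # plain query
steered = ask(question, system_role="You are a hasty assistant.")  # hypothetical role
attacked = ask(perturbed)                                       # perturbed query

print(baseline, steered, attacked, sep="\n---\n")
```

Comparing the three answers (baseline, system-role-steered, and perturbed) is one simple way to observe the kinds of reliability shifts the paper measures at scale.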