In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT

DOI: 10.48550/arxiv.2304.08979 Publication Date: 2023-01-01
ABSTRACT
The way users acquire information is undergoing a paradigm shift with the advent of ChatGPT. Unlike conventional search engines, ChatGPT retrieves knowledge from the model itself and generates answers for users. ChatGPT's impressive question-answering (QA) capability has attracted more than 100 million users within a short period of time, but it has also raised concerns regarding its reliability. In this paper, we perform the first large-scale measurement of ChatGPT's reliability in the generic QA scenario with a carefully curated set of 5,695 questions across ten datasets and eight domains. We find that ChatGPT's reliability varies across domains, especially underperforming on law and science questions. We also demonstrate that system roles, originally designed by OpenAI to allow users to steer ChatGPT's behavior, can impact its reliability in an imperceptible way. We further show that ChatGPT is vulnerable to adversarial examples, where even a single character change can negatively affect its reliability in certain cases. We believe our study provides valuable insights into ChatGPT's reliability and underscores the need to strengthen the reliability and security of large language models (LLMs).
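To make the two manipulation factors mentioned above concrete, the sketch below shows how a system role and a single-character perturbation can be applied when querying ChatGPT through the OpenAI chat completions API. This is a minimal illustration, not the authors' evaluation harness; the model name, system prompt, and example question are placeholders chosen for demonstration.

```python
# Minimal sketch (not the paper's actual setup): querying ChatGPT with and
# without a system role, and with a one-character adversarial perturbation.
# Assumes the `openai` Python package (>= 1.0) and an OPENAI_API_KEY env var.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(question: str, system_role: str | None = None,
        model: str = "gpt-3.5-turbo") -> str:
    """Send one QA prompt, optionally prefixed with a system-role message."""
    messages = []
    if system_role:
        # The system role steers the assistant's behavior for the whole chat.
        messages.append({"role": "system", "content": system_role})
    messages.append({"role": "user", "content": question})
    resp = client.chat.completions.create(
        model=model, messages=messages, temperature=0
    )
    return resp.choices[0].message.content


question = "Which planet in the solar system has the most moons?"  # placeholder
perturbed = question.replace("moons", "moosn")  # single-character-level change

baseline = ask(question)                                        # plain query
steered = ask(question, system_role="You are a hasty assistant.")  # hypothetical role
attacked = ask(perturbed)                                       # perturbed query

print(baseline, steered, attacked, sep="\n---\n")
```

Comparing the three answers (baseline, system-role-steered, and perturbed) is one simple way to observe the kinds of reliability shifts the paper measures at scale.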