Stealing Part of a Production Language Model
FOS: Computer and information sciences
Cryptography and Security (cs.CR)
DOI:
10.48550/arxiv.2403.06634
Publication Date:
2024-03-11
AUTHORS (13)
ABSTRACT
We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under $20 USD, our attack extracts the entire projection matrix of OpenAI's Ada and Babbage language models. We thereby confirm, for the first time, that these black-box models have a hidden dimension of 1024 and 2048, respectively. We also recover the exact hidden dimension size of the gpt-3.5-turbo model, and estimate it would cost under $2,000 in queries to recover the entire projection matrix. We conclude with potential defenses and mitigations, and discuss the implications of possible future work that could extend our attack.
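The abstract only sketches the attack, but the observation behind the hidden-dimension recovery is linear-algebraic: the final layer computes logits as W h, where W is the (vocab_size x hidden_dim) embedding projection matrix, so every logit vector the API returns lies in a hidden_dim-dimensional subspace of the much larger vocab_size-dimensional logit space. Collect more than hidden_dim logit vectors, stack them, and the number of large singular values reveals hidden_dim. The Python/NumPy toy below is a minimal sketch of that observation only, not the authors' code: all sizes are invented stand-ins, and a random matrix simulates the API's logit responses (the paper's full attack additionally reconstructs complete logit vectors from a restricted top-k log-probability API).

    import numpy as np

    # Toy stand-ins; real values would be, e.g., vocab_size ~ 50k,
    # hidden_dim 1024/2048 for Ada/Babbage.
    vocab_size, hidden_dim, n_queries = 5_000, 256, 400

    rng = np.random.default_rng(0)
    W = rng.normal(size=(vocab_size, hidden_dim))  # secret projection matrix
    H = rng.normal(size=(hidden_dim, n_queries))   # hidden states for n prompts

    # Each column is one logit vector "returned by the API"; the whole
    # matrix has rank hidden_dim because it factors through W.
    logits = W @ H

    # Count singular values above a numerical-noise threshold: the count
    # is the hidden dimension.
    s = np.linalg.svd(logits, compute_uv=False)
    recovered_dim = int(np.sum(s > s[0] * 1e-10))
    print(recovered_dim)  # -> 256

On this noiseless toy the count is exact; against a real API the dimension is read off a sharp drop in the singular-value spectrum rather than a clean cutoff.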