Stealing Part of a Production Language Model
FOS: Computer and information sciences
Cryptography and Security (cs.CR)
DOI:
10.48550/arxiv.2403.06634
Publication Date:
2024-03-11
AUTHORS (13)
ABSTRACT
We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under $20 USD, our attack extracts the entire projection matrix of OpenAI's Ada and Babbage language models. We thereby confirm, for the first time, that these black-box models have a hidden dimension of 1024 and 2048, respectively. We also recover the exact hidden dimension size of the gpt-3.5-turbo model, and estimate it would cost under $2,000 in queries to recover the entire projection matrix. We conclude with potential defenses and mitigations, and discuss the implications of possible future work that could extend our attack.
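The abstract only sketches the attack, but the observation behind the hidden-dimension recovery is linear-algebraic: the final layer computes logits as W h, where W is the (vocab_size x hidden_dim) embedding projection matrix, so every logit vector the API returns lies in a hidden_dim-dimensional subspace of the much larger vocab_size-dimensional logit space. Collect more than hidden_dim logit vectors, stack them, and the number of large singular values reveals hidden_dim. The Python/NumPy toy below is a minimal sketch of that observation only, not the authors' code: all sizes are invented stand-ins, and a random matrix simulates the API's logit responses (the paper's full attack additionally reconstructs complete logit vectors from a restricted top-k log-probability API).

    import numpy as np

    # Toy stand-ins; real values would be, e.g., vocab_size ~ 50k,
    # hidden_dim 1024/2048 for Ada/Babbage.
    vocab_size, hidden_dim, n_queries = 5_000, 256, 400

    rng = np.random.default_rng(0)
    W = rng.normal(size=(vocab_size, hidden_dim))  # secret projection matrix
    H = rng.normal(size=(hidden_dim, n_queries))   # hidden states for n prompts

    # Each column is one logit vector "returned by the API"; the whole
    # matrix has rank hidden_dim because it factors through W.
    logits = W @ H

    # Count singular values above a numerical-noise threshold: the count
    # is the hidden dimension.
    s = np.linalg.svd(logits, compute_uv=False)
    recovered_dim = int(np.sum(s > s[0] * 1e-10))
    print(recovered_dim)  # -> 256

On this noiseless toy the count is exact; against a real API the dimension is read off a sharp drop in the singular-value spectrum rather than a clean cutoff.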