On Retrieval Augmentation and the Limitations of Language Model Training
FOS: Computer and information sciences
Computer Science - Computation and Language
Computation and Language (cs.CL)
DOI:
10.48550/arxiv.2311.09615
Publication Date:
2023-01-01
AUTHORS (6)
ABSTRACT
Augmenting a language model (LM) with $k$-nearest neighbors ($k$NN) retrieval on its training data alone can decrease its perplexity, though the underlying reasons for this remain elusive. In this work, we rule out one previously posited possibility -- the "softmax bottleneck." We then create a new dataset to evaluate LM generalization ability in the setting where training data contains additional information that is not causally relevant. This task is challenging even for GPT-3.5 Turbo. We show that, for both GPT-2 and Mistral 7B, $k$NN retrieval augmentation consistently improves performance in this setting. Finally, to make $k$NN retrieval more accessible, we propose using a multi-layer perceptron model that maps datastore keys to values as a drop-in replacement for traditional retrieval. This reduces storage costs by over 25x.
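The abstract builds on $k$NN-LM-style retrieval augmentation, where the base LM's next-token distribution is interpolated with a distribution formed from nearest-neighbor lookups in a datastore of (hidden state, next token) pairs, and proposes replacing the datastore lookup with a small MLP. The following is a minimal sketch of both ideas, assuming the standard $k$NN-LM interpolation formulation; the function names, tensor shapes, interpolation weight, and MLP layer sizes are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of kNN-LM interpolation and an MLP "datastore replacement".
# All hyperparameters (k, lam, temp, d_hidden) are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def knn_lm_probs(lm_logits, query, keys, values, vocab_size,
                 k=8, lam=0.25, temp=1.0):
    """Interpolate the LM's next-token distribution with a kNN distribution.

    lm_logits: (vocab_size,) logits from the base LM for the next token.
    query:     (d,) hidden state used as the retrieval query.
    keys:      (N, d) datastore keys (hidden states from the training data).
    values:    (N,) datastore values (the tokens that followed each key).
    """
    # Retrieve the k nearest keys by L2 distance.
    dists = torch.cdist(query[None, :], keys).squeeze(0)   # (N,)
    knn_dists, knn_idx = dists.topk(k, largest=False)

    # Turn negative distances into weights over the retrieved tokens,
    # then scatter those weights onto the full vocabulary.
    knn_weights = F.softmax(-knn_dists / temp, dim=-1)      # (k,)
    p_knn = torch.zeros(vocab_size)
    p_knn.scatter_add_(0, values[knn_idx], knn_weights)

    # Interpolate with the base LM distribution.
    p_lm = F.softmax(lm_logits, dim=-1)
    return lam * p_knn + (1.0 - lam) * p_lm


class DatastoreMLP(nn.Module):
    """Sketch of the MLP drop-in: map a query hidden state directly to a
    distribution over the vocabulary, so no (key, value) pairs are stored."""

    def __init__(self, d_model, vocab_size, d_hidden=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, vocab_size),
        )

    def forward(self, query):
        # Plays the role of p_knn above: a forward pass instead of a lookup.
        return F.softmax(self.net(query), dim=-1)
```

Because the MLP's parameters are fixed in size while a datastore grows with the training corpus, replacing the explicit key-value store with such a network is what enables the storage savings the abstract reports.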