On Retrieval Augmentation and the Limitations of Language Model Training

FOS: Computer and information sciences; Computer Science - Computation and Language (cs.CL)
DOI: 10.48550/arxiv.2311.09615 Publication Date: 2023-01-01
ABSTRACT
Augmenting a language model (LM) with $k$-nearest neighbors ($k$NN) retrieval on its training data alone can decrease perplexity, though the underlying reasons for this remain elusive. In this work, we rule out one previously posited possibility -- the "softmax bottleneck." We then create a new dataset to evaluate LM generalization ability in the setting where the training data contains additional information that is not causally relevant. This task is challenging even for GPT-3.5 Turbo. We show that, for both GPT-2 and Mistral 7B, $k$NN augmentation consistently improves performance in this setting. Finally, to make retrieval more accessible, we propose using a multi-layer perceptron that maps datastore keys to values as a drop-in replacement for traditional retrieval. This reduces storage costs by over 25x.
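
To make the setup described in the abstract concrete, the sketch below illustrates standard $k$NN-LM interpolation and, schematically, the proposed MLP drop-in for the datastore. It is a minimal illustration under assumed choices (L2 distance, exponential distance weighting, a fixed interpolation weight lam, and all function names), not the authors' implementation.

    import numpy as np

    def knn_probs(query, keys, values, vocab_size, k=8, temperature=1.0):
        """Retrieve the k datastore entries nearest to `query` and turn their
        stored next-token ids (`values`) into a distribution over the vocabulary."""
        dists = np.linalg.norm(keys - query, axis=1)    # L2 distance to every key
        nearest = np.argsort(dists)[:k]                 # indices of the k closest keys
        weights = np.exp(-dists[nearest] / temperature) # closer keys get more weight
        weights /= weights.sum()
        probs = np.zeros(vocab_size)
        for idx, w in zip(nearest, weights):
            probs[values[idx]] += w                     # accumulate weight on stored tokens
        return probs

    def interpolate(p_lm, p_knn, lam=0.25):
        """kNN-LM next-token distribution: a fixed-weight mixture of the base LM
        distribution and the retrieval distribution."""
        return (1.0 - lam) * p_lm + lam * p_knn

    def mlp_probs(query, mlp):
        """Schematic drop-in replacement from the abstract: an MLP trained to map a
        datastore key directly to a distribution over values, so no datastore or
        nearest-neighbor search is needed at inference time (architecture assumed)."""
        return mlp(query)

In this sketch, replacing knn_probs with mlp_probs leaves the interpolation step unchanged, which is what makes the MLP a drop-in substitute while avoiding storage of the full key-value datastore.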