State Soup: In-Context Skill Learning, Retrieval and Mixing

FOS: Computer and information sciences
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
DOI: 10.48550/arxiv.2406.08423
Publication Date: 2024-06-12
ABSTRACT
A new breed of gated-linear recurrent neural networks has reached state-of-the-art performance on a range of sequence modeling problems. Such models naturally handle long sequences efficiently, as the cost of processing a new input is independent of the sequence length. Here, we explore another advantage of these stateful sequence models, inspired by the success of model merging through parameter interpolation. Building on parallels between fine-tuning and in-context learning, we investigate whether we can treat internal states as task vectors that can be stored, retrieved, and then linearly combined, exploiting the linearity of the recurrence. We study this form of fast model merging on Mamba-2.8b, a pretrained recurrent model, and present preliminary evidence that simple linear state interpolation methods suffice to improve next-token perplexity as well as downstream in-context learning performance.
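
ILLUSTRATIVE CODE SKETCH
To make the state-mixing idea above concrete, the sketch below interpolates the final states of a toy diagonal linear recurrence (a stand-in for a gated-linear RNN/SSM layer), then uses the mixed state as the starting point for a query. The model, dimensions, variable names, and mixing weight are illustrative assumptions, not the paper's implementation or Mamba's actual API.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 8                                 # state dimension
    a = rng.uniform(0.8, 0.99, size=d)    # decay (recurrence) coefficients
    b = rng.normal(size=d)                # input projection

    def run(x_seq, h0=None):
        """Run the linear recurrence h_t = a*h_{t-1} + b*x_t, return final state."""
        h = np.zeros(d) if h0 is None else h0.copy()
        for x in x_seq:
            h = a * h + b * x
        return h

    # "Store": cache the states produced by two different in-context demonstrations.
    task_A = rng.normal(size=16)
    task_B = rng.normal(size=16)
    h_A = run(task_A)                     # task vector for skill A
    h_B = run(task_B)                     # task vector for skill B

    # "Mix": linearly interpolate the stored states instead of reprocessing both prompts.
    alpha = 0.5
    h_soup = alpha * h_A + (1.0 - alpha) * h_B

    # Use the mixed state as the initial state when processing the query tokens.
    query = rng.normal(size=4)
    h_out = run(query, h0=h_soup)
    print(h_out)

Because the recurrence is linear in the state, initializing from the interpolated state is a cheap proxy for combining the two demonstrations, which is the intuition the abstract refers to as exploiting the linearity of the recurrence.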