State Soup: In-Context Skill Learning, Retrieval and Mixing
FOS: Computer and information sciences
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
DOI:
10.48550/arxiv.2406.08423
Publication Date:
2024-06-12
AUTHORS (5)
ABSTRACT
A new breed of gated-linear recurrent neural networks has reached state-of-the-art performance on a range of sequence modeling problems. Such models naturally handle long sequences efficiently, as the cost of processing a new input is independent of sequence length. Here, we explore another advantage of these stateful sequence models, inspired by the success of model merging through parameter interpolation. Building on parallels between fine-tuning and in-context learning, we investigate whether we can treat internal states as task vectors that can be stored, retrieved, and then linearly combined, exploiting the linearity of the recurrence. We study this form of fast model merging on Mamba-2.8b, a pretrained recurrent model, and present preliminary evidence that simple linear state interpolation methods suffice to improve next-token perplexity as well as downstream in-context learning performance.
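To illustrate the idea the abstract describes, here is a minimal sketch of storing, retrieving, and linearly mixing recurrent states. It uses a toy diagonal gated-linear recurrence (h_t = a * h_{t-1} + b * x_t) rather than Mamba-2.8b, and all function and variable names (run_recurrence, state_bank, mixed_state) are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of "state soup": cache final recurrent states as task vectors,
# then mix them linearly before continuing generation. Toy recurrence, not Mamba.
import torch

def run_recurrence(x, a, b, h0=None):
    """Roll a diagonal linear recurrence over a sequence and return the final state.

    x: (T, d) inputs; a, b: (d,) per-channel gates; h0: optional initial state.
    """
    h = torch.zeros(x.shape[-1]) if h0 is None else h0.clone()
    for x_t in x:
        h = a * h + b * x_t  # linear in h, which is what makes state mixing sensible
    return h

torch.manual_seed(0)
d, T = 16, 32
a, b = torch.rand(d) * 0.9, torch.randn(d) * 0.1

# 1) Store: process a few "task" prompts and cache their final states as task vectors.
state_bank = {
    name: run_recurrence(torch.randn(T, d), a, b)
    for name in ["task_A", "task_B"]
}

# 2) Retrieve and mix: take a convex combination of stored states.
w = 0.5
mixed_state = w * state_bank["task_A"] + (1 - w) * state_bank["task_B"]

# 3) Continue processing a query from the mixed state instead of a cold start.
query = torch.randn(8, d)
h_final = run_recurrence(query, a, b, h0=mixed_state)
print(h_final.shape)
```

In a real Mamba-style model the cached object would be the layer-wise SSM state after consuming a task prompt, and the interpolation weight w would be tuned or searched; the sketch only shows why linearity of the recurrence makes such mixing well-defined.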