State Soup: In-Context Skill Learning, Retrieval and Mixing

FOS: Computer and information sciences
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
DOI: 10.48550/arxiv.2406.08423
Publication Date: 2024-06-12
ABSTRACT
A new breed of gated-linear recurrent neural networks has reached state-of-the-art performance on a range of sequence modeling problems. Such models naturally handle long sequences efficiently, as the cost of processing a new input is independent of the sequence length. Here, we explore another advantage of these stateful sequence models, inspired by the success of model merging through parameter interpolation. Building on parallels between fine-tuning and in-context learning, we investigate whether we can treat internal states as task vectors that can be stored, retrieved, and then linearly combined, exploiting the linearity of the recurrence. We study this form of fast model merging on Mamba-2.8b, a pretrained recurrent model, and present preliminary evidence that simple linear state interpolation methods suffice to improve next-token perplexity as well as downstream in-context learning performance.
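
ILLUSTRATIVE CODE SKETCH
To make the state-mixing idea above concrete, the sketch below interpolates the final states of a toy diagonal linear recurrence (a stand-in for a gated-linear RNN/SSM layer), then uses the mixed state as the starting point for a query. The model, dimensions, variable names, and mixing weight are illustrative assumptions, not the paper's implementation or Mamba's actual API.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 8                                 # state dimension
    a = rng.uniform(0.8, 0.99, size=d)    # decay (recurrence) coefficients
    b = rng.normal(size=d)                # input projection

    def run(x_seq, h0=None):
        """Run the linear recurrence h_t = a*h_{t-1} + b*x_t, return final state."""
        h = np.zeros(d) if h0 is None else h0.copy()
        for x in x_seq:
            h = a * h + b * x
        return h

    # "Store": cache the states produced by two different in-context demonstrations.
    task_A = rng.normal(size=16)
    task_B = rng.normal(size=16)
    h_A = run(task_A)                     # task vector for skill A
    h_B = run(task_B)                     # task vector for skill B

    # "Mix": linearly interpolate the stored states instead of reprocessing both prompts.
    alpha = 0.5
    h_soup = alpha * h_A + (1.0 - alpha) * h_B

    # Use the mixed state as the initial state when processing the query tokens.
    query = rng.normal(size=4)
    h_out = run(query, h0=h_soup)
    print(h_out)

Because the recurrence is linear in the state, initializing from the interpolated state is a cheap proxy for combining the two demonstrations, which is the intuition the abstract refers to as exploiting the linearity of the recurrence.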