Extending LLMs' Context Window with 100 Samples

DOI: 10.48550/arxiv.2401.07004 Publication Date: 2024-01-01
ABSTRACT
Large Language Models (LLMs) are known to have limited extrapolation ability beyond their pre-trained context window, constraining their application in downstream tasks with lengthy inputs. Recent studies have sought to extend LLMs' context window by modifying rotary position embedding (RoPE), a popular position encoding method adopted by well-known LLMs such as LLaMA, PaLM, and GPT-NeoX. However, prior works like Position Interpolation (PI) and YaRN are resource-intensive and lack comparative experiments to assess their applicability. In this work, we identify the inherent need for LLMs' attention entropy (i.e. the information entropy of attention scores) to maintain stability and introduce a novel extension to RoPE which combines adjusting RoPE's base frequency and scaling the attention logits to help LLMs efficiently adapt to a larger context window. We validate the superiority of our method in both fine-tuning performance and robustness across different context window sizes on various context-demanding tasks. Notably, our method extends the context window of LLaMA-2-7B-Chat to 16,384 with only 100 samples and 6 training steps, showcasing extraordinary efficiency. Finally, we also explore how data compositions and training curricula affect context window extension for specific downstream tasks, suggesting fine-tuning LLMs with lengthy conversations as a good starting point. We release our code and SFT data at https://github.com/GAIR-NLP/Entropy-ABF.
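The abstract names two ingredients: a larger RoPE base frequency and an extra scale on the attention logits that keeps attention entropy stable as the context grows. Below is a minimal NumPy sketch of how these two pieces fit together; it is not the authors' implementation (see the linked repository for that), and the concrete base value and logit-scale formula used here are illustrative assumptions rather than the paper's reported hyperparameters.

```python
# Sketch of RoPE with an adjusted base frequency plus attention-logit scaling.
# Assumptions (not from the paper): base=50_000 for the extended window, and a
# logit scale that grows logarithmically with sequence length.
import numpy as np

def rope_angles(head_dim, seq_len, base=10_000.0):
    """Rotary angles m * theta_i with theta_i = base^(-2i/d)."""
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)
    positions = np.arange(seq_len)
    return np.outer(positions, inv_freq)            # (seq_len, head_dim/2)

def apply_rope(x, angles):
    """Rotate (even, odd) feature pairs of queries/keys by the RoPE angles."""
    x1, x2 = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def scaled_attention(q, k, v, train_len, logit_scale=None):
    """Softmax attention with an extra factor on the logits, intended to keep
    attention entropy from drifting upward at longer context lengths."""
    d = q.shape[-1]
    if logit_scale is None:
        # Illustrative choice: scale grows with log(current_len)/log(train_len).
        logit_scale = max(1.0, np.log(q.shape[0]) / np.log(train_len))
    logits = logit_scale * (q @ k.T) / np.sqrt(d)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Usage: a model pre-trained at 4,096 tokens evaluated at a longer window.
# (The paper targets 16,384; a shorter length is used here to keep the demo light.)
head_dim, train_len, long_len = 64, 4_096, 8_192
rng = np.random.default_rng(0)
q = rng.standard_normal((long_len, head_dim), dtype=np.float32)
k = rng.standard_normal((long_len, head_dim), dtype=np.float32)
v = rng.standard_normal((long_len, head_dim), dtype=np.float32)
angles = rope_angles(head_dim, long_len, base=50_000.0)   # adjusted base frequency
out = scaled_attention(apply_rope(q, angles), apply_rope(k, angles), v, train_len)
print(out.shape)  # (8192, 64)
```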