Extending Llama-3's Context Ten-Fold Overnight

FOS: Computer and information sciences; Computation and Language (cs.CL)
DOI: 10.48550/arxiv.2404.19553
Publication Date: 2024-04-30
ABSTRACT
We extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning. The entire training cycle is highly efficient, taking 8 hours on one 8xA800 (80G) GPU machine. The resulting model exhibits superior performance across a broad range of evaluation tasks, such as NIHS, topic retrieval, and long-context language understanding; meanwhile, it also well preserves the original capability over short contexts. The dramatic context extension is mainly attributed to merely 3.5K synthetic training samples generated by GPT-4, which indicates LLMs' inherent (yet largely underestimated) potential to extend their original context length. In fact, the context length could be extended far beyond 80K with more computation resources. Therefore, the team will publicly release the entire resources (including data, model, data generation pipeline, and training code) to facilitate future research from the community: https://github.com/FlagOpen/FlagEmbedding.
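The abstract describes the recipe only at a high level. The snippet below is a minimal, illustrative sketch of how such a QLoRA context-extension run is commonly set up with Hugging Face transformers and peft: 4-bit quantization, an enlarged RoPE base so positions generalize beyond 8K, and LoRA adapters trained on the synthetic long-context samples. The rope_theta value, LoRA hyperparameters, and data handling here are assumptions for illustration, not the authors' exact configuration; see the linked repository for the released pipeline.

```python
# Illustrative QLoRA setup for long-context fine-tuning (not the authors' exact recipe).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# 4-bit NF4 quantization of the frozen base model: the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    # Enlarging the RoPE base lets positions trained at 8K generalize to ~80K;
    # the exact value is an assumption here.
    rope_theta=200_000_000,
    torch_dtype=torch.bfloat16,
)

# Low-rank adapters on the attention projections; only these are trained.
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# From here, train with a standard causal-LM loss on the ~3.5K synthetic
# long-context samples (e.g. via transformers.Trainer or TRL's SFTTrainer).
```

Because only the 4-bit quantized base plus small LoRA adapters sit in memory, a run of this shape fits on a single 8xA800 node, which is what makes the reported 8-hour training cycle plausible.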