Extending Llama-3's Context Ten-Fold Overnight

FOS: Computer and information sciences; Computation and Language (cs.CL)
DOI: 10.48550/arxiv.2404.19553
Publication Date: 2024-04-30
ABSTRACT
We extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning. The entire training cycle is highly efficient, taking 8 hours on one 8xA800 (80G) GPU machine. The resulting model exhibits superior performance across a broad range of evaluation tasks, such as NIHS, topic retrieval, and long-context language understanding; meanwhile, it also well preserves the original capability over short contexts. The dramatic context extension is mainly attributed to merely 3.5K synthetic training samples generated by GPT-4, which indicates LLMs' inherent (yet largely underestimated) potential to extend their original context length. In fact, the context length could be extended far beyond 80K with more computation resources. Therefore, the team will publicly release the entire resources (including data, model, data generation pipeline, and training code) to facilitate future research from the community: https://github.com/FlagOpen/FlagEmbedding.
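The abstract describes the recipe only at a high level. The snippet below is a minimal, illustrative sketch of how such a QLoRA context-extension run is commonly set up with Hugging Face transformers and peft: 4-bit quantization, an enlarged RoPE base so positions generalize beyond 8K, and LoRA adapters trained on the synthetic long-context samples. The rope_theta value, LoRA hyperparameters, and data handling here are assumptions for illustration, not the authors' exact configuration; see the linked repository for the released pipeline.

```python
# Illustrative QLoRA setup for long-context fine-tuning (not the authors' exact recipe).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# 4-bit NF4 quantization of the frozen base model: the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    # Enlarging the RoPE base lets positions trained at 8K generalize to ~80K;
    # the exact value is an assumption here.
    rope_theta=200_000_000,
    torch_dtype=torch.bfloat16,
)

# Low-rank adapters on the attention projections; only these are trained.
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# From here, train with a standard causal-LM loss on the ~3.5K synthetic
# long-context samples (e.g. via transformers.Trainer or TRL's SFTTrainer).
```

Because only the 4-bit quantized base plus small LoRA adapters sit in memory, a run of this shape fits on a single 8xA800 node, which is what makes the reported 8-hour training cycle plausible.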