MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
FOS: Computer and information sciences
Computer Science - Machine Learning
Computer Science - Computation and Language
Computation and Language (cs.CL)
Machine Learning (cs.LG)
DOI:
10.48550/arxiv.2404.06395
Publication Date:
2024-04-09
AUTHORS (25)
ABSTRACT
The burgeoning interest in developing Large Language Models (LLMs) with up to trillion parameters has been met with concerns regarding resource efficiency and practical expense, particularly given the immense cost of experimentation. This scenario underscores the importance of exploring the potential of Small Language Models (SLMs) as a resource-efficient alternative. In this context, we introduce MiniCPM, specifically the 1.2B and 2.4B non-embedding parameter variants, which not only excel in their respective categories but also demonstrate capabilities on par with 7B-13B LLMs. While focusing on SLMs, our approach exhibits scalability in both model and data dimensions for future LLM research. Regarding model scaling, we employ extensive model wind tunnel experiments for stable and optimal scaling. For data scaling, we introduce a Warmup-Stable-Decay (WSD) learning rate scheduler (LRS), conducive to continuous training and domain adaptation. We present an in-depth analysis of the intriguing training dynamics that occur with the WSD LRS. With the WSD LRS, we are now able to efficiently study the data-model scaling law without extensive retraining experiments on both axes of model and data, from which we derive a much higher compute-optimal data-model ratio than the Chinchilla Optimal. Additionally, we introduce the MiniCPM family, including MiniCPM-DPO, MiniCPM-MoE, and MiniCPM-128K, whose excellent performance further cements MiniCPM's foundation for diverse SLM applications. MiniCPM models are publicly available at https://github.com/OpenBMB/MiniCPM .
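To make the Warmup-Stable-Decay idea mentioned in the abstract concrete, the following is a minimal sketch of a WSD-style learning rate schedule: a linear warmup to a peak rate, a long constant ("stable") phase, and a short decay toward a small fraction of the peak. All hyperparameter names and values (peak_lr, warmup_steps, stable_steps, decay_steps, final_ratio) are illustrative assumptions, not the paper's exact settings.

import math

def wsd_lr(step, peak_lr=1e-2, warmup_steps=1_000,
           stable_steps=100_000, decay_steps=5_000, final_ratio=0.01):
    """Warmup-Stable-Decay (WSD) schedule sketch.

    Three phases: linear warmup to peak_lr, a constant stable phase,
    then an exponential decay toward final_ratio * peak_lr.
    """
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable phase: hold the peak learning rate constant.
        return peak_lr
    # Decay phase: exponential decay from peak_lr toward final_ratio * peak_lr.
    t = min(step - warmup_steps - stable_steps, decay_steps) / decay_steps
    return peak_lr * math.exp(t * math.log(final_ratio))

# Example: learning rate at a few points across the three phases.
for s in (0, 500, 1_000, 50_000, 101_000, 106_000):
    print(s, wsd_lr(s))

Because the stable phase holds a constant rate, training can be continued or branched into a fresh decay at different points, which is what makes this kind of schedule convenient for continuous training and for probing the data-model scaling law without retraining from scratch.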