MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
FOS: Computer and information sciences
Computer Science - Machine Learning
Computer Science - Computation and Language
Computation and Language (cs.CL)
Machine Learning (cs.LG)
DOI:
10.48550/arxiv.2404.06395
Publication Date:
2024-04-09
AUTHORS (25)
ABSTRACT
The burgeoning interest in developing Large Language Models (LLMs) with up to trillion parameters has been met with concerns regarding resource efficiency and practical expense, particularly given the immense cost of experimentation. This scenario underscores the importance of exploring the potential of Small Language Models (SLMs) as a resource-efficient alternative. In this context, we introduce MiniCPM, specifically the 1.2B and 2.4B non-embedding parameter variants, which not only excel in their respective categories but also demonstrate capabilities on par with 7B-13B LLMs. While focusing on SLMs, our approach exhibits scalability in both model and data dimensions for future LLM research. Regarding model scaling, we employ extensive model wind tunnel experiments for stable and optimal scaling. For data scaling, we introduce a Warmup-Stable-Decay (WSD) learning rate scheduler (LRS), conducive to continuous training and domain adaptation. We present an in-depth analysis of the intriguing training dynamics that occur with the WSD LRS. With the WSD LRS, we are now able to efficiently study the data-model scaling law without extensive retraining experiments on both axes of model and data, from which we derive a much higher compute-optimal data-model ratio than the Chinchilla Optimal. Additionally, we introduce the MiniCPM family, including MiniCPM-DPO, MiniCPM-MoE, and MiniCPM-128K, whose excellent performance further cements MiniCPM's foundation for diverse SLM applications. MiniCPM models are publicly available at https://github.com/OpenBMB/MiniCPM .
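To make the Warmup-Stable-Decay idea mentioned in the abstract concrete, the following is a minimal sketch of a WSD-style learning rate schedule: a linear warmup to a peak rate, a long constant ("stable") phase, and a short decay toward a small fraction of the peak. All hyperparameter names and values (peak_lr, warmup_steps, stable_steps, decay_steps, final_ratio) are illustrative assumptions, not the paper's exact settings.

import math

def wsd_lr(step, peak_lr=1e-2, warmup_steps=1_000,
           stable_steps=100_000, decay_steps=5_000, final_ratio=0.01):
    """Warmup-Stable-Decay (WSD) schedule sketch.

    Three phases: linear warmup to peak_lr, a constant stable phase,
    then an exponential decay toward final_ratio * peak_lr.
    """
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable phase: hold the peak learning rate constant.
        return peak_lr
    # Decay phase: exponential decay from peak_lr toward final_ratio * peak_lr.
    t = min(step - warmup_steps - stable_steps, decay_steps) / decay_steps
    return peak_lr * math.exp(t * math.log(final_ratio))

# Example: learning rate at a few points across the three phases.
for s in (0, 500, 1_000, 50_000, 101_000, 106_000):
    print(s, wsd_lr(s))

Because the stable phase holds a constant rate, training can be continued or branched into a fresh decay at different points, which is what makes this kind of schedule convenient for continuous training and for probing the data-model scaling law without retraining from scratch.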