Large Language Diffusion Models

FOS: Computer and information sciences
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
DOI: 10.48550/arxiv.2502.09992
Publication Date: 2025-02-14
ABSTRACT
Autoregressive models (ARMs) are widely regarded as the cornerstone of large language models (LLMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA models distributions through a forward data masking process and a reverse process, parameterized by a vanilla Transformer that predicts masked tokens. By optimizing a likelihood bound, it provides a principled generative approach for probabilistic inference. Across extensive benchmarks, LLaDA demonstrates strong scalability, outperforming our self-constructed ARM baselines. Remarkably, LLaDA 8B is competitive with strong LLMs such as LLaMA3 8B in in-context learning and, after SFT, exhibits impressive instruction-following abilities in case studies such as multi-turn dialogue. Moreover, LLaDA addresses the reversal curse, surpassing GPT-4o in a reversal poem completion task. Our findings establish diffusion models as a viable and promising alternative to ARMs, challenging the assumption that the key LLM capabilities discussed above are inherently tied to ARMs.
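To make the abstract's description concrete, the sketch below illustrates the general masked-diffusion training recipe it refers to: a masking ratio t is sampled, tokens are masked in the forward process, and a Transformer mask predictor is trained with a cross-entropy term on the masked positions, reweighted to form a likelihood bound. This is a minimal illustrative sketch, not the authors' released implementation; all names (MaskPredictor, mask_id, the toy sizes, etc.) are assumptions introduced here for exposition.

```python
# Minimal sketch (assumed, not LLaDA's actual code) of a masked-diffusion
# training step: forward masking process + Transformer mask predictor
# optimized via a reweighted cross-entropy likelihood bound.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy hyperparameters (illustrative assumptions, far smaller than an 8B model).
vocab_size, mask_id, seq_len, d_model = 1000, 999, 64, 128

class MaskPredictor(nn.Module):
    """Vanilla Transformer encoder predicting the original token at masked positions."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        return self.head(self.encoder(self.embed(tokens)))  # (B, L, vocab) logits

def masked_diffusion_loss(model, x0):
    """Monte Carlo estimate of the likelihood bound: mask each token with
    probability t, score the model only on masked positions, reweight by 1/t."""
    b = x0.size(0)
    t = torch.rand(b, 1, device=x0.device).clamp(min=1e-3)   # masking ratio per sequence
    is_masked = torch.rand_like(x0, dtype=torch.float) < t   # forward masking process
    xt = torch.where(is_masked, torch.full_like(x0, mask_id), x0)
    logits = model(xt)
    token_loss = F.cross_entropy(logits.transpose(1, 2), x0, reduction="none")  # (B, L)
    # Sum over masked tokens only, weight by 1/t, normalize by sequence length.
    per_seq = (token_loss * is_masked).sum(dim=1) / t.squeeze(1) / x0.size(1)
    return per_seq.mean()

# Usage: one training step on a random toy batch of "clean" sequences.
model = MaskPredictor()
x0 = torch.randint(0, vocab_size - 1, (8, seq_len))
loss = masked_diffusion_loss(model, x0)
loss.backward()
print(float(loss))
```

At inference time, generation would run the reverse process instead: start from a fully masked sequence and iteratively fill in tokens predicted by the same mask predictor; that sampling loop is omitted here.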