SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

Single stage

DOI: 10.48550/arxiv.2502.13128
Publication Date: 2025-02-18

AUTHORS (9)

Zihan Liu

Shuangrui Ding

Zhixiong Zhang

Xiaoyi Dong

Pan Zhang

Yuhang Zang

Yuhang Cao

Dahua Lin

Jiaqi Wang

ABSTRACT

Text-to-song generation, the task of creating vocals and accompaniment from textual inputs, poses significant challenges due to domain complexity and data scarcity. Existing approaches often employ multi-stage generation procedures, resulting in cumbersome training and inference pipelines. In this paper, we propose SongGen, a fully open-source, single-stage auto-regressive transformer designed for controllable song generation. The proposed model facilitates fine-grained control over diverse musical attributes, including lyrics and textual descriptions of instrumentation, genre, mood, and timbre, while also offering an optional three-second reference clip for voice cloning. Within a unified auto-regressive framework, SongGen supports two output modes: mixed mode, which generates a mixture of vocals and accompaniment directly, and dual-track mode, which synthesizes them separately for greater flexibility in downstream applications. We explore diverse token pattern strategies for each mode, leading to notable improvements and valuable insights. Furthermore, we design an automated data preprocessing pipeline with effective quality control. To foster community engagement and future research, we will release our model weights, training code, annotated data, and preprocessing pipeline. The generated samples are showcased on our project page at https://liuzh-19.github.io/SongGen/, and the code will be available at https://github.com/LiuZH-19/SongGen.