Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model

FOS: Computer and information sciences · Computer Vision and Pattern Recognition (cs.CV) · Human-Computer Interaction (cs.HC) · Multimedia (cs.MM)
DOI: 10.48550/arxiv.2404.01862 Publication Date: 2024-04-02
ABSTRACT
Co-speech gestures, if presented in the lively form of videos, can achieve superior visual effects in human-machine interaction. While previous works mostly generate structural human skeletons, resulting in the omission of appearance information, we focus on the direct generation of audio-driven co-speech gesture videos in this work. There are two main challenges: 1) A suitable motion feature is needed to describe complex human movements with crucial appearance information. 2) Gestures and speech exhibit inherent dependencies and should be temporally aligned even for sequences of arbitrary length. To solve these problems, we present a novel motion-decoupled framework to generate co-speech gesture videos. Specifically, we first introduce a well-designed nonlinear TPS transformation to obtain latent motion features that preserve essential appearance information. Then a transformer-based diffusion model is proposed to learn the temporal correlation between gestures and speech and perform generation in the latent motion space, followed by an optimal motion selection module to produce long-term coherent and consistent gesture videos. For better visual perception, we further design a refinement network focusing on missing details in certain areas. Extensive experimental results show that our proposed framework significantly outperforms existing approaches in both motion-related and video-related evaluations. Our code, demos, and more resources are available at https://github.com/thuhcsi/S2G-MDDiffusion.
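The abstract's latent motion representation is built on a nonlinear thin-plate-spline (TPS) transformation. Below is a minimal, self-contained sketch of generic TPS warping of 2D points, given only as an illustration of the underlying transformation family under the usual TPS formulation; the function names and the toy keypoint setup are hypothetical and do not reflect the authors' code in the linked repository.

```python
# Generic thin-plate-spline (TPS) warp: fit a nonlinear mapping that sends a
# set of source control points to target points, then apply it to a dense grid.
# Illustration only; not the paper's actual motion-feature implementation.
import torch


def tps_fit(src_pts: torch.Tensor, dst_pts: torch.Tensor):
    """Solve for TPS coefficients mapping src control points (N, 2) to dst (N, 2).

    Returns radial weights W of shape (N, 2) and affine part A of shape (3, 2).
    """
    n = src_pts.shape[0]
    # Radial basis U(r) = r^2 log(r^2), with U(0) = 0.
    d2 = torch.cdist(src_pts, src_pts).pow(2)
    K = d2 * torch.log(d2 + 1e-9)
    P = torch.cat([torch.ones(n, 1), src_pts], dim=1)        # (N, 3)
    # Standard TPS linear system [[K, P], [P^T, 0]] [W; A] = [dst; 0].
    top = torch.cat([K, P], dim=1)                            # (N, N+3)
    bottom = torch.cat([P.t(), torch.zeros(3, 3)], dim=1)     # (3, N+3)
    L = torch.cat([top, bottom], dim=0)                       # (N+3, N+3)
    Y = torch.cat([dst_pts, torch.zeros(3, 2)], dim=0)        # (N+3, 2)
    sol = torch.linalg.solve(L, Y)
    return sol[:n], sol[n:]                                   # W, A


def tps_apply(pts: torch.Tensor, src_pts: torch.Tensor,
              W: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
    """Apply the fitted TPS transform to arbitrary 2D points of shape (M, 2)."""
    d2 = torch.cdist(pts, src_pts).pow(2)
    U = d2 * torch.log(d2 + 1e-9)
    affine = torch.cat([torch.ones(pts.shape[0], 1), pts], dim=1) @ A
    return affine + U @ W


if __name__ == "__main__":
    # Toy example: 5 keypoints nudged to new positions; warp a dense 32x32 grid.
    src = torch.rand(5, 2)
    dst = src + 0.05 * torch.randn(5, 2)
    W, A = tps_fit(src, dst)
    ys, xs = torch.meshgrid(torch.linspace(0, 1, 32),
                            torch.linspace(0, 1, 32), indexing="ij")
    grid = torch.stack([xs.flatten(), ys.flatten()], dim=1)   # (1024, 2)
    warped = tps_apply(grid, src, W, A)
    print(warped.shape)  # torch.Size([1024, 2])
```

In a keypoint-driven video pipeline, warps of this kind let motion be described by a small set of control points while the warped grid carries the appearance of a reference frame, which is the intuition behind decoupling motion from appearance in the abstract.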