Speech driven video editing via an audio-conditioned diffusion model
Video editing
DOI:
10.1016/j.imavis.2024.104911
Publication Date:
2024-01-21T05:40:07Z
AUTHORS (7)
ABSTRACT
Taking inspiration from recent developments in visual generative tasks using diffusion models, we propose a method for end-to-end speech-driven video editing using a denoising diffusion model. Given a video of a talking person and a separate auditory speech recording, the lip and jaw motions are re-synchronised without relying on intermediate structural representations such as facial landmarks or a 3D face model. We show this is possible by conditioning a denoising diffusion model on audio mel spectral features to generate synchronised facial motion. Proof-of-concept results are demonstrated for both single-speaker and multi-speaker video editing, providing a baseline on the CREMA-D audiovisual data set. To the best of our knowledge, this is the first work to demonstrate and validate the feasibility of applying denoising diffusion models to the task of audio-driven video editing. All code, datasets, and models used as part of this work are made publicly available here: https://danbigioi.github.io/DiffusionVideoEditing/.
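The sketch below is a minimal, illustrative example (not the authors' published code, which is available at the link above) of the core idea described in the abstract: conditioning a denoising diffusion model on audio mel spectral features. The toy denoiser architecture, the 80-mel-bin spectrogram settings, and the DDPM-style noise schedule are assumptions made purely for illustration.

```python
# Toy example: one audio-conditioned denoising-diffusion training step.
# All architectural and hyper-parameter choices here are illustrative assumptions.
import torch
import torch.nn as nn
import torchaudio


class TinyAudioConditionedDenoiser(nn.Module):
    """Toy noise-prediction network: noisy frame + per-frame audio embedding -> noise estimate."""

    def __init__(self, n_mels=80, img_channels=3):
        super().__init__()
        self.audio_proj = nn.Linear(n_mels, 64)  # embed mel features for conditioning
        self.net = nn.Sequential(
            nn.Conv2d(img_channels + 64, 64, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(64, img_channels, 3, padding=1),
        )

    def forward(self, noisy_frame, mel_frame):
        # noisy_frame: (B, 3, H, W); mel_frame: (B, n_mels)
        B, _, H, W = noisy_frame.shape
        a = self.audio_proj(mel_frame)                # (B, 64)
        a = a[:, :, None, None].expand(B, 64, H, W)   # broadcast over spatial dims
        return self.net(torch.cat([noisy_frame, a], dim=1))


# Audio -> mel spectral features (80 mel bins is a common choice, assumed here).
mel_transform = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=80)
waveform = torch.randn(1, 16000)         # 1 s of dummy audio
mel = mel_transform(waveform)            # (1, 80, T)
mel_frame = mel.mean(dim=-1)             # crude pooling to one vector per video frame: (1, 80)

# DDPM-style forward process: noise a ground-truth frame at a random timestep,
# then train the denoiser to predict that noise, conditioned on the audio.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

frame = torch.rand(1, 3, 64, 64)         # dummy ground-truth video frame
t = torch.randint(0, T, (1,))
noise = torch.randn_like(frame)
a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
noisy_frame = a_bar.sqrt() * frame + (1 - a_bar).sqrt() * noise

model = TinyAudioConditionedDenoiser()
pred_noise = model(noisy_frame, mel_frame)
loss = nn.functional.mse_loss(pred_noise, noise)
loss.backward()
print(f"toy denoising loss: {loss.item():.4f}")
```

At inference time the same audio conditioning would be applied at every reverse-diffusion step, so the generated frames carry mouth motion synchronised to the new speech recording.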