LEO: Generative Latent Image Animator for Human Video Synthesis

DOI: 10.48550/arxiv.2305.03989 Publication Date: 2023-01-01
ABSTRACT
Spatio-temporal coherency is a major challenge in synthesizing high quality videos, particularly in synthesizing human videos that contain rich global and local deformations. To resolve this challenge, previous approaches have resorted to different features in the generation process aimed at representing appearance and motion. However, in the absence of strict mechanisms to guarantee such disentanglement, a separation of motion from appearance has remained challenging, resulting in spatial distortions and temporal jittering that break the spatio-temporal coherency. Motivated by this, we here propose LEO, a novel framework for human video synthesis, placing emphasis on spatio-temporal coherency. Our key idea is to represent motion as a sequence of flow maps in the generation process, which inherently isolate motion from appearance. We implement this idea via a flow-based image animator and a Latent Motion Diffusion Model (LMDM). The former bridges a space of motion codes with the space of flow maps, and synthesizes video frames in a warp-and-inpaint manner. LMDM learns to capture a motion prior in the training data by synthesizing sequences of motion codes. Extensive quantitative and qualitative analysis suggests that LEO significantly improves coherent synthesis of human videos over previous methods on the datasets TaichiHD, FaceForensics and CelebV-HQ. In addition, the effective disentanglement of appearance and motion in LEO allows for two additional tasks, namely infinite-length human video synthesis, as well as content-preserving video editing.
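The warp-and-inpaint pipeline sketched in the abstract (a motion-code sequence is decoded into dense flow maps, which warp a starting frame into subsequent frames) can be illustrated with a minimal NumPy toy. All names here are illustrative assumptions: the real LEO system uses a learned flow-based image animator and inpaints disoccluded regions, whereas this sketch decodes each motion code into a constant-translation flow and skips inpainting entirely.

```python
import numpy as np

def warp(frame, flow):
    """Backward-warp a grayscale frame (H, W) with a dense flow map
    (H, W, 2) using nearest-neighbour sampling to stay dependency-free."""
    H, W = frame.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_y = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, W - 1)
    return frame[src_y, src_x]

def decode_flow(motion_code, H, W):
    """Stand-in for the animator's flow decoder: maps a latent motion
    code to a dense flow map. Toy version: a constant translation."""
    dy, dx = motion_code
    flow = np.zeros((H, W, 2))
    flow[..., 0], flow[..., 1] = dy, dx
    return flow

def synthesize_video(start_frame, motion_codes):
    """Warp-and-inpaint loop (inpainting omitted in this toy): each
    motion code, e.g. sampled from an LMDM-like prior, yields a flow
    map that warps the starting frame into the next frame."""
    H, W = start_frame.shape
    return [warp(start_frame, decode_flow(code, H, W))
            for code in motion_codes]
```

Because appearance lives only in `start_frame` and motion only in `motion_codes`, swapping either one independently (new subject, same motion sequence, or vice versa) mirrors the disentanglement the paper relies on for infinite-length synthesis and content-preserving editing.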