Magic 1-For-1: Generating One Minute Video Clips within One Minute

DOI: 10.48550/arxiv.2502.07701 Publication Date: 2025-02-11
ABSTRACT
In this technical report, we present Magic 1-For-1 (Magic141), an efficient video generation model with optimized memory consumption and inference latency. The key idea is simple: factorize the text-to-video generation task into two separate, easier tasks for diffusion step distillation, namely text-to-image generation and image-to-video generation. We verify that, with the same optimization algorithm, the image-to-video task is indeed easier to converge than the text-to-video task. We also explore a bag of optimization tricks to reduce the computational cost of training the image-to-video (I2V) models from three aspects: 1) model convergence speedup by using multi-modal prior condition injection; 2) inference latency speedup by applying adversarial step distillation; and 3) inference memory cost optimization with parameter sparsification. With those techniques, we are able to generate 5-second video clips within 3 seconds. By applying a test-time sliding window, we are able to generate a minute-long video within one minute with significantly improved visual quality and motion dynamics, spending less than 1 second to generate each 1 second of video on average. We conduct a series of preliminary explorations to find the optimal tradeoff between computational cost and video quality during diffusion step distillation, and we hope this can be a good foundation model for open-source explorations. The code and model weights are available at https://github.com/DA-Group-PKU/Magic-1-For-1.
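The abstract names the text-to-image then image-to-video factorization and a test-time sliding window but does not say how windows are chained. The sketch below assumes, hypothetically, that the text-to-image stage produces only the first frame and that each later window is an image-to-video call conditioned on the last frame of the previous window. `stub_t2i`, `stub_i2v`, and `sliding_window_video` are illustrative stand-ins, not the Magic141 API; integers take the place of frames so the chaining logic can be checked by eye.

```python
from typing import Callable, List

# Hypothetical stand-ins: a "frame" is an int so the chaining logic is easy to verify.
Frame = int
T2I = Callable[[str], Frame]
I2V = Callable[[Frame, str, int], List[Frame]]


def stub_t2i(prompt: str) -> Frame:
    """Placeholder text-to-image stage (assumption): returns the first frame."""
    return 0


def stub_i2v(image: Frame, prompt: str, num_frames: int) -> List[Frame]:
    """Placeholder image-to-video stage (assumption): output frame 0 reproduces the conditioning image."""
    return [image + i for i in range(num_frames)]


def sliding_window_video(prompt: str, t2i: T2I, i2v: I2V,
                         window: int, num_windows: int) -> List[Frame]:
    """Chain I2V windows, conditioning each window on the last frame of the previous one."""
    if window < 2 or num_windows < 1:
        raise ValueError("need window >= 2 and num_windows >= 1")
    # Stage 1 (text-to-image): only the very first frame comes from the prompt alone.
    first = t2i(prompt)
    # Stage 2 (image-to-video): the first window animates that frame.
    video = i2v(first, prompt, window)
    for _ in range(num_windows - 1):
        segment = i2v(video[-1], prompt, window)
        # Drop the segment's first frame, which duplicates the conditioning frame.
        video.extend(segment[1:])
    return video


if __name__ == "__main__":
    frames = sliding_window_video("a cat surfing", stub_t2i, stub_i2v, window=5, num_windows=3)
    print(len(frames), frames)
```

Under these assumptions each extra window contributes `window - 1` new frames, so three 5-frame windows yield 13 frames (0 through 12). Reusing the last frame as the next condition is one simple way to keep consecutive clips visually continuous; the actual method may condition on more context.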