Multi-sentence Video Grounding for Long Video Generation
FOS: Computer and information sciences
Computer Vision and Pattern Recognition (cs.CV)
Computer Science - Computer Vision and Pattern Recognition
DOI:
10.48550/arxiv.2407.13219
Publication Date:
2024-07-18
AUTHORS (5)
ABSTRACT
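Video generation has witnessed great success recently, but its application to generating long videos remains challenging due to the difficulty of maintaining the temporal consistency of generated videos and the high memory cost during generation. To tackle these problems, in this paper we propose a brave new idea, Multi-sentence Video Grounding for Long Video Generation, connecting the massive video moment retrieval task to the video generation task for the first time and providing a new paradigm for long video generation. The method of our work can be summarized in three steps: (i) We design sequential scene text prompts as the queries for video grounding, utilizing massive video moment retrieval to search for video moment segments that meet the text requirements in the video database. (ii) Based on the source frames of the retrieved video moment segments, we adopt video editing methods to create new video content while preserving the temporal consistency of the retrieved video. Since the editing can be conducted segment by segment, and even frame by frame, it largely reduces the memory cost. (iii) We also attempt video morphing and personalized generation methods to improve the subject consistency of long video generation, providing ablation experimental results for the subtasks of long video generation. Our approach seamlessly extends the development of image/video editing, video morphing and personalized generation, and video grounding to long video generation, offering effective solutions for generating long videos at low memory cost.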
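The retrieve-then-edit flow in steps (i) and (ii) of the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `Segment` type, the word-overlap score, and the `edit_segment` placeholder are all hypothetical stand-ins for a learned video moment retrieval model and a video editing model, and step (iii) is not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical database entry: a clip id, a time span, and a caption."""
    video_id: str
    start: float
    end: float
    caption: str


def overlap_score(query: str, caption: str) -> float:
    """Toy relevance score (assumption): Jaccard overlap of lowercase word sets.

    A real system would use a trained video grounding model instead.
    """
    q, c = set(query.lower().split()), set(caption.lower().split())
    return len(q & c) / len(q | c) if q | c else 0.0


def ground_prompts(prompts, database):
    """Step (i): for each scene prompt, pick the best-matching segment."""
    return [max(database, key=lambda s: overlap_score(p, s.caption)) for p in prompts]


def edit_segment(segment, prompt):
    """Step (ii) placeholder: records what an editing model would be asked to do."""
    return {
        "source": segment.video_id,
        "span": (segment.start, segment.end),
        "prompt": prompt,
    }


def generate_long_video(prompts, database):
    """Ground every prompt, then apply the edit one segment at a time."""
    segments = ground_prompts(prompts, database)
    return [edit_segment(s, p) for s, p in zip(segments, prompts)]
```

Because each retrieved segment is edited independently, peak memory in such a design scales with the size of one segment rather than with the length of the whole video, which is the cost argument the abstract makes.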