VideoPoet: A Large Language Model for Zero-Shot Video Generation

FOS: Computer and information sciences; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
DOI: 10.48550/arxiv.2312.14125 Publication Date: December 2023
ABSTRACT
To appear at ICML 2024. Project page: http://sites.research.google/videopoet/

We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs, including images, videos, text, and audio. The training protocol follows that of Large Language Models (LLMs), consisting of two stages: pretraining and task-specific adaptation. During pretraining, VideoPoet incorporates a mixture of multimodal generative objectives within an autoregressive Transformer framework. The pretrained LLM serves as a foundation that can be adapted for a range of video generation tasks. We present empirical results demonstrating the model's state-of-the-art capabilities in zero-shot video generation, specifically highlighting VideoPoet's ability to generate high-fidelity motions.
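The abstract's core recipe, a decoder-only transformer trained autoregressively over a single sequence of multimodal tokens, can be illustrated with a minimal sketch. Everything below (model size, vocabulary, sequence layout, hyperparameters) is an illustrative assumption for a toy PyTorch example, not the authors' architecture or code.

# Minimal sketch (not the authors' code): conditioning and target modalities are
# assumed to be mapped to discrete token IDs, concatenated into one sequence,
# and trained with next-token prediction under a causal mask.
import torch
import torch.nn as nn

class TinyMultimodalLM(nn.Module):
    def __init__(self, vocab_size=8192, d_model=256, n_heads=4, n_layers=2, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        B, T = tokens.shape
        pos = torch.arange(T, device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(pos)
        # Causal mask: each position may only attend to earlier positions.
        causal = torch.triu(torch.full((T, T), float("-inf"), device=tokens.device), diagonal=1)
        h = self.blocks(x, mask=causal)
        return self.lm_head(h)

# Hypothetical layout: [text prefix tokens | video tokens]; the model learns to
# continue the video tokens given the prefix. Random integers stand in for
# token IDs that a learned tokenizer would produce for each modality.
model = TinyMultimodalLM()
seq = torch.randint(0, 8192, (2, 128))              # toy batch of token sequences
logits = model(seq[:, :-1])                          # predict each next token
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), seq[:, 1:].reshape(-1))
loss.backward()

Task-specific adaptation would then continue training this same objective on sequences arranged for a particular task (e.g. text-to-video or video continuation), which is what lets one pretrained model serve many generation tasks.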