NFDI4DS | UHH-SEMS - Publication Details

VIMA: General Robot Manipulation with Multimodal Prompts

FOS: Computer and information sciences; Computer Science - Robotics; Computer Science - Machine Learning; 03 medical and health sciences; 0302 clinical medicine; Artificial Intelligence (cs.AI); Computer Science - Artificial Intelligence; Robotics (cs.RO); Machine Learning (cs.LG)

DOI: 10.48550/arxiv.2210.03094

Publication Date: 2022-01-01


AUTHORS (10)

Jiang, Yunfan

Gupta, Agrim

Zhang, Zichen

Wang, Guanzhi

Dou, Yongqiang

Chen, Yanjun

Fei-Fei, Li

Anandkumar, Anima

Zhu, Yuke

Fan, Linxi

ABSTRACT

Prompt-based learning has emerged as a successful paradigm in natural language processing, where a single general-purpose language model can be instructed to perform any task specified by input prompts. Yet task specification in robotics comes in various forms, such as imitating one-shot demonstrations, following language instructions, and reaching visual goals. These are often considered different tasks and tackled by specialized models. We show that a wide spectrum of robot manipulation tasks can be expressed with multimodal prompts that interleave textual and visual tokens. Accordingly, we develop a new simulation benchmark that consists of thousands of procedurally generated tabletop tasks with multimodal prompts, 600K+ expert trajectories for imitation learning, and a four-level evaluation protocol for systematic generalization. We design a transformer-based robot agent, VIMA, that processes these prompts and outputs motor actions autoregressively. VIMA features a recipe that achieves strong model scalability and data efficiency. Given the same training data, it outperforms alternative designs in the hardest zero-shot generalization setting by up to 2.9× in task success rate. With 10× less training data, VIMA still performs 2.7× better than the best competing variant. Code and video demos are available at https://vimalabs.github.io/

ICML 2023 camera-ready version.
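The abstract's central idea, a prompt that interleaves textual and visual tokens, can be illustrated with a small data structure. The Python sketch below is a minimal illustration under stated assumptions, not VIMA's actual prompt format or tokenizer: the `{placeholder}` template syntax, the `TextToken`/`ImageToken` classes, the `build_multimodal_prompt` helper, and the image file names are all hypothetical, and text is split on whitespace rather than with a real language-model tokenizer.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class TextToken:
    word: str  # one whitespace-separated word of the instruction


@dataclass(frozen=True)
class ImageToken:
    name: str      # placeholder name from the template, e.g. "dragged_obj"
    image_id: str  # hypothetical reference to an image (file name, array key, ...)


PromptToken = Union[TextToken, ImageToken]

# Matches placeholders such as {dragged_obj}; this brace syntax is an assumption.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def build_multimodal_prompt(template: str, images: Dict[str, str]) -> List[PromptToken]:
    """Split a template into an interleaved list of text and image tokens (hypothetical helper)."""
    tokens: List[PromptToken] = []
    pos = 0
    for match in PLACEHOLDER.finditer(template):
        # Text before the placeholder becomes whitespace-separated word tokens.
        tokens.extend(TextToken(w) for w in template[pos:match.start()].split())
        name = match.group(1)
        if name not in images:
            raise KeyError(f"no image supplied for placeholder {name!r}")
        # The placeholder itself becomes a single visual token.
        tokens.append(ImageToken(name, images[name]))
        pos = match.end()
    # Any text after the last placeholder is appended as word tokens.
    tokens.extend(TextToken(w) for w in template[pos:].split())
    return tokens


# Usage: a hypothetical "rearrange" instruction that refers to two objects by image.
prompt = build_multimodal_prompt(
    "Put the {dragged_obj} into the {base_obj}.",
    {"dragged_obj": "img_0.png", "base_obj": "img_1.png"},
)
for tok in prompt:
    print(tok)
```

In an agent along the lines the abstract describes, each image reference would still have to be encoded into visual embeddings and each word into text embeddings before the combined sequence reaches the transformer; that encoding step is deliberately left out of this sketch.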


