Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model

DOI: 10.48550/arxiv.2305.11176 Publication Date: 2023-01-01
ABSTRACT
Foundation models have made significant strides in various applications, including text-to-image generation, panoptic segmentation, and natural language processing. This paper presents Instruct2Act, a framework that utilizes Large Language Models to map multi-modal instructions to sequential actions for robotic manipulation tasks. Specifically, Instruct2Act employs the LLM model to generate Python programs that constitute a comprehensive perception, planning, and action loop. In the perception section, pre-defined APIs are used to access multiple foundation models, where the Segment Anything Model (SAM) accurately locates candidate objects and CLIP classifies them. In this way, the framework leverages the expertise of foundation models and robotic abilities to convert complex high-level instructions into precise policy codes. Our approach is adjustable and flexible, accommodating various instruction modalities and input types and catering to specific task demands. We validated the practicality and efficiency of our approach by assessing it on robotic tasks in different scenarios within tabletop manipulation domains. Furthermore, our zero-shot method outperformed many state-of-the-art learning-based policies on several tasks. The code for the proposed approach is available at https://github.com/OpenGVLab/Instruct2Act, serving as a robust benchmark with assorted modality inputs.
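
CODE SKETCH

The loop described in the abstract, where SAM proposes candidate object masks, CLIP labels them, and the LLM emits a Python policy program that composes these calls, can be illustrated with the following minimal Python sketch. The function names (get_object_masks, classify_masks, pick_and_place), the checkpoint path, and the example instruction are hypothetical stand-ins for illustration and are not the actual Instruct2Act API; see the repository linked above for the real interface.

# Hypothetical sketch of a perception-planning-action loop in the spirit of
# Instruct2Act. Names and paths are illustrative, not the official API.
import numpy as np
import torch
import clip                                   # OpenAI CLIP package
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

device = "cuda" if torch.cuda.is_available() else "cpu"

# --- Perception APIs (assumed to be pre-defined and exposed to the LLM) ---
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth").to(device)  # placeholder path
mask_generator = SamAutomaticMaskGenerator(sam)
clip_model, clip_preprocess = clip.load("ViT-B/32", device=device)

def get_object_masks(image: np.ndarray):
    """SAM proposes class-agnostic masks for every candidate object in the scene."""
    return mask_generator.generate(image)

def classify_masks(image: np.ndarray, masks, labels):
    """CLIP scores each masked crop against the open-vocabulary labels."""
    text = clip.tokenize(labels).to(device)
    results = []
    for m in masks:
        x, y, w, h = m["bbox"]
        crop = Image.fromarray(image[y:y + h, x:x + w])
        with torch.no_grad():
            logits, _ = clip_model(clip_preprocess(crop).unsqueeze(0).to(device), text)
        results.append((labels[int(logits.argmax())], m))
    return results

# --- Action API (robot-specific; a stub here) ---
def pick_and_place(source_mask, target_mask):
    """Placeholder for the low-level primitive that executes the motion."""
    raise NotImplementedError

# --- A policy program of the kind the LLM would emit for the instruction
#     "Put the red block into the green bowl." ---
def policy(image: np.ndarray):
    masks = get_object_masks(image)
    labeled = classify_masks(image, masks, ["red block", "green bowl"])
    source = next(m for lbl, m in labeled if lbl == "red block")
    target = next(m for lbl, m in labeled if lbl == "green bowl")
    pick_and_place(source, target)

The split mirrors the paper's division of labour: frozen foundation models handle open-vocabulary perception behind simple APIs, so the LLM only has to compose those APIs into task-specific policy code rather than reason about pixels directly.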