NFDI4DS | UHH-SEMS - Publication Details

HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks

FOS: Computer and information sciences; Computer Science - Robotics (cs.RO); Computer Science - Computer Vision and Pattern Recognition (cs.CV)

DOI: 10.48550/arxiv.2308.12537

Publication Date: 2023-01-01


AUTHORS (6)

Dong, Zichao

Zhang, Weikun

Huang, Xufeng

Ji, Hang

Zhan, Xin

Chen, Junbo

ABSTRACT

Human-robot interaction is an exciting task that aims to guide robots to follow instructions from humans. Because a large gap lies between human natural language and machine code, building end-to-end human-robot interaction models is fairly challenging. Furthermore, the visual information received from a robot's sensors is itself a hard "language" for the robot to perceive. In this work, HuBo-VLM is proposed to tackle the perception tasks associated with human-robot interaction, including object detection and visual grounding, using a unified transformer-based vision-language model. Extensive experiments on the Talk2Car benchmark demonstrate the effectiveness of our approach. Code will be publicly available at https://github.com/dzcgaara/HuBo-VLM.
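The abstract does not say how a single transformer produces bounding boxes for both object detection and visual grounding. One common approach in unified sequence-to-sequence vision-language models (for example Pix2Seq and OFA) is to quantize box coordinates into discrete location tokens that the text decoder generates like ordinary words. The sketch below illustrates that general idea only; the `<loc_N>` token format, the bin count, and the helper names `box_to_tokens`, `tokens_to_box`, and `detection_target` are hypothetical and are not taken from HuBo-VLM.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels

NUM_BINS = 1000  # assumption: quantization bins per coordinate


def box_to_tokens(box: Box, width: float, height: float,
                  num_bins: int = NUM_BINS) -> List[str]:
    """Quantize a pixel box into four hypothetical location tokens."""
    x0, y0, x1, y1 = box
    # Normalize each coordinate to [0, 1] by the image size.
    coords = (x0 / width, y0 / height, x1 / width, y1 / height)
    # Map to the nearest bin index and clamp to the valid range.
    bins = [min(num_bins - 1, max(0, int(c * (num_bins - 1) + 0.5)))
            for c in coords]
    return [f"<loc_{b}>" for b in bins]


def tokens_to_box(tokens: Sequence[str], width: float, height: float,
                  num_bins: int = NUM_BINS) -> Box:
    """Invert box_to_tokens, up to half a bin of quantization error."""
    bins = [int(t[len("<loc_"):-1]) for t in tokens]
    x0, y0, x1, y1 = (b / (num_bins - 1) for b in bins)
    return (x0 * width, y0 * height, x1 * width, y1 * height)


def detection_target(objects: Sequence[Tuple[str, Box]],
                     width: float, height: float) -> str:
    """Hypothetical detection target: '<label> <loc> <loc> <loc> <loc>' per object."""
    parts: List[str] = []
    for label, box in objects:
        parts.append(label)
        parts.extend(box_to_tokens(box, width, height))
    return " ".join(parts)


if __name__ == "__main__":
    # Visual grounding: the answer to a command is a single box.
    print(box_to_tokens((100, 50, 300, 250), 1000, 500))
    # Detection: the same location vocabulary, prefixed by class labels.
    print(detection_target([("car", (100, 50, 300, 250))], 1000, 500))
```

Under this scheme both tasks share one output vocabulary, which is what lets a single decoder serve detection and grounding; the price is a quantization error of at most half a bin per coordinate.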



