DINOBot: Robot Manipulation via Retrieval and Alignment with Vision Foundation Models
DOI:
10.48550/arxiv.2402.13181
Publication Date:
2024-02-20
AUTHORS (2)
ABSTRACT
We propose DINOBot, a novel imitation learning framework for robot manipulation, which leverages the image-level and pixel-level capabilities of features extracted from Vision Transformers trained with DINO. When interacting with a novel object, DINOBot first uses these features to retrieve the most visually similar object experienced during human demonstrations, and then uses this object to align its end-effector with the novel object to enable effective interaction. Through a series of real-world experiments on everyday tasks, we show that exploiting both of these properties of vision foundation models enables unprecedented learning efficiency and generalisation. Videos and code are available at https://www.robot-learning.uk/dinobot.
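The retrieval step described in the abstract can be sketched as a nearest-neighbour lookup over image-level features. The sketch below assumes DINO embeddings have already been extracted; the feature vectors and the `retrieve_demonstration` helper are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np

def retrieve_demonstration(live_feat, demo_feats):
    """Return the index of the demonstration whose image-level feature
    is most cosine-similar to the live observation's feature."""
    demos = np.stack(demo_feats)
    sims = demos @ live_feat / (
        np.linalg.norm(demos, axis=1) * np.linalg.norm(live_feat) + 1e-8)
    return int(np.argmax(sims))

# Toy 4-D vectors standing in for DINO image-level embeddings.
demo_feats = [np.array([1.0, 0.0, 0.0, 0.0]),
              np.array([0.0, 1.0, 0.0, 0.0]),
              np.array([0.7, 0.7, 0.0, 0.0])]
live_feat = np.array([0.9, 0.1, 0.0, 0.0])

best = retrieve_demonstration(live_feat, demo_feats)  # index 0 is closest
```

Once the most similar demonstration is retrieved, its pixel-level features would be matched against the live image to compute the end-effector alignment; that correspondence step is beyond this minimal sketch.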