DINOBot: Robot Manipulation via Retrieval and Alignment with Vision Foundation Models
DOI:
10.48550/arxiv.2402.13181
Publication Date:
2024-02-20
AUTHORS (2)
ABSTRACT
We propose DINOBot, a novel imitation learning framework for robot manipulation, which leverages the image-level and pixel-level capabilities of features extracted from Vision Transformers trained with DINO. When interacting with a novel object, DINOBot first uses these features to retrieve the most visually similar object experienced during human demonstrations, and then uses this object to align its end-effector with the novel object to enable effective interaction. Through a series of real-world experiments on everyday tasks, we show that exploiting both of these properties of vision foundation models enables unprecedented learning efficiency and generalisation. Videos and code are available at https://www.robot-learning.uk/dinobot.
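The retrieval step described in the abstract can be sketched as a nearest-neighbour lookup over image-level features. The sketch below assumes DINO embeddings have already been extracted; the feature vectors and the `retrieve_demonstration` helper are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np

def retrieve_demonstration(live_feat, demo_feats):
    """Return the index of the demonstration whose image-level feature
    is most cosine-similar to the live observation's feature."""
    demos = np.stack(demo_feats)
    sims = demos @ live_feat / (
        np.linalg.norm(demos, axis=1) * np.linalg.norm(live_feat) + 1e-8)
    return int(np.argmax(sims))

# Toy 4-D vectors standing in for DINO image-level embeddings.
demo_feats = [np.array([1.0, 0.0, 0.0, 0.0]),
              np.array([0.0, 1.0, 0.0, 0.0]),
              np.array([0.7, 0.7, 0.0, 0.0])]
live_feat = np.array([0.9, 0.1, 0.0, 0.0])

best = retrieve_demonstration(live_feat, demo_feats)  # index 0 is closest
```

Once the most similar demonstration is retrieved, its pixel-level features would be matched against the live image to compute the end-effector alignment; that correspondence step is beyond this minimal sketch.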