Gemini Robotics: Bringing AI into the Physical World
FOS: Computer and information sciences
Computer Science - Robotics
Robotics (cs.RO)
DOI: 10.48550/arxiv.2503.20020
Publication Date: 2025-03-25
AUTHORS (118)
ABSTRACT
Recent advancements in large multimodal models have led to the emergence of remarkable generalist capabilities in digital domains, yet their translation to physical agents such as robots remains a significant challenge. This report introduces a new family of AI models purposefully designed for robotics and built upon the foundation of Gemini 2.0. We present Gemini Robotics, an advanced Vision-Language-Action (VLA) model capable of directly controlling robots. Gemini Robotics executes smooth and reactive movements to tackle a wide range of complex manipulation tasks. It is also robust to variations in object types and positions, handles unseen environments well, and follows diverse, open-vocabulary instructions. We show that with additional fine-tuning, Gemini Robotics can be specialized to new capabilities. These include solving long-horizon, highly dexterous tasks, learning new short-horizon tasks from as few as 100 demonstrations, and adapting to completely novel robot embodiments. This is made possible because Gemini Robotics builds on top of the Gemini Robotics-ER model, the second model we introduce in this work. Gemini Robotics-ER (Embodied Reasoning) extends Gemini's multimodal reasoning capabilities into the physical world, with enhanced spatial and temporal understanding. This enables capabilities relevant to robotics, including object detection, pointing, trajectory and grasp prediction, multi-view correspondence, and 3D bounding box prediction. We show how this combination can support a variety of robotics applications. We also discuss and address important safety considerations related to this new class of robotics foundation models. The Gemini Robotics family marks a substantial step towards developing general-purpose robots that realize AI's potential in the physical world.