NFDI4DS | UHH-SEMS - Publication Details

Recent advancements in large multimodal models have led to the emergence of remarkable generalist capabilities in digital domains, yet their translation to physical agents such as robots remains a significant challenge. This report introduces a new family of AI models purposefully designed for robotics and built upon the foundation of Gemini 2.0. We present Gemini Robotics, an advanced Vision-Language-Action (VLA) generalist model capable of directly controlling robots. Gemini Robotics executes smooth and reactive movements to tackle a wide range of complex manipulation tasks while also being robust to variations in object types and positions, handling unseen environments as well as following diverse, open-vocabulary instructions. We show that with additional fine-tuning, Gemini Robotics can be specialized to new capabilities, including solving long-horizon, highly dexterous tasks, learning new short-horizon tasks from as few as 100 demonstrations, and adapting to completely novel robot embodiments. This is made possible because Gemini Robotics builds on top of the Gemini Robotics-ER model, the second model we introduce in this work. Gemini Robotics-ER (Embodied Reasoning) extends Gemini's reasoning into the physical world, with enhanced spatial and temporal understanding. This enables capabilities relevant to robotics, including object detection, pointing, trajectory and grasp prediction, as well as multi-view correspondence and 3D bounding box predictions. We show how this novel combination can support a variety of robotics applications. We also discuss and address important safety considerations related to this new class of robotics foundation models. The Gemini Robotics family marks a substantial step towards developing general-purpose robots that realize AI's potential in the physical world.


Gemini Robotics: Bringing AI into the Physical World
