SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
Object-orientation
Spatial intelligence
DOI: 10.48550/arxiv.2502.13143
Publication Date: 2025-02-18
AUTHORS (18)
ABSTRACT
Spatial intelligence is a critical component of embodied AI, enabling robots to understand and interact with their environments. While recent advances have enhanced the ability of VLMs to perceive object locations and positional relationships, they still lack the capability to precisely understand object orientations, a key requirement for tasks involving fine-grained manipulations. Addressing this limitation not only requires geometric reasoning but also an expressive and intuitive way to represent orientation. In this context, we propose that natural language offers a more flexible representation space than canonical frames, making it particularly suitable for instruction-following robotic systems. In this paper, we introduce the concept of semantic orientation, which defines object orientations using natural language in a reference-frame-free manner (e.g., the "plug-in" direction of a USB or the "handle" direction of a knife). To support this, we construct OrienText300K, a large-scale dataset of 3D models annotated with semantic orientations that link geometric understanding to functional semantics. By integrating semantic orientation into a VLM system, we enable robots to generate manipulation actions with both positional and orientational constraints. Extensive experiments in simulation and the real world demonstrate that our approach significantly enhances robot manipulation capabilities, e.g., 48.7% accuracy on Open6DOR and 74.9% accuracy on SIMPLER.