OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

DOI: 10.48550/arxiv.2309.00616 Publication Date: 2023-01-01
ABSTRACT
Current 3D open-vocabulary scene understanding methods mostly utilize well-aligned 2D images as the bridge to learn 3D features with language. However, applying these approaches becomes challenging in scenarios where 2D images are absent. In this work, we introduce a new pipeline, namely, OpenIns3D, which requires no 2D image inputs, for 3D open-vocabulary scene understanding at the instance level. The OpenIns3D framework employs a "Mask-Snap-Lookup" scheme. The "Mask" module learns class-agnostic mask proposals from 3D point clouds. The "Snap" module generates synthetic scene-level images at multiple scales and leverages 2D vision-language models to extract interesting objects. The "Lookup" module searches through the outcomes of "Snap" with the help of Mask2Pixel maps, which contain the precise correspondence between 3D masks and synthetic images, to assign category names to the proposed masks. This 2D-input-free and flexible approach achieves state-of-the-art results on a wide range of indoor and outdoor datasets by a large margin. Moreover, it allows effortless switching of 2D detectors without re-training. When integrated with powerful open-world 2D models such as ODISE and GroundingDINO, excellent results were observed on open-vocabulary instance segmentation. With LLM-powered 2D models like LISA, it demonstrates a remarkable capacity to process highly complex text queries that require intricate reasoning and world knowledge. Project page: https://zheninghuang.github.io/OpenIns3D/
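The "Lookup" step described above can be illustrated with a minimal sketch. This is not the authors' implementation; the function name, data layout, and overlap threshold below are assumptions for illustration only. The idea: each 3D mask proposal comes with a Mask2Pixel map (the pixels it projects to in a synthetic snapshot), and each 2D detection from the vision-language model provides a labeled pixel mask; a proposal is assigned the label of the detection that best covers its projected pixels.

```python
import numpy as np

def lookup_categories(mask2pixel, detections, min_overlap=0.25):
    """Hypothetical sketch of the Lookup step.

    mask2pixel: dict mapping mask_id -> boolean (H, W) array of the
                pixels that the 3D mask projects to in a snapshot.
    detections: list of (label, boolean (H, W) array) pairs produced
                by a 2D open-vocabulary detector on the same snapshot.
    Returns a dict mask_id -> label (None if nothing overlaps enough).
    """
    labels = {}
    for mask_id, proj in mask2pixel.items():
        n_proj = proj.sum()
        best_label, best_score = None, min_overlap
        for label, det in detections:
            if n_proj == 0:
                break  # mask is not visible in this snapshot
            # fraction of the projected mask covered by this detection
            score = np.logical_and(proj, det).sum() / n_proj
            if score > best_score:
                best_label, best_score = label, score
        labels[mask_id] = best_label
    return labels
```

In the full pipeline, scores would be aggregated across snapshots at multiple scales before the final label is chosen; the single-image version above only shows the matching logic.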