PAVLM: Advancing Point Cloud based Affordance Understanding Via Vision-Language Model

Affordance
DOI: 10.48550/arxiv.2410.11564
Publication Date: 2024-10-15
ABSTRACT
Affordance understanding, the task of identifying actionable regions on 3D objects, plays a vital role in allowing robotic systems to engage with and operate within the physical world. Although Visual Language Models (VLMs) have excelled at high-level reasoning and long-horizon planning for manipulation, they still fall short in grasping the nuanced properties required for effective human-robot interaction. In this paper, we introduce PAVLM (Point cloud Affordance Vision-Language Model), an innovative framework that utilizes the extensive multimodal knowledge embedded in pre-trained language models to enhance affordance understanding of point clouds. PAVLM integrates a geometric-guided propagation module with hidden embeddings from large language models (LLMs) to enrich visual semantics. On the language side, we prompt Llama-3.1 to generate refined, context-aware text, augmenting the instructional input with deeper semantic cues. Experimental results on the 3D-AffordanceNet benchmark demonstrate that PAVLM outperforms baseline methods on both full and partial point clouds, particularly excelling in its generalization to novel, open-world affordance tasks on 3D objects. For more information, visit our project site: pavlm-source.github.io.
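
The language-side idea described in the abstract, prompting Llama-3.1 to turn a terse command into richer, affordance-aware text, can be illustrated with a short Python sketch. This is not the authors' released code: the checkpoint name, prompt wording, and generation settings are assumptions, and it presumes a recent Hugging Face transformers version whose text-generation pipeline accepts chat-style message lists.

# A minimal, hypothetical sketch of the language-side step: prompting an
# instruction-tuned Llama-3.1 model to expand a terse command into richer,
# affordance-aware text. Checkpoint id, prompt wording, and generation
# settings are illustrative assumptions, not the authors' code.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed chat-tuned checkpoint
)

def refine_instruction(raw_instruction: str, object_name: str) -> str:
    """Rewrite a short command so it names the actionable part of the object."""
    messages = [
        {"role": "system",
         "content": "You describe where and how a robot should act on an object."},
        {"role": "user",
         "content": f"Object: {object_name}. Instruction: '{raw_instruction}'. "
                    "Rewrite this as one sentence that names the actionable region."},
    ]
    # Recent transformers pipelines accept chat-style message lists directly.
    result = generator(messages, max_new_tokens=64, do_sample=False)
    return result[0]["generated_text"][-1]["content"]

# e.g. "grasp the mug" -> "Grasp the mug by closing the gripper around its handle."
print(refine_instruction("grasp the mug", "mug"))

Under these assumptions, the refined sentence would replace the raw instruction as the textual input that is paired with the point-cloud features, mirroring the abstract's description of augmenting the instructional input with deeper semantic cues.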