PAVLM: Advancing Point Cloud based Affordance Understanding Via Vision-Language Model

Affordance
DOI: 10.48550/arxiv.2410.11564
Publication Date: 2024-10-15
ABSTRACT
Affordance understanding, the task of identifying actionable regions on 3D objects, plays a vital role in allowing robotic systems to engage with and operate within the physical world. Although Visual Language Models (VLMs) have excelled at high-level reasoning and long-horizon planning for manipulation, they still fall short in grasping the nuanced properties required for effective human-robot interaction. In this paper, we introduce PAVLM (Point cloud Affordance Vision-Language Model), an innovative framework that utilizes the extensive multimodal knowledge embedded in pre-trained language models to enhance affordance understanding of point clouds. PAVLM integrates a geometric-guided propagation module with hidden embeddings from large language models (LLMs) to enrich visual semantics. On the language side, we prompt Llama-3.1 to generate refined, context-aware text, augmenting the instructional input with deeper semantic cues. Experimental results on the 3D-AffordanceNet benchmark demonstrate that PAVLM outperforms baseline methods on both full and partial point clouds, particularly excelling in its generalization to novel, open-world affordance tasks on 3D objects. For more information, visit our project site: pavlm-source.github.io.
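
The language-side idea described in the abstract, prompting Llama-3.1 to turn a terse command into richer, affordance-aware text, can be illustrated with a short Python sketch. This is not the authors' released code: the checkpoint name, prompt wording, and generation settings are assumptions, and it presumes a recent Hugging Face transformers version whose text-generation pipeline accepts chat-style message lists.

# A minimal, hypothetical sketch of the language-side step: prompting an
# instruction-tuned Llama-3.1 model to expand a terse command into richer,
# affordance-aware text. Checkpoint id, prompt wording, and generation
# settings are illustrative assumptions, not the authors' code.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed chat-tuned checkpoint
)

def refine_instruction(raw_instruction: str, object_name: str) -> str:
    """Rewrite a short command so it names the actionable part of the object."""
    messages = [
        {"role": "system",
         "content": "You describe where and how a robot should act on an object."},
        {"role": "user",
         "content": f"Object: {object_name}. Instruction: '{raw_instruction}'. "
                    "Rewrite this as one sentence that names the actionable region."},
    ]
    # Recent transformers pipelines accept chat-style message lists directly.
    result = generator(messages, max_new_tokens=64, do_sample=False)
    return result[0]["generated_text"][-1]["content"]

# e.g. "grasp the mug" -> "Grasp the mug by closing the gripper around its handle."
print(refine_instruction("grasp the mug", "mug"))

Under these assumptions, the refined sentence would replace the raw instruction as the textual input that is paired with the point-cloud features, mirroring the abstract's description of augmenting the instructional input with deeper semantic cues.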