- Human Pose and Action Recognition
- Multimodal Machine Learning Applications
- Advanced Neural Network Applications
- 3D Surveying and Cultural Heritage
- Advanced Image and Video Retrieval Techniques
- Robotics and Sensor-Based Localization
- 3D Shape Modeling and Analysis
- Robotics and Automated Systems
- Natural Language Processing Techniques
University of Hong Kong
2023-2024
Open-vocabulary scene understanding aims to localize and recognize unseen categories beyond the annotated label space. The recent breakthrough in 2D open-vocabulary perception is largely driven by Internet-scale paired image-text data with rich vocabulary concepts. However, this success cannot be directly transferred to 3D scenarios due to the inaccessibility of large-scale 3D-text pairs. To this end, we propose to distill knowledge encoded in pretrained vision-language (VL) foundation models through captioning...
Open-world instance-level scene understanding aims to locate and recognize unseen object categories that are not present in the annotated dataset. This task is challenging because the model needs to both localize novel 3D objects and infer their semantic categories. A key factor behind the recent progress in 2D open-world perception is the availability of large-scale image-text pairs from the Internet, which cover a wide range of vocabulary concepts. However, this success is hard to replicate in 3D scenarios due to the scarcity of 3D-text pairs....
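The caption-driven distillation described in the two abstracts above boils down to aligning 3D features with text embeddings produced by a frozen vision-language model. A minimal sketch of such an alignment objective is below; the InfoNCE-style loss, the function names, and the feature shapes are illustrative assumptions, not the papers' actual implementation.

```python
import numpy as np

def contrastive_caption_loss(point_feats, caption_feats, temperature=0.07):
    """InfoNCE-style loss pulling each 3D point-group feature toward the
    embedding of its paired caption (from a frozen 2D VL model).
    Shapes: both inputs are (N, D); row i of each is a matched pair.
    All names here are hypothetical, for illustration only."""
    # L2-normalize both modalities so similarity is cosine similarity
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    t = caption_feats / np.linalg.norm(caption_feats, axis=1, keepdims=True)
    logits = p @ t.T / temperature           # (N, N) similarity matrix
    # matched point/caption pairs lie on the diagonal
    labels = np.arange(len(p))
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_softmax[labels, labels].mean()

# toy check: correctly paired features yield a lower loss than mismatched ones
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
aligned = contrastive_caption_loss(feats, feats)
shuffled = contrastive_caption_loss(feats, feats[::-1])
```

The point of the contrastive form is that it needs no category labels at all: the caption text supplies open-vocabulary supervision, which is why such methods can recognize classes absent from the annotated 3D label space.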
Rapid advancements in 3D vision-language (3D-VL) tasks have opened up new avenues for human interaction with embodied agents or robots using natural language. Despite this progress, we find a notable limitation: existing 3D-VL models exhibit sensitivity to the style of language input, struggling to understand sentences with the same semantic meaning but written in different variants. This observation raises a critical question: Can 3D-VL models truly understand natural language? To test the language understandability of 3D-VL models, we first propose a robustness...
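The robustness probe hinted at in the abstract above can be sketched as a paraphrase-consistency check: feed a model several rewordings of the same query and count how often its prediction is unchanged. The interface (`model` mapping a sentence to a predicted object id) and the toy data are assumptions for illustration, not the paper's benchmark.

```python
def consistency_rate(model, variant_groups):
    """Fraction of paraphrase groups for which a grounding model returns
    the same target object for every rewording of the query.
    `model` is any callable sentence -> object id (hypothetical interface)."""
    consistent = 0
    for variants in variant_groups:
        preds = {model(v) for v in variants}
        consistent += len(preds) == 1       # one unique prediction = robust
    return consistent / len(variant_groups)

# toy model keyed on a single keyword, hence brittle to phrasing changes
toy = lambda s: 1 if "chair" in s.lower() else 0
groups = [
    ["the chair near the window", "the seat near the window"],  # inconsistent
    ["the chair by the desk", "that chair next to the desk"],   # consistent
]
rate = consistency_rate(toy, groups)  # -> 0.5
```

A low consistency rate under meaning-preserving rewrites is exactly the style sensitivity the abstract describes: the model reacts to surface wording rather than semantics.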