Runyu Ding

ORCID: 0009-0009-1582-5092
Research Areas
  • Human Pose and Action Recognition
  • Multimodal Machine Learning Applications
  • Advanced Neural Network Applications
  • 3D Surveying and Cultural Heritage
  • Advanced Image and Video Retrieval Techniques
  • Robotics and Sensor-Based Localization
  • 3D Shape Modeling and Analysis
  • Robotics and Automated Systems
  • Natural Language Processing Techniques

University of Hong Kong
2023-2024

Open-vocabulary scene understanding aims to localize and recognize unseen categories beyond the annotated label space. The recent breakthrough of 2D open-vocabulary perception is largely driven by Internet-scale paired image-text data with rich vocabulary concepts. However, this success cannot be directly transferred to 3D scenarios due to the inaccessibility of large-scale 3D-text pairs. To this end, we propose to distill knowledge encoded in pretrained vision-language (VL) foundation models through captioning...

10.1109/cvpr52729.2023.00677 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01
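
The captioning-based distillation described above pairs 3D regions with free-form text from a VL foundation model. Below is a minimal sketch of one plausible ingredient, a symmetric contrastive loss that aligns pooled point features with caption embeddings; the function name, feature shapes, and PyTorch framing are illustrative assumptions, not the paper's actual implementation.

    # Hypothetical sketch: aligning pooled 3D point features with caption text
    # embeddings via a symmetric InfoNCE-style contrastive loss. Encoders are
    # assumed to exist upstream; random tensors stand in for their outputs.
    import torch
    import torch.nn.functional as F

    def point_caption_contrastive_loss(point_feats, text_feats, temperature=0.07):
        """point_feats: (B, D) pooled 3D features, one per captioned region.
        text_feats:  (B, D) caption embeddings from a frozen VL text encoder."""
        point_feats = F.normalize(point_feats, dim=-1)
        text_feats = F.normalize(text_feats, dim=-1)
        logits = point_feats @ text_feats.t() / temperature   # (B, B) similarities
        targets = torch.arange(logits.size(0), device=logits.device)
        # Symmetric cross-entropy: match each region to its caption and vice versa.
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

    # Toy usage with random features standing in for encoder outputs.
    loss = point_caption_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))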

10.1109/cvpr52733.2024.01874 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Open-world instance-level scene understanding aims to locate and recognize unseen object categories that are not present in the annotated dataset. This task is challenging because the model needs to both localize novel 3D objects and infer their semantic categories. A key factor in the recent progress of 2D open-world perception is the availability of large-scale image-text pairs from the Internet, which cover a wide range of vocabulary concepts. However, this success is hard to replicate in 3D scenarios due to the scarcity of 3D-text pairs...

10.1109/tpami.2024.3410324 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-06-06
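
At inference time, open-world methods of this kind typically score detected instances against text embeddings of arbitrary category names rather than a fixed classifier head, which is what lets them label categories outside the annotated set. The sketch below illustrates that zero-shot matching step under assumed names and shapes; it is not the published model's code.

    # Hypothetical sketch: open-vocabulary classification of 3D instances by
    # nearest text embedding. The category list and feature dims are illustrative.
    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def classify_open_vocab(instance_feats, class_text_feats, class_names):
        """instance_feats: (N, D) features for N detected 3D instances.
        class_text_feats: (C, D) embeddings of C free-form category names."""
        sims = (F.normalize(instance_feats, dim=-1) @
                F.normalize(class_text_feats, dim=-1).t())   # (N, C) cosine scores
        best = sims.argmax(dim=-1)                           # closest class per instance
        return [class_names[i] for i in best.tolist()]

    names = ["chair", "sofa", "whiteboard", "plant"]         # may include unseen classes
    preds = classify_open_vocab(torch.randn(5, 512), torch.randn(len(names), 512), names)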

Rapid advancements in 3D vision-language (3D-VL) tasks have opened up new avenues for human interaction with embodied agents or robots using natural language. Despite this progress, we find a notable limitation: existing 3D-VL models exhibit sensitivity to the style of language input, struggling to understand sentences with the same semantic meaning but written in different variants. This observation raises a critical question: Can they truly understand language? To test the language understandability of these models, we first propose a robustness...

10.48550/arxiv.2403.14760 preprint EN arXiv (Cornell University) 2024-03-21
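
One way to probe the sensitivity described above is to query a model with several paraphrases of the same request and check whether its predictions agree. The sketch below is a hypothetical consistency metric along those lines; the model interface and example queries are invented for illustration and are not the paper's benchmark.

    # Hypothetical sketch: prediction consistency across paraphrased queries
    # that share one meaning. `model` is any callable mapping a query string
    # to a predicted answer (e.g., a grounded object ID).
    from typing import Callable, List

    def consistency_rate(model: Callable[[str], str],
                         variant_sets: List[List[str]]) -> float:
        """Fraction of query groups whose paraphrases all yield the same answer."""
        consistent = 0
        for variants in variant_sets:
            answers = {model(v) for v in variants}
            consistent += int(len(answers) == 1)   # robust iff one unique answer
        return consistent / len(variant_sets)

    # Toy usage: a trivially robust model that ignores wording differences.
    sets = [["find the chair near the desk", "locate the chair beside the desk"]]
    print(consistency_rate(lambda q: "chair_03", sets))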

Open-vocabulary scene understanding aims to localize and recognize unseen categories beyond the annotated label space. The recent breakthrough of 2D open-vocabulary perception is largely driven by Internet-scale paired image-text data with rich vocabulary concepts. However, this success cannot be directly transferred to 3D scenarios due to the inaccessibility of large-scale 3D-text pairs. To this end, we propose to distill knowledge encoded in pre-trained vision-language (VL) foundation models through captioning...

10.48550/arxiv.2211.16312 preprint EN arXiv (Cornell University) 2022-01-01

Open-world instance-level scene understanding aims to locate and recognize unseen object categories that are not present in the annotated dataset. This task is challenging because the model needs to both localize novel 3D objects and infer their semantic categories. A key factor in the recent progress of 2D open-world perception is the availability of large-scale image-text pairs from the Internet, which cover a wide range of vocabulary concepts. However, this success is hard to replicate in 3D scenarios due to the scarcity of 3D-text pairs...

10.48550/arxiv.2308.00353 preprint EN arXiv (Cornell University) 2023-01-01