ClipRover: Zero-shot Vision-Language Exploration and Target Discovery by Mobile Robots

DOI: 10.48550/arxiv.2502.08791 Publication Date: 2025-02-12
ABSTRACT
Vision-language navigation (VLN) has emerged as a promising paradigm, enabling mobile robots to perform zero-shot inference and execute tasks without specific pre-programming. However, current systems often separate map exploration from path planning, with exploration relying on inefficient algorithms due to limited (partially observed) environmental information. In this paper, we present a novel navigation pipeline named "ClipRover" for simultaneous exploration and target discovery in unknown environments, leveraging the capabilities of the vision-language model CLIP. Our approach requires only monocular vision and operates without any prior map or knowledge about the target. For comprehensive evaluations, we design the functional prototype UGV (unmanned ground vehicle) system "Rover Master", a customized platform for general-purpose VLN tasks. We integrate and deploy the ClipRover pipeline on Rover Master to evaluate its throughput, obstacle avoidance capability, and trajectory performance across various real-world scenarios. Experimental results demonstrate that ClipRover consistently outperforms traditional map traversal algorithms and achieves performance comparable to path-planning methods that depend on prior map and target knowledge. Notably, ClipRover offers real-time active navigation without requiring pre-captured candidate images or pre-built node graphs, addressing key limitations of existing VLN pipelines.
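The abstract describes steering a monocular robot toward a target described in free-form text by scoring what the camera sees against that description with a vision-language model. As an illustration only (the paper's actual pipeline is not reproduced here, and all function names below are hypothetical), the core scoring step can be sketched as cosine similarity between CLIP-style image embeddings of candidate view directions and the text embedding of the target description:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pick_heading(view_embeddings: list, target_embedding: np.ndarray):
    """Pick the candidate heading whose image embedding best matches the target.

    view_embeddings: one image-embedding vector per candidate heading
                     (in practice produced by a CLIP image encoder)
    target_embedding: embedding of the target's text description
                      (in practice produced by a CLIP text encoder)
    Returns the index of the best heading and the per-heading scores.
    """
    scores = [cosine_similarity(v, target_embedding) for v in view_embeddings]
    return int(np.argmax(scores)), scores

# Toy example with hand-made 2-D "embeddings" (real CLIP vectors are ~512-D):
target = np.array([1.0, 0.0])
views = [np.array([0.0, 1.0]),   # heading 0: dissimilar
         np.array([0.9, 0.1]),   # heading 1: closest to target
         np.array([0.5, 0.5])]   # heading 2: partial match
best, _ = pick_heading(views, target)  # best == 1
```

In a real deployment the embeddings would come from a pretrained CLIP model (e.g. via the `open_clip` or `transformers` libraries), and the chosen heading would feed into the robot's obstacle-avoidance and motion-control loop rather than being followed blindly.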