Probing the 3D Awareness of Visual Foundation Models
DOI: 10.48550/arxiv.2404.08636
Publication Date: 2024-04-12
AUTHORS (10)
ABSTRACT
Recent advances in large-scale pretraining have yielded visual foundation models with strong capabilities. Not only can recent models generalize to arbitrary images for their training task, their intermediate representations are useful for other visual tasks such as detection and segmentation. Given that such models can classify, delineate, and localize objects in 2D, we ask whether they also represent their 3D structure. In this work, we analyze the 3D awareness of visual foundation models. We posit that 3D awareness implies that representations (1) encode the 3D structure of the scene and (2) consistently represent the surface across views. We conduct a series of experiments using task-specific probes and zero-shot inference procedures on frozen features. Our experiments reveal several limitations of the current models. Our code and analysis can be found at https://github.com/mbanani/probe3d.
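To make the idea of a probe on frozen features concrete, here is a minimal sketch. It is not the paper's probe or code: it assumes a plain linear probe fitted by ridge regression, and it uses synthetic random vectors as a stand-in for features from a frozen backbone. The names `fit_linear_probe`, `probe_rmse`, and `frozen_features`, along with all sizes, are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_linear_probe(features, targets, l2=1e-3):
    """Fit a ridge-regression probe; the features themselves are never updated."""
    dim = features.shape[1]
    gram = features.T @ features + l2 * np.eye(dim)
    return np.linalg.solve(gram, features.T @ targets)


def probe_rmse(weights, features, targets):
    """Root-mean-square error of the probe's predictions."""
    predictions = features @ weights
    return float(np.sqrt(np.mean((predictions - targets) ** 2)))


rng = np.random.default_rng(0)

# Assumption: synthetic stand-in for frozen patch features (512 patches, 16 dims).
frozen_features = rng.normal(size=(512, 16))

# Assumption: a synthetic "depth" target that is linearly decodable plus small noise.
true_weights = rng.normal(size=16)
depth = frozen_features @ true_weights + 0.01 * rng.normal(size=512)

# Fit the probe on one split and evaluate on held-out patches.
train, test = slice(0, 384), slice(384, 512)
w = fit_linear_probe(frozen_features[train], depth[train])
rmse = probe_rmse(w, frozen_features[test], depth[test])

# Baseline that ignores the features: always predict the mean training depth.
baseline = float(np.sqrt(np.mean((depth[test] - depth[train].mean()) ** 2)))

print(f"probe RMSE: {rmse:.4f}, mean-depth baseline RMSE: {baseline:.4f}")
```

The comparison against a feature-free baseline is the point of the exercise: a probe that beats it suggests the target is decodable from the frozen representation, while a probe that does not suggests the information is absent or not accessible to that probe family.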