Towards General Purpose Vision Systems

Generality Closed captioning Bounding overwatch

DOI: 10.48550/arxiv.2104.00743
Publication Date: 2021-01-01

AUTHORS (4)

Tanmay Gupta

Amita Kamath

Aniruddha Kembhavi

Derek Hoiem

ABSTRACT

Computer vision systems today are primarily N-purpose systems, designed and trained for a predefined set of tasks. Adapting such systems to new tasks is challenging and often requires non-trivial modifications to the network architecture (e.g. adding new output heads) or training process (e.g. adding new losses). To reduce the time and expertise required to develop new applications, we would like to create general purpose vision systems that can learn and perform a range of tasks without any modification to the architecture or learning process. In this paper, we propose GPV-1, a task-agnostic vision-language architecture that can learn and perform tasks that involve receiving an image and producing text and/or bounding boxes, including classification, localization, visual question answering, captioning, and more. We also propose evaluations of generality of architecture, skill-concept transfer, and learning efficiency that may inform future work on general purpose vision. Our experiments indicate GPV-1 is effective at multiple tasks, reuses some concept knowledge across tasks, can perform the Referring Expressions task zero-shot, and further improves upon the zero-shot performance using a few training samples.
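
The task-agnostic interface described in the abstract (an image in, text and/or bounding boxes out, with no task-specific output heads) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the GPV-1 code base: the names `VisionOutput`, `GeneralPurposeVisionModel`, `StubModel`, and `run_tasks` are hypothetical, as is the use of a natural-language prompt to select the task and the per-box score field.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Sequence, Tuple

# Hypothetical types for illustration only; not the authors' implementation.
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), normalized to [0, 1]


@dataclass
class VisionOutput:
    """Task-agnostic output: free-form text and/or scored bounding boxes."""
    text: str = ""
    boxes: List[Box] = field(default_factory=list)
    scores: List[float] = field(default_factory=list)  # assumed: one score per box


class GeneralPurposeVisionModel(Protocol):
    """One call signature for every task; no task-specific output heads."""

    def __call__(self, image: Sequence[Sequence[float]], prompt: str) -> VisionOutput: ...


class StubModel:
    """Placeholder returning fixed outputs so the interface can be exercised."""

    def __call__(self, image: Sequence[Sequence[float]], prompt: str) -> VisionOutput:
        whole_image: Box = (0.0, 0.0, 1.0, 1.0)
        return VisionOutput(text="placeholder", boxes=[whole_image], scores=[1.0])


def run_tasks(model: GeneralPurposeVisionModel, image, prompts: List[str]) -> List[VisionOutput]:
    # The same call is used for every task; only the prompt text changes.
    return [model(image, p) for p in prompts]


image = [[0.0, 0.0], [0.0, 0.0]]  # dummy 2x2 grayscale image
prompts = ["What is this object?", "Locate the dog.", "Describe the image."]
outputs = run_tasks(StubModel(), image, prompts)
```

In this sketch, adding a new task means supplying a new prompt (and, in a real system, training data) rather than changing the model's signature, which is the property the abstract calls generality of architecture.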
