Towards General Purpose Vision Systems
DOI: 10.48550/arxiv.2104.00743
Publication Date: 2021-01-01
AUTHORS (4)
ABSTRACT
Computer vision systems today are primarily N-purpose systems, designed and trained for a predefined set of tasks. Adapting such systems to new tasks is challenging and often requires non-trivial modifications to the network architecture (e.g. adding new output heads) or to the training process (e.g. adding new losses). To reduce the time and expertise required to develop new applications, we would like to create general purpose vision systems that can learn and perform a range of tasks without any modification to the architecture or learning process. In this paper, we propose GPV-1, a task-agnostic vision-language architecture that can learn and perform tasks that involve receiving an image and producing text and/or bounding boxes, including classification, localization, visual question answering, captioning, and more. We also propose evaluations of generality of architecture, skill-concept transfer, and learning efficiency that may inform future work on general purpose vision. Our experiments indicate that GPV-1 is effective at multiple tasks, reuses some concept knowledge across tasks, can perform the Referring Expressions task zero-shot, and further improves upon the zero-shot performance using a few training samples.
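The idea of a task-agnostic interface can be illustrated with a minimal sketch. It is not the GPV-1 implementation: the names `TaskOutput`, `toy_model`, and `run_task`, the box convention, and the use of a natural-language prompt to select the task are all assumptions made for illustration. The only point it shows is that every task passes through one call signature and one output format (text and/or bounding boxes), so adding a task needs no new output head.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class TaskOutput:
    """One output format shared by all tasks: text and/or bounding boxes."""
    text: str = ""
    boxes: List[Box] = field(default_factory=list)


# Hypothetical model signature: (image, prompt) -> TaskOutput.
Model = Callable[[object, str], TaskOutput]


def toy_model(image: object, prompt: str) -> TaskOutput:
    """Stand-in for a trained system; returns canned outputs, no real inference."""
    if prompt.lower().startswith("locate"):
        return TaskOutput(boxes=[(10.0, 20.0, 110.0, 220.0)])
    return TaskOutput(text="a dog on a beach")


def run_task(model: Model, image: object, prompt: str) -> TaskOutput:
    """Every task uses the same call; only the prompt changes."""
    return model(image, prompt)


# Captioning and localization share one interface (image is omitted in this toy).
caption = run_task(toy_model, None, "Describe the image.")
located = run_task(toy_model, None, "Locate the dog.")
```

In this sketch a classification or question-answering request would reuse `run_task` unchanged and read its answer from `TaskOutput.text`, which is the property the abstract describes as requiring no modification to the architecture.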