Towards a performance analysis on pre-trained Visual Question Answering models for autonomous driving

DOI: 10.48550/arxiv.2307.09329 Publication Date: 2023-01-01
ABSTRACT
This short paper presents a preliminary analysis of three popular Visual Question Answering (VQA) models, namely ViLBERT, ViLT, and LXMERT, in the context of answering questions relating to driving scenarios. The performance of these models is evaluated by comparing the similarity of their responses to reference answers provided by computer vision experts. Model selection is predicated on the use of transformers in multimodal architectures. The results indicate that models incorporating cross-modal attention and late fusion techniques exhibit promising potential for generating improved answers within a driving perspective. This initial analysis serves as a launchpad for a forthcoming comprehensive comparative study involving nine VQA models, and sets the scene for further investigations into the effectiveness of VQA model queries in self-driving scenarios. Supplementary material is available at https://github.com/KaavyaRekanar/Towards-a-performance-analysis-on-pre-trained-VQA-models-for-autonomous-driving.