Towards a performance analysis on pre-trained Visual Question Answering models for autonomous driving

DOI: 10.48550/arxiv.2307.09329 Publication Date: 2023-01-01
ABSTRACT
This short paper presents a preliminary analysis of three popular Visual Question Answering (VQA) models, namely ViLBERT, ViLT, and LXMERT, in the context of answering questions relating to driving scenarios. The performance of these models is evaluated by comparing the similarity of their responses to reference answers provided by computer vision experts. Model selection is predicated on the use of transformers in multimodal architectures. The results indicate that models incorporating cross-modal attention and late fusion techniques exhibit promising potential for generating improved answers within a driving perspective. This initial analysis serves as a launchpad for a forthcoming comprehensive comparative study involving nine VQA models, and sets the scene for further investigations into the effectiveness of VQA model queries in self-driving scenarios. Supplementary material is available at https://github.com/KaavyaRekanar/Towards-a-performance-analysis-on-pre-trained-VQA-models-for-autonomous-driving.