PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering

DOI: 10.48550/arxiv.2305.10415 Publication Date: 2023-01-01
ABSTRACT
In this paper, we focus on the problem of Medical Visual Question Answering (MedVQA), which is crucial for efficiently interpreting medical images with vital clinic-relevant information. Firstly, we reframe MedVQA as a generation task that naturally follows human-machine interaction, and we propose a generative-based model for medical visual understanding by aligning visual information from a pre-trained vision encoder with a large language model. Secondly, we establish a scalable pipeline to construct a large-scale medical visual question-answering dataset, named PMC-VQA, which contains 227k VQA pairs of 149k images that cover various modalities or diseases. Thirdly, we pre-train our proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD and SLAKE, outperforming existing work by a large margin. Additionally, we propose a test set that has undergone manual verification; it is significantly more challenging, such that even the best models struggle to solve it.
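
To make the described architecture concrete, below is a minimal sketch (not the authors' released implementation) of a generative MedVQA model as the abstract outlines it: patch features from a frozen pre-trained vision encoder are projected into the embedding space of a causal language model and prepended to the embedded question tokens, so the answer can be generated autoregressively. All class names, dimensions, and the two-layer projector design here are illustrative assumptions.

# Minimal sketch of the generative MedVQA setup described in the abstract.
# Hypothetical module names and sizes; not the authors' released code.
import torch
import torch.nn as nn

class VisualProjector(nn.Module):
    """Maps vision-encoder patch features into the LLM embedding space."""
    def __init__(self, vision_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, num_patches, vision_dim)
        return self.proj(patch_feats)

class GenerativeMedVQA(nn.Module):
    """Frozen vision encoder + trainable projector + causal language model."""
    def __init__(self, vision_encoder: nn.Module, llm: nn.Module,
                 vision_dim: int, llm_dim: int):
        super().__init__()
        self.vision_encoder = vision_encoder.eval()   # kept frozen
        for p in self.vision_encoder.parameters():
            p.requires_grad = False
        self.projector = VisualProjector(vision_dim, llm_dim)
        self.llm = llm                                # causal LM over llm_dim embeddings

    def forward(self, image: torch.Tensor,
                question_embeds: torch.Tensor) -> torch.Tensor:
        # Encode the image to patch features without tracking gradients.
        with torch.no_grad():
            patch_feats = self.vision_encoder(image)  # (batch, P, vision_dim)
        visual_tokens = self.projector(patch_feats)   # (batch, P, llm_dim)
        # Prepend visual tokens to the embedded question; the LLM then
        # generates the answer conditioned on both modalities.
        inputs = torch.cat([visual_tokens, question_embeds], dim=1)
        return self.llm(inputs)

Framing the task this way (free-text generation rather than answer classification) is what lets the same model be pre-trained on PMC-VQA and then fine-tuned on benchmarks such as VQA-RAD and SLAKE without changing the output head.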