Multi-modal Pre-training for Medical Vision-language Understanding and Generation: An Empirical Study with A New Benchmark
DOI: 10.48550/arxiv.2306.06494
Publication Date: 2023-01-01
ABSTRACT
With the availability of large-scale, comprehensive, and general-purpose vision-language (VL) datasets such as MSCOCO, vision-language pre-training (VLP) has become an active area of research and has proven to be effective for various VL tasks such as visual question answering. However, studies on VLP in the medical domain have so far been scanty. To provide a comprehensive perspective on medical VL tasks, we conduct a thorough experimental analysis to study the key factors that may affect performance with a unified vision-language Transformer. To allow making sound and quick pre-training decisions, we propose RadioGraphy Captions (RGC), a high-quality, multi-modality radiographic dataset containing 18,434 image-caption pairs collected from the open-access online database MedPix. RGC can be used as a pre-training dataset or as a new benchmark for medical report generation and medical image-text retrieval. By utilizing RGC and other available datasets for pre-training, we develop several key insights that can guide future research and provide strong baselines for medical VL tasks.
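To make the image-text retrieval benchmark concrete, below is a minimal sketch of how an RGC-style set of image-caption pairs might be loaded and scored with Recall@k under cosine similarity. The JSON-lines layout and the field names "image" and "caption" are assumptions for illustration, not the released RGC format, and the random embeddings stand in for the output of a vision-language encoder.

```python
# Hypothetical sketch: an RGC-style image-caption list plus Recall@k scoring
# for image-to-text retrieval. File layout and field names are assumptions.
import json
import numpy as np

def load_pairs(path):
    """Read image-caption pairs, one JSON object per line,
    e.g. {"image": "case_0001.png", "caption": "Chest radiograph ..."}."""
    with open(path) as f:
        return [json.loads(line) for line in f]

def recall_at_k(image_emb, text_emb, k=5):
    """Recall@k for image-to-text retrieval; row i of each matrix is pair i."""
    # L2-normalize so the dot product equals cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sims = image_emb @ text_emb.T                 # (N, N) similarity matrix
    ranks = np.argsort(-sims, axis=1)             # best-matching captions first
    # A hit if the paired caption (index i for image i) ranks in the top k.
    hits = (ranks[:, :k] == np.arange(len(sims))[:, None]).any(axis=1)
    return hits.mean()

# Toy usage: random vectors in place of real encoder embeddings.
rng = np.random.default_rng(0)
img, txt = rng.normal(size=(8, 64)), rng.normal(size=(8, 64))
print(f"Recall@5: {recall_at_k(img, txt, k=5):.2f}")
```

The same matrix transposed gives text-to-image retrieval, so one similarity computation serves both directions of the benchmark.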