Towards Generalist Biomedical AI

Benchmark (surveying)
DOI: 10.48550/arxiv.2307.14334
Publication Date: 2023-01-01
ABSTRACT
Medicine is inherently multimodal, with rich data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence (AI) systems that flexibly encode, integrate, and interpret this data at scale can potentially enable impactful applications ranging from scientific discovery to care delivery. To enable the development of these models, we first curate MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench encompasses 14 diverse tasks such as medical question answering, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. We then introduce Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system. Med-PaLM M is a large multimodal generative model that flexibly encodes and interprets biomedical data including clinical language, imaging, and genomics with the same set of model weights. Med-PaLM M reaches performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. We also report examples of zero-shot generalization to novel medical concepts and tasks, positive transfer learning across tasks, and emergent zero-shot medical reasoning. To further probe the capabilities and limitations of Med-PaLM M, we conduct a radiologist evaluation of model-generated (and human) chest X-ray reports and observe encouraging performance across model scales. In a side-by-side ranking on 246 retrospective chest X-rays, clinicians express a pairwise preference for Med-PaLM M reports over those produced by radiologists in up to 40.50% of cases, suggesting potential clinical utility. While considerable work is needed to validate these models in real-world use cases, our results represent a milestone towards the development of generalist biomedical AI systems.