NFDI4DS | UHH-SEMS - Publication Details

Towards Generalist Biomedical AI

Benchmark (surveying)

DOI: 10.48550/arxiv.2307.14334

Publication Date: 2023-01-01


AUTHORS (32)

Tao Tu

Shekoofeh Azizi

Danny Driess

Mike Schaekermann

Mohamed Amin

Pi-Chuan Chang

Andrew Carroll

Chuck Lau

Ryutaro Tanno

Sofia Ira Ktena

Basil Mustafa

Aakanksha Chowdhery

Yun Liu

Simon Kornblith

David J. Fleet

P. Mansfield

Sushant Prakash

Renee Wong

Sunny Virmani

Christopher Semturs

S. Sara Mahdavi

Bradley Green

Ewa Dominowska

Blaise Agüera y A...

Joëlle Barral

Dale R. Webster

Greg S. Corrado

Yossi Matias

Karan Singhal

Pete Florence

Alan Karthikesali...

Vivek Natarajan

ABSTRACT

Medicine is inherently multimodal, with rich data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence (AI) systems that flexibly encode, integrate, and interpret this data at scale can potentially enable impactful applications ranging from scientific discovery to care delivery. To enable the development of these models, we first curate MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench encompasses 14 diverse tasks such as medical question answering, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. We then introduce Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system. Med-PaLM M is a large multimodal generative model that flexibly encodes and interprets biomedical data including clinical language, imaging, and genomics with the same set of model weights. Med-PaLM M reaches performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. We also report examples of zero-shot generalization to novel medical concepts and tasks, positive transfer learning across tasks, and emergent zero-shot medical reasoning. To further probe the capabilities and limitations of Med-PaLM M, we conduct a radiologist evaluation of model-generated (and human) chest X-ray reports and observe encouraging performance across model scales. In a side-by-side ranking on 246 retrospective chest X-rays, clinicians express a pairwise preference for Med-PaLM M reports over those produced by radiologists in up to 40.50% of cases, suggesting potential clinical utility. While considerable work is needed to validate these models in real-world use cases, our results represent a milestone towards the development of generalist biomedical AI systems.
