PathAlign: A vision-language model for whole slide images in histopathology

Histopathology
DOI: 10.48550/arxiv.2406.19578 Publication Date: 2024-06-27
ABSTRACT
Microscopic interpretation of histopathology images underlies many important diagnostic and treatment decisions. While advances in vision-language modeling raise new opportunities for analysis of such images, the gigapixel-scale size of whole slide images (WSIs) introduces unique challenges. Additionally, pathology reports simultaneously highlight key findings from small regions while also aggregating interpretation across multiple slides, often making it difficult to create robust image-text pairs. As such, pathology reports remain a largely untapped source of supervision in computational pathology, with most efforts relying on region-of-interest annotations or self-supervision at the patch level. In this work, we develop a vision-language model based on the BLIP-2 framework using WSIs paired with curated text from pathology reports. This enables applications utilizing a shared image-text embedding space, such as image retrieval for finding cases of interest, as well as integration of the WSI encoder with a frozen large language model (LLM) for WSI-based generative capabilities such as report generation and AI-in-the-loop interactions. We utilize a de-identified dataset of over 350,000 WSI and report text pairs, spanning a wide range of diagnoses, procedure types, and tissue types. We present pathologist evaluation of text generation and retrieval using WSI embeddings, as well as results for classification and workflow prioritization (slide-level triaging). Model-generated text for WSIs was rated by pathologists as accurate, without clinically significant error or omission, for 78% of WSIs on average. This work demonstrates the exciting potential of language-aligned WSI embeddings.
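The image-retrieval application described above relies on slides and text living in a shared embedding space, where nearby vectors indicate related cases. A minimal sketch of that retrieval step is shown below using cosine similarity over toy random vectors; the embedding dimension, the `cosine_retrieve` helper, and the data are illustrative assumptions, not part of the paper's released code.

```python
import numpy as np

def cosine_retrieve(query_emb, slide_embs, top_k=3):
    """Return indices of the top_k slide embeddings most similar to the query.

    Assumes query_emb has shape (d,) and slide_embs has shape (n, d);
    both are L2-normalized so the dot product equals cosine similarity.
    """
    q = query_emb / np.linalg.norm(query_emb)
    s = slide_embs / np.linalg.norm(slide_embs, axis=1, keepdims=True)
    sims = s @ q                     # cosine similarity of each slide to the query
    return np.argsort(-sims)[:top_k]  # indices of the most similar slides first

# Toy example: 100 fake "slide embeddings" in a 32-dim space (illustrative only).
rng = np.random.default_rng(0)
slide_embs = rng.normal(size=(100, 32))
# A query embedding that is a slightly perturbed copy of slide 42,
# standing in for a text or image query near a known case.
query_emb = slide_embs[42] + 0.01 * rng.normal(size=32)
top = cosine_retrieve(query_emb, slide_embs)
print(top[0])  # slide 42 ranks first
```

In practice the query vector would come from the model's text or image encoder rather than random data, and an approximate nearest-neighbor index would replace the brute-force dot product for large archives.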