Incorporating Clinical Guidelines through Adapting Multi-modal Large Language Model for Prostate Cancer PI-RADS Scoring

FOS: Computer and information sciences; Computer Vision and Pattern Recognition (cs.CV)
DOI: 10.48550/arxiv.2405.08786 Publication Date: 2024-05-14
ABSTRACT
The Prostate Imaging Reporting and Data System (PI-RADS) is pivotal in the diagnosis of clinically significant prostate cancer through MRI. Current deep learning-based PI-RADS scoring methods often do not incorporate the essential PI-RADS clinical guidelines (PICG) used by radiologists, potentially compromising scoring accuracy. This paper introduces a novel approach that adapts a multi-modal large language model (MLLM) to incorporate PICG into PI-RADS scoring without additional annotations or network parameters. We present a two-stage fine-tuning process aimed at adapting MLLMs originally trained on natural images to the MRI data domain while effectively integrating the PICG. In the first stage, we develop an adapter layer specifically tailored for processing 3D image inputs and design the MLLM instructions to differentiate MRI modalities effectively. In the second stage, we translate the PICG into guiding instructions for the model to generate PICG-guided image features. Through feature distillation, we align the scoring network's features with the PICG-guided image features, enabling the scoring network to incorporate the PICG information. We develop our model on a public dataset and evaluate it on a challenging real-world in-house dataset. Experimental results demonstrate that our approach improves the performance of current scoring networks.
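The feature distillation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the loss form, so everything here is an assumption made for illustration: the mean squared error on L2-normalized vectors, the `weight` parameter, and the function names `l2_normalize`, `distillation_loss`, and `total_loss` are all hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (returned unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def distillation_loss(student_feat, teacher_feat):
    """Mean squared error between L2-normalized student and teacher features.

    student_feat: features from the scoring network (the model being trained).
    teacher_feat: PICG-guided features from the MLLM, treated as a fixed target.
    The MSE-on-normalized-features form is an assumption, not the paper's loss.
    """
    if len(student_feat) != len(teacher_feat):
        raise ValueError("feature vectors must have the same length")
    s = l2_normalize(student_feat)
    t = l2_normalize(teacher_feat)
    return sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)


def total_loss(scoring_loss, student_feat, teacher_feat, weight=1.0):
    """Add the weighted distillation term to the scoring loss (hypothetical weighting)."""
    return scoring_loss + weight * distillation_loss(student_feat, teacher_feat)
```

In this sketch the teacher features act only as a target, so the scoring network needs no extra parameters at inference time, which is consistent with the abstract's claim of no additional network parameters.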