Generative Text-Guided 3D Vision-Language Pretraining for Unified Medical Image Segmentation

Keywords: Generative model; Modalities; Representation; Feature Learning
DOI: 10.48550/arxiv.2306.04811 Publication Date: 2023-01-01
ABSTRACT
Vision-Language Pretraining (VLP) has demonstrated remarkable capabilities in learning visual representations from textual descriptions of images without annotations. Yet, effective VLP demands large-scale image-text pairs, a resource that is scarce in the medical domain. Moreover, conventional VLP is limited to 2D images, while medical images span diverse modalities and are often 3D, making the learning process more challenging. To address these challenges, we present Generative Text-Guided 3D Vision-Language Pretraining for Unified Medical Image Segmentation (GTGM), a framework that extends VLP to 3D medical images without relying on paired textual descriptions. Specifically, GTGM utilizes large language models (LLM) to generate medical-style text from 3D medical images. This synthetic text is then used to supervise 3D visual representation learning. Furthermore, a negative-free contrastive learning objective is introduced to cultivate consistent visual representations between augmented 3D medical image patches, which effectively mitigates the biases associated with strict positive-negative sample pairings. We evaluate GTGM on three imaging modalities - Computed Tomography (CT), Magnetic Resonance Imaging (MRI), and electron microscopy (EM) - over 13 datasets. GTGM's superior performance across various medical image segmentation tasks underscores its effectiveness and versatility, enabling VLP to extend into 3D medical imagery while bypassing the need for paired text.
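The abstract names two training signals: supervision from LLM-generated medical-style text and a negative-free contrastive objective between augmented 3D patches. The exact loss formulations are not given here, so the following is a minimal sketch under common assumptions: a CLIP-style InfoNCE alignment for the text branch and a BYOL/SimSiam-style cosine consistency loss (stop-gradient on the target branch) for the negative-free objective. Function names and signatures are illustrative, not the authors' implementation.

```python
# Hypothetical sketch of GTGM-style objectives (assumptions, not the paper's code).
import torch
import torch.nn.functional as F


def text_alignment_loss(volume_emb, text_emb, temperature=0.07):
    """Assumed CLIP-style alignment between 3D volume embeddings and
    embeddings of LLM-generated medical-style captions (symmetric
    cross-entropy over cosine-similarity logits)."""
    volume_emb = F.normalize(volume_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = volume_emb @ text_emb.t() / temperature
    targets = torch.arange(volume_emb.size(0), device=volume_emb.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))


def negative_free_loss(online_pred, target_proj):
    """Assumed negative-free consistency between two augmented views of
    the same 3D patch: maximize cosine similarity to a detached target
    branch, so no negative pairs are required."""
    online_pred = F.normalize(online_pred, dim=-1)
    target_proj = F.normalize(target_proj.detach(), dim=-1)
    return (2.0 - 2.0 * (online_pred * target_proj).sum(dim=-1)).mean()


if __name__ == "__main__":
    # Toy usage with random embeddings (batch of 8, embedding dim 128).
    v, t = torch.randn(8, 128), torch.randn(8, 128)
    p, z = torch.randn(8, 128), torch.randn(8, 128)
    total = text_alignment_loss(v, t) + negative_free_loss(p, z)
    print(float(total))
```

A negative-free objective of this kind avoids the bias the abstract mentions: with strict positive-negative pairings, semantically similar patches from the same anatomy can be wrongly pushed apart, whereas a consistency-only loss does not require declaring any patch a negative.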