AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models

FOS: Computer and information sciences; Computer Vision and Pattern Recognition (cs.CV); 0202 electrical engineering, electronic engineering, information engineering; 02 engineering and technology
DOI: 10.1609/aaai.v38i3.27963
Publication Date: 2024-03-25T09:19:00Z
ABSTRACT
Large Vision-Language Models (LVLMs) such as MiniGPT-4 and LLaVA have demonstrated the capability of understanding images and achieved remarkable performance in various visual tasks. Despite their strong abilities in recognizing common objects due to extensive training datasets, they lack specific domain knowledge and have a weaker understanding of localized details within objects, which hinders their effectiveness in the Industrial Anomaly Detection (IAD) task. On the other hand, most existing IAD methods only provide anomaly scores and necessitate the manual setting of thresholds to distinguish between normal and abnormal samples, which restricts their practical implementation. In this paper, we explore the utilization of LVLM to address the IAD problem and propose AnomalyGPT, a novel IAD approach based on LVLM. We generate training data by simulating anomalous images and producing corresponding textual descriptions for each image. We also employ an image decoder to provide fine-grained semantics and design a prompt learner to fine-tune the LVLM using prompt embeddings. Our AnomalyGPT eliminates the need for manual threshold adjustments and thus directly assesses the presence and locations of anomalies. Additionally, AnomalyGPT supports multi-turn dialogues and exhibits impressive few-shot in-context learning capabilities. With only one normal shot, AnomalyGPT achieves state-of-the-art performance with an accuracy of 86.1%, an image-level AUC of 94.1%, and a pixel-level AUC of 95.3% on the MVTec-AD dataset.
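The abstract contrasts methods that output only anomaly scores, and therefore need a hand-set threshold, with threshold-free evaluation such as AUC. The following is a minimal illustrative sketch of that distinction, not the AnomalyGPT implementation: the function names `image_level_auc` and `accuracy_at_threshold` and all scores and labels are made up for this example.

```python
def image_level_auc(scores, labels):
    """Threshold-free AUC: the fraction of (anomalous, normal) pairs in which
    the anomalous sample has the higher score (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_at_threshold(scores, labels, threshold):
    """Accuracy of the decision rule 'anomalous if score >= threshold'."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical anomaly scores (higher = more anomalous); 1 marks an anomaly.
scores = [0.10, 0.20, 0.35, 0.40, 0.60, 0.80]
labels = [0, 0, 0, 1, 1, 1]

print("AUC:", image_level_auc(scores, labels))
for t in (0.05, 0.38, 0.50):
    print(f"accuracy at threshold {t}:", accuracy_at_threshold(scores, labels, t))
```

On this toy data the ranking is perfect, so the AUC is 1.0, yet the accuracy of the final normal/abnormal decision varies with the chosen threshold. That dependence is the practical obstacle the abstract attributes to score-only methods.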