AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models
FOS: Computer and information sciences
Computer Vision and Pattern Recognition (cs.CV)
0202 electrical engineering, electronic engineering, information engineering
02 engineering and technology
DOI:
10.1609/aaai.v38i3.27963
Publication Date:
2024-03-25T09:19:00Z
AUTHORS (6)
ABSTRACT
Large Vision-Language Models (LVLMs) such as MiniGPT-4 and LLaVA have demonstrated the capability of understanding images and achieved remarkable performance in various visual tasks. Despite their strong abilities in recognizing common objects due to extensive training datasets, they lack specific domain knowledge and have a weaker understanding of localized details within objects, which hinders their effectiveness in the Industrial Anomaly Detection (IAD) task. On the other hand, most existing IAD methods only provide anomaly scores and necessitate the manual setting of thresholds to distinguish between normal and abnormal samples, which restricts their practical implementation. In this paper, we explore the utilization of LVLM to address the IAD problem and propose AnomalyGPT, a novel IAD approach based on LVLM. We generate training data by simulating anomalous images and producing corresponding textual descriptions for each image. We also employ an image decoder to provide fine-grained semantics and design a prompt learner to fine-tune the LVLM using prompt embeddings. Our AnomalyGPT eliminates the need for manual threshold adjustments and thus directly assesses the presence and locations of anomalies. Additionally, AnomalyGPT supports multi-turn dialogues and exhibits impressive few-shot in-context learning capabilities. With only one normal shot, AnomalyGPT achieves state-of-the-art performance with an accuracy of 86.1%, an image-level AUC of 94.1%, and a pixel-level AUC of 95.3% on the MVTec-AD dataset.
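The abstract contrasts score-based IAD methods, which need a hand-picked threshold, with threshold-free metrics such as AUC. The following is a minimal sketch of that distinction in plain Python. It is not code from the paper: the function names, scores, and labels are hypothetical, chosen only to illustrate how accuracy depends on the threshold while AUC does not.

```python
def accuracy_at_threshold(scores, labels, threshold):
    """Fraction of samples classified correctly when
    score >= threshold is read as 'anomalous' (label 1)."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(scores, labels):
    """Threshold-free AUC: probability that a random anomalous sample
    scores higher than a random normal one (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical anomaly scores (higher = more anomalous) and ground-truth labels.
scores = [0.10, 0.35, 0.40, 0.80, 0.65, 0.90]
labels = [0, 0, 1, 1, 0, 1]

# Accuracy changes with the manually chosen threshold ...
for threshold in (0.3, 0.5, 0.7):
    print(threshold, accuracy_at_threshold(scores, labels, threshold))

# ... whereas AUC summarizes ranking quality without any threshold.
print("AUC:", roc_auc(scores, labels))
```

On this toy data the accuracy varies across the three thresholds, which is the tuning burden the abstract refers to. A pixel-level AUC can be computed the same way by treating every pixel of the anomaly map as one sample.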