Evaluating AI systems under uncertain ground truth: a case study in dermatology

Keywords: Ground truth, Normalization, Uncertainty Quantification
DOI: 10.48550/arxiv.2307.02191 Publication Date: 2023-01-01
ABSTRACT
For safety, AI systems in health undergo thorough evaluations before deployment, validating their predictions against a ground truth that is assumed certain. However, this is actually not the case and the ground truth may be uncertain. Unfortunately, this is largely ignored in standard evaluation of AI models but can have severe consequences such as overestimating future performance. To avoid this, we measure the effects of ground truth uncertainty, which we assume decomposes into two main components: annotation uncertainty, which stems from the lack of reliable annotations, and inherent uncertainty due to limited observational information. This ground truth uncertainty is ignored when estimating the ground truth by deterministically aggregating annotations, e.g., by majority voting or averaging. In contrast, we propose a framework where aggregation is done using a statistical model. Specifically, we frame aggregation of annotations as posterior inference of so-called plausibilities, representing distributions over classes in a classification setting, subject to a hyper-parameter encoding annotator reliability. Based on this model, we propose a metric for measuring annotation uncertainty and provide uncertainty-adjusted metrics for performance evaluation. We present a case study applying our framework to skin condition classification from images where annotations are provided in the form of differential diagnoses. The deterministic adjudication process called inverse rank normalization (IRN) from previous work ignores this uncertainty. Instead, we present two alternative statistical models: a probabilistic version of IRN and a Plackett-Luce-based model. We find that a large portion of the dataset exhibits significant ground truth uncertainty and that IRN-based evaluation severely over-estimates performance without providing uncertainty estimates.
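To make the contrast concrete, below is a minimal Python sketch, not the authors' code: a deterministic IRN-style aggregation of ranked differential diagnoses, next to the Plackett-Luce likelihood that a statistical model would build posterior inference on. The function names, the 1/rank weighting, and the example condition labels are all assumptions for illustration.

```python
import math
from collections import defaultdict

def irn_aggregate(differentials, classes):
    """Deterministic IRN-style aggregation (a sketch, assuming 1/rank weights).

    Each annotator gives a ranked differential diagnosis (best guess first).
    A condition at rank r gets weight 1/r; weights are normalized per
    annotator and averaged across annotators, producing a single point
    estimate of the ground-truth distribution. This is exactly where
    annotation uncertainty is discarded.
    """
    scores = defaultdict(float)
    for ranking in differentials:
        weights = [1.0 / (rank + 1) for rank in range(len(ranking))]
        total = sum(weights)
        for condition, w in zip(ranking, weights):
            scores[condition] += w / total
    n = len(differentials)
    return {c: scores[c] / n for c in classes}

def plackett_luce_loglik(ranking, plausibilities):
    """Log-likelihood of one partial ranking under a Plackett-Luce model.

    `plausibilities` maps every class to a positive score. In the statistical
    framing these are latent quantities to be inferred (e.g., by posterior
    sampling) rather than fixed deterministically; this function is only the
    likelihood building block.
    """
    remaining = dict(plausibilities)
    ll = 0.0
    for condition in ranking:
        # Probability of picking `condition` next among the remaining classes.
        ll += math.log(remaining[condition]) - math.log(sum(remaining.values()))
        del remaining[condition]
    return ll

# Three annotators' differential diagnoses for one image (hypothetical labels).
differentials = [
    ["eczema", "psoriasis"],
    ["eczema", "tinea", "psoriasis"],
    ["psoriasis", "eczema"],
]
print(irn_aggregate(differentials, ["eczema", "psoriasis", "tinea"]))

plaus = {"eczema": 0.5, "psoriasis": 0.4, "tinea": 0.1}
print(plackett_luce_loglik(["eczema", "psoriasis"], plaus))
```

Evaluating the Plackett-Luce likelihood over many candidate plausibility vectors, rather than committing to the single IRN point estimate, is what allows the framework to quantify annotation uncertainty per example.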