Algorithmic fairness datasets: the story so far

Unpacking Internal documentation Equity Disadvantaged
DOI: 10.1007/s10618-022-00854-z Publication Date: 2022-09-17T06:02:36Z
ABSTRACT
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being. As a result, a growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of the risks and opportunities of automated decision-making for historically disadvantaged populations. Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented. Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and the scatteredness of available information (sparsity). In this work, we target this debt by surveying over two hundred datasets employed in algorithmic fairness research, producing standardized and searchable documentation for each of them. Moreover, we rigorously identify the three most popular datasets, namely Adult, COMPAS, and German Credit, for which we compile in-depth documentation. This unifying effort supports multiple contributions. Firstly, we summarize the merits and limitations of these three datasets, adding to recent scholarship and calling into question their suitability as general-purpose benchmarks. Secondly, we document hundreds of alternatives, annotating their domain and supported tasks, along with additional properties of interest to researchers. Finally, we analyze these datasets from the perspective of five important curation topics: anonymization, consent, inclusivity, sensitive attributes, and transparency. We discuss different approaches and levels of attention to these topics, making them tangible, and distill them into a set of best practices for the curation of resources.
REFERENCES (618)
CITATIONS (49)