CrowdWorkSheets: Accounting for Individual and Collective Identities Underlying Crowdsourced Dataset Annotation

Crowdsourcing
DOI: 10.1145/3531146.3534647 Publication Date: 2022-06-20T14:27:10Z
ABSTRACT
Human-annotated data plays a crucial role in machine learning (ML) research and development. However, the ethical considerations around the processes and decisions that go into dataset annotation have not received nearly enough attention. In this paper, we survey an array of literature that provides insights into crowdsourced dataset annotation. We synthesize these insights and lay out the challenges in this space along two layers: (1) who the annotator is, and how annotators' lived experiences can impact their annotations, and (2) the relationship between annotators and the crowdsourcing platforms, and what that relationship affords them. Finally, we introduce a novel framework, CrowdWorkSheets, for dataset developers to facilitate transparent documentation of key decision points at various stages of the annotation pipeline: task formulation, selection of annotators, platform and infrastructure choices, dataset analysis and evaluation, and dataset release and maintenance.
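As a rough illustration only, the sketch below shows one hypothetical way a dataset developer might record documentation across the five pipeline stages named in the abstract. The class and field names (CrowdWorkSheet, AnnotationStageRecord, decisions, rationale) are assumptions made for this example and are not the actual questions or structure of the CrowdWorkSheets framework.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AnnotationStageRecord:
    """One documented stage of a crowdsourced annotation pipeline (illustrative)."""
    stage: str                                           # e.g. "task formulation"
    decisions: List[str] = field(default_factory=list)   # key decisions made at this stage
    rationale: str = ""                                   # why those decisions were made

@dataclass
class CrowdWorkSheet:
    """Hypothetical container for documentation across the five stages in the abstract."""
    dataset_name: str
    stages: List[AnnotationStageRecord] = field(default_factory=list)

# Example usage: documenting one stage of a (fictional) dataset's annotation pipeline.
sheet = CrowdWorkSheet(dataset_name="example-dataset")
sheet.stages.append(AnnotationStageRecord(
    stage="selection of annotators",
    decisions=["recruited annotators across multiple regions and demographic groups"],
    rationale="annotators' lived experiences can shape their annotations",
))
```

In practice, such records could be exported alongside a dataset release so that downstream users can see who annotated the data and under what conditions; the export format here is left open.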