Doubly Robust Crowdsourcing

Keywords: Crowdsourcing, Majority rule, Point estimation, Synthetic data
DOI: 10.48550/arxiv.1906.08591 Publication Date: 2019-01-01
ABSTRACT
Large-scale labeled datasets are the indispensable fuel that ignites the AI revolution we see today. Most such datasets are constructed using crowdsourcing services such as Amazon Mechanical Turk, which provide noisy labels from non-experts at a fair price. The sheer size of such datasets means it is only feasible to collect a few labels per data point. We formulate the problem of test-time label aggregation as a statistical estimation problem of inferring the expected voting score. By imitating workers with supervised learners and using them in a doubly robust estimation framework, we prove that the variance of estimation can be substantially reduced, even if the learner is a poor approximation. Synthetic and real-world experiments show that by combining the doubly robust approach with adaptive worker/item selection rules, we often need a much lower label cost to achieve nearly the same accuracy as in the ideal world where all workers label all data points.
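To make the idea of a doubly robust estimate of the expected voting score concrete, here is a minimal sketch for a single item. It is not the paper's exact estimator or experimental setup. It assumes a standard model-plus-residual-correction form with workers sampled uniformly at random, and every name and number in it (`sampled_vote`, `doubly_robust_vote`, the 50-worker toy item, the learner that errs on every tenth worker) is hypothetical. The target is the average label over all workers; the plain estimate averages only the few sampled labels, while the doubly robust estimate averages the learner's predictions over all workers and adds the average sampled residual.

```python
from itertools import combinations


def sampled_vote(labels, sampled):
    """Plain estimate: average label of the sampled workers only."""
    return sum(labels[w] for w in sampled) / len(sampled)


def doubly_robust_vote(labels, predicted, sampled):
    """Learner average over all workers plus a sampled-residual correction."""
    model_term = sum(predicted) / len(predicted)
    correction = sum(labels[w] - predicted[w] for w in sampled) / len(sampled)
    return model_term + correction


def mean_and_variance(values):
    """Mean and population variance of a list of numbers."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)


# Hypothetical toy item: 50 workers, 35 would vote 1 and 15 would vote 0.
labels = [1] * 35 + [0] * 15
# Hypothetical imitation learner: wrong about every 10th worker.
predicted = [1 - y if w % 10 == 0 else y for w, y in enumerate(labels)]
# Expected voting score if every worker labeled the item.
target = sum(labels) / len(labels)

# Enumerate every way of asking 3 of the 50 workers (uniform sampling),
# so the means and variances below are exact rather than simulated.
subsets = list(combinations(range(len(labels)), 3))
plain = mean_and_variance([sampled_vote(labels, s) for s in subsets])
dr = mean_and_variance([doubly_robust_vote(labels, predicted, s) for s in subsets])

print("target voting score:", target)
print("plain  mean, variance:", plain)
print("DR     mean, variance:", dr)
```

In this sketch both estimates average to the target over all possible samples, because the residual correction cancels the learner's error in expectation whatever its quality. The variance of the doubly robust estimate is driven by the spread of the residuals rather than of the labels themselves, so it is smaller here, where the learner agrees with most workers.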