NFDI4DS | UHH-SEMS - Publication Details

Real-Time Visual Feedback to Guide Benchmark Creation: A Human-and-Metric-in-the-Loop Workflow

Benchmark (surveying); Sample (material); Human-in-the-loop; Feedback loop

DOI: 10.48550/arxiv.2302.04434
Publication Date: 2023-01-01

AUTHORS (5)

Anjana Arunkumar

Swaroop Mishra

Bhavdeep Sachdeva

Chitta Baral

Chris Bryan

ABSTRACT

Recent research has shown that language models exploit 'artifacts' in benchmarks to solve tasks, rather than truly learning them, leading to inflated model performance. In pursuit of creating better benchmarks, we propose VAIDA, a novel benchmark creation paradigm for NLP that focuses on guiding crowdworkers, an under-explored facet of addressing benchmark idiosyncrasies. VAIDA facilitates sample correction by providing real-time visual feedback and recommendations to improve sample quality. Our approach is domain, model, task, and metric agnostic, and constitutes a paradigm shift for robust, validated, and dynamic benchmark creation via human-and-metric-in-the-loop workflows. We evaluate via expert review and a user study with NASA TLX. We find that VAIDA decreases effort, frustration, and mental and temporal demands of crowdworkers and analysts, simultaneously increasing the performance of both user groups, with a 45.8% decrease in the level of artifacts in created samples. As a by-product of our user study, we observe that created samples are adversarial across models, leading to performance decreases of 31.3% (BERT), 22.5% (RoBERTa), and 14.98% (GPT-3 few-shot).
