Task Ambiguity in Humans and Language Models

DOI: 10.48550/arxiv.2212.10711 Publication Date: 2022-01-01
ABSTRACT
Language models have recently achieved strong performance across a wide range of NLP benchmarks. However, unlike benchmarks, real-world tasks are often poorly specified, and agents must deduce the user's intended behavior from a combination of context, instructions, and examples. We investigate how both humans and models behave in the face of such task ambiguity by proposing AmbiBench, a new benchmark of six ambiguously-specified classification tasks. We evaluate humans and models on AmbiBench by seeing how well they identify the intended task using 1) instructions with varying degrees of ambiguity, and 2) different numbers of labeled examples. We find that the combination of model scaling (to 175B parameters) and training with human feedback data enables models to approach or exceed the accuracy of human participants across tasks, but that either one alone is not sufficient. In addition, we show how to dramatically improve the accuracy of language models trained without large-scale human feedback data by finetuning on a small number of ambiguous in-context examples, providing a promising direction for teaching models to generalize in the face of ambiguity.
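The evaluation setup described above can be pictured as building prompts that pair an instruction of chosen ambiguity with a small number of labeled examples, then scoring a model's predictions against the intended rule. The following is a minimal, purely illustrative sketch of that protocol in Python; it is not the authors' released code, and the task, instructions, and function names are all hypothetical stand-ins for the AmbiBench format.

```python
import random

# Hypothetical miniature of an AmbiBench-style task: each sentence mentions
# either a human or an animal, and either an indoor or an outdoor location.
# Under the ambiguous instruction, the model cannot tell which feature
# determines the label without consulting the labeled examples.
EXAMPLES = [
    ("The researcher walked into the museum.", {"subject": "human",  "location": "indoor"}),
    ("The dog ran across the field.",          {"subject": "animal", "location": "outdoor"}),
    ("The duck waddled through the kitchen.",  {"subject": "animal", "location": "indoor"}),
    ("The senator jogged along the beach.",    {"subject": "human",  "location": "outdoor"}),
]

INSTRUCTIONS = {
    # Ambiguous: compatible with both the "subject" rule and the "location" rule.
    "ambiguous":   "Output X if the sentence matches the hidden rule, and Y otherwise.",
    # Unambiguous: names the intended feature explicitly.
    "unambiguous": "Output X if the sentence involves a human, and Y otherwise.",
}

def gold_label(features, salient="subject"):
    """Gold label under the intended rule (here: subject is human -> X)."""
    return "X" if features[salient] == "human" else "Y"

def build_prompt(instruction, shots, query_sentence):
    """Concatenate an instruction, k labeled examples, and an unlabeled query."""
    lines = [instruction]
    for sentence, features in shots:
        lines.append(f"Sentence: {sentence}\nLabel: {gold_label(features)}")
    lines.append(f"Sentence: {query_sentence}\nLabel:")
    return "\n\n".join(lines)

def evaluate(model_fn, instruction_type="ambiguous", k=2, trials=100):
    """Estimate accuracy of `model_fn` (a text-in, label-out callable)."""
    correct = 0
    for _ in range(trials):
        shots = random.sample(EXAMPLES, k)
        query_sentence, query_features = random.choice(EXAMPLES)
        prompt = build_prompt(INSTRUCTIONS[instruction_type], shots, query_sentence)
        if model_fn(prompt).strip() == gold_label(query_features):
            correct += 1
    return correct / trials

# Usage with a stand-in "model" that always answers X; a real run would call
# a language model and compare accuracy across instruction types and values of k.
print(evaluate(lambda prompt: "X", instruction_type="ambiguous", k=2))
```

Varying `instruction_type` and `k` in this sketch mirrors the two axes of the benchmark: how explicitly the task is specified, and how many labeled examples are available for disambiguation.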