From Random to Informed Data Selection: A Diversity-Based Approach to Optimize Human Annotation and Few-Shot Learning

Keywords: Crowdsourcing, Training set, Data set
DOI: 10.48550/arxiv.2401.13229 Publication Date: 2024-01-01
ABSTRACT
A major challenge in Natural Language Processing is obtaining annotated data for supervised learning. One option is the use of crowdsourcing platforms for data annotation. However, crowdsourcing introduces issues related to the annotators' experience, consistency, and biases. An alternative is to use zero-shot methods, which in turn have limitations compared to their few-shot or fully supervised counterparts. Recent advancements driven by large language models show potential, but they struggle to adapt to specialized domains with severely limited data. The most common approaches therefore involve humans themselves randomly annotating a set of datapoints to build initial datasets. But random sampling is often inefficient, as it ignores the characteristics of the data and the specific needs of the model. The situation worsens when working with imbalanced datasets, where random sampling tends to bias heavily towards the majority classes, leading to excessive annotation effort. To address these issues, this paper contributes an automatic and informed data selection architecture for building a small dataset. Our proposal minimizes the quantity and maximizes the diversity of data selected for human annotation, while improving model performance.
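To make the abstract's core idea concrete, the sketch below illustrates one common form of diversity-based selection: embedding an unlabeled pool and greedily picking the points farthest from those already chosen (a k-center style heuristic). This is an illustrative assumption, not the paper's actual architecture; the function name `select_diverse`, the TF-IDF representation, and the toy text pool are all hypothetical.

```python
# Minimal sketch of diversity-based data selection for annotation (assumed approach,
# not the paper's exact method): embed unlabeled texts with TF-IDF and pick a small,
# diverse subset via greedy farthest-point (k-center) selection.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_distances

def select_diverse(texts, budget, seed=0):
    """Return indices of `budget` texts chosen to maximize diversity."""
    X = TfidfVectorizer().fit_transform(texts)
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(texts)))]        # random starting point
    # distance of every point to its nearest already-selected point
    min_dist = cosine_distances(X, X[selected]).ravel()
    while len(selected) < budget:
        nxt = int(np.argmax(min_dist))                # farthest from current selection
        selected.append(nxt)
        min_dist = np.minimum(min_dist, cosine_distances(X, X[nxt]).ravel())
    return selected

if __name__ == "__main__":
    pool = ["the service was great", "terrible battery life",
            "loved the camera quality", "shipping took forever",
            "awful customer support", "screen is bright and sharp"]
    print(select_diverse(pool, budget=3))   # indices of texts to send for annotation
```

Compared with random sampling, this kind of selection spreads the annotation budget across dissimilar regions of the data, which is the behavior the abstract argues is missing from purely random annotation.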