NFDI4DS | UHH-SEMS - Publication Details

Automatic Histograms: Leveraging Language Models for Text Dataset Exploration

DOI: 10.48550/arxiv.2402.14880

Publication Date: 2024-02-21

AUTHORS (4)

Emily Reif

Crystal Qian

James Wexler

Minsuk Kahng

ABSTRACT

Making sense of unstructured text datasets is perennially difficult, yet increasingly relevant with Large Language Models. Data workers often rely on dataset summaries, especially distributions of various derived features. Some features, like toxicity or topics, are relevant to many datasets, but many interesting features are domain specific: instruments and genres for a music dataset, or diseases and symptoms for a medical dataset. Accordingly, data workers often run custom analyses for each dataset, which is cumbersome and difficult. We present AutoHistograms, a visualization tool leveraging LLMs. AutoHistograms automatically identifies relevant features, visualizes them with histograms, and allows the user to interactively query the dataset for categories of entities and create new histograms. In a user study with 10 data workers (n=10), we observe that participants can quickly identify insights and explore the data using AutoHistograms, and conceptualize a broad range of applicable use cases. Together, this tool and user study contribute to the growing field of LLM-assisted sensemaking tools.
