Automatic Histograms: Leveraging Language Models for Text Dataset Exploration
DOI:
10.48550/arxiv.2402.14880
Publication Date:
2024-02-21
AUTHORS (4)
ABSTRACT
Making sense of unstructured text datasets is perennially difficult, yet increasingly relevant with Large Language Models. Data workers often rely on dataset summaries, especially distributions of various derived features. Some features, like toxicity or topics, are relevant to many datasets, but many interesting features are domain specific: instruments and genres for a music dataset, or diseases and symptoms for a medical dataset. Accordingly, data workers often run custom analyses for each dataset, which is cumbersome and difficult. We present AutoHistograms, a visualization tool leveraging LLMs. AutoHistograms automatically identifies relevant features, visualizes them with histograms, and allows the user to interactively query the dataset for categories of entities and create new histograms. In a user study with 10 data workers (n=10), we observe that participants can quickly identify insights and explore the data using AutoHistograms, and conceptualize a broad range of applicable use cases. Together, this tool and user study contribute to the growing field of LLM-assisted sensemaking tools.