Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?

DOI: 10.48550/arxiv.2309.08963 Publication Date: 2023-09
ABSTRACT
Despite the remarkable capabilities of Large Language Models (LLMs) like GPT-4, producing complex, structured tabular data remains challenging. Our study assesses LLMs' proficiency in structuring tables and introduces a novel fine-tuning method, cognizant of data structures, to bolster their performance. We unveil Struc-Bench, a comprehensive benchmark featuring prominent LLMs (GPT-NeoX-20B, GPT-3.5, and Vicuna), which spans text tables, HTML, and LaTeX formats. Our proposed FormatCoT aids in crafting format-specific instructions from the intended outputs to populate this benchmark. Addressing a gap in task-centered evaluation, we propose two innovative metrics, P-Score (Prompting Score) and H-Score (Heuristical Score), to more accurately gauge LLM performance. Our experiments show that applying our structure-aware fine-tuning to LLaMA-7B leads to substantial performance gains, outshining its LLM counterparts across most measures. An in-depth error analysis and an ability map across six dimensions -- coverage, formatting, reasoning, comprehension, pragmatics, and hallucination -- highlight areas for future enhancement and suggest forthcoming research trajectories. Our code and models can be found at https://github.com/gersteinlab/Struc-Bench.
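This page reproduces only the abstract, so the formal definitions of P-Score and H-Score are not given here. As a rough, hypothetical sketch of what a heuristic, structure-aware comparison in the spirit of H-Score could look like, the Python snippet below scores a generated pipe-delimited text table against a reference by combining a shape penalty with cell-level string similarity. The function names, table format, and weighting are illustrative assumptions, not the authors' implementation; consult the linked repository for the actual metrics.

```python
from difflib import SequenceMatcher

def parse_table(text: str) -> list[list[str]]:
    """Split a pipe-delimited text table into rows of stripped cells."""
    return [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
        if line.strip()
    ]

def h_score_table(generated: str, reference: str) -> float:
    """Hypothetical heuristic score (NOT the paper's definition):
    average cell-level string similarity, scaled by a row-count penalty."""
    gen, ref = parse_table(generated), parse_table(reference)
    if not ref:
        return 0.0
    # Structural penalty: how closely do the row counts match?
    shape = min(len(gen), len(ref)) / max(len(gen), len(ref), 1)
    sims = []
    for g_row, r_row in zip(gen, ref):
        for g_cell, r_cell in zip(g_row, r_row):
            sims.append(SequenceMatcher(None, g_cell, r_cell).ratio())
        # Missing or extra cells in this row count as zero similarity.
        sims.extend([0.0] * abs(len(g_row) - len(r_row)))
    content = sum(sims) / len(sims) if sims else 0.0
    return shape * content

reference = "| name | score |\n| Alice | 42 |"
generated = "| name | score |\n| Alice | 41 |"
print(f"{h_score_table(generated, reference):.3f}")  # 0.875: one cell differs
```

Under this toy scoring, 1.0 would mean identical shape and cell contents, while mismatched rows or cells pull the score toward 0.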