Benchmarking Large Language Models for News Summarization
DOI: 10.48550/arxiv.2301.13848
Publication Date: 2023-01-01
AUTHORS (6)
ABSTRACT
Large language models (LLMs) have shown promise for automatic summarization, but the reasons behind their success are poorly understood. By conducting a human evaluation on ten LLMs across different pretraining methods, prompts, and model scales, we make two important observations. First, we find that instruction tuning, not model size, is the key to the LLM's zero-shot summarization capability. Second, existing studies have been limited by low-quality references, leading to underestimates of human performance and lower few-shot and finetuning performance. To better evaluate LLMs, we perform human evaluation over high-quality summaries we collect from freelance writers. Despite major stylistic differences such as the amount of paraphrasing, we find that LLM summaries are judged to be on par with human-written summaries.
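For illustration only (this is not code from the paper), the minimal Python sketch below shows what a zero-shot summarization prompt of the kind evaluated here might look like for an instruction-tuned LLM; the article text, the prompt wording, and the query_llm stub are all assumptions.

    # Illustrative sketch only: building a zero-shot news summarization prompt
    # for an instruction-tuned LLM. No in-context examples are included, which
    # is what distinguishes the zero-shot setting from few-shot prompting.

    ARTICLE = (
        "The city council voted on Tuesday to approve a new transit plan that "
        "adds three bus routes and extends subway hours on weekends."
    )

    def build_zero_shot_prompt(article: str) -> str:
        """Wrap a news article in a bare instruction, with no demonstrations."""
        return f"Article: {article}\n\nSummarize the above article in two or three sentences."

    def query_llm(prompt: str) -> str:
        """Hypothetical stand-in for a call to an instruction-tuned LLM API."""
        raise NotImplementedError("Replace with the model or API of your choice.")

    if __name__ == "__main__":
        print(build_zero_shot_prompt(ARTICLE))
        # summary = query_llm(build_zero_shot_prompt(ARTICLE))

A few-shot variant of this setup would prepend one or more article/summary pairs ahead of the final instruction, which is the contrast the abstract draws when comparing zero-shot, few-shot, and finetuned performance.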