Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics

Keywords: evaluation of summarization; evaluation metric
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Information Retrieval (cs.IR); Document and Text Processing
DOI: 10.18653/v1/2024.emnlp-main.1078
Publication Date: 2024-11-27
ABSTRACT
Automatic metrics are used as proxies to evaluate abstractive summarization systems when human annotations are too expensive. To be useful, these metrics should be fine-grained, correlate strongly with human annotations, and ideally be independent of reference quality; however, most standard evaluation metrics for summarization are reference-based, and existing reference-free metrics correlate poorly with relevance, especially on summaries of longer documents. In this paper, we introduce a reference-free metric that correlates well with human-evaluated relevance while being very cheap to compute. We show that this metric can also be used alongside reference-based metrics to improve their robustness in low-quality reference settings.