On the Influence of Context Size and Model Choice in Retrieval-Augmented Generation Systems

FOS: Computer and information sciences; Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
DOI: 10.48550/arxiv.2502.14759
Publication Date: 2025-02-20
ABSTRACT
Retrieval-augmented generation (RAG) has emerged as an approach to augment large language models (LLMs) by reducing their reliance on static knowledge and improving answer factuality. RAG retrieves relevant context snippets and generates an answer based on them. Despite its increasing industrial adoption, systematic exploration of RAG components is lacking, particularly regarding the ideal size of the provided context and the choice of base LLM and retrieval method. To help guide the development of robust RAG systems, we evaluate various context sizes, BM25 and semantic search as retrievers, and eight base LLMs. Moving away from the usual RAG evaluation with short answers, we explore the more challenging setting of long-form question answering in two domains, where a good answer has to utilize the entire context. Our findings indicate that final QA performance improves steadily with up to 15 snippets but stagnates or declines beyond that. Finally, we show that different general-purpose LLMs excel in the biomedical domain than in the encyclopedic one, and that open-domain evidence retrieval in large corpora is challenging.
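The retrieve-then-generate loop the abstract describes can be sketched minimally. The snippet below is a generic, self-contained Okapi BM25 scorer with a top-k selection step; the parameter defaults (k1=1.5, b=0.75) and the helper names are illustrative assumptions, not the paper's actual configuration, and the final prompt assembly stands in for the LLM generation step.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25.

    k1 and b are the standard BM25 free parameters; the defaults here
    are common illustrative values, not tuned for any specific corpus.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    q_terms = query.lower().split()
    # document frequency of each query term
    df = {t: sum(1 for d in tokenized if t in d) for t in q_terms}
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for t in q_terms:
            if df[t] == 0:
                continue
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1)
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores

def top_k_snippets(query, docs, k=3):
    """Return the k highest-scoring snippets to place in the LLM context."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in ranked[:k]]

def build_rag_prompt(question, snippets):
    """Assemble a grounded prompt; the generation call itself is omitted."""
    context = "\n".join(f"- {s}" for s in snippets)
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {question}"
```

In a full system, `top_k_snippets` would run over an evidence corpus (with k being the context size the paper varies, up to 15 and beyond), and `build_rag_prompt` would feed one of the eight base LLMs; swapping `bm25_scores` for an embedding-similarity ranker gives the semantic-search variant.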