From Local to Global: A Graph RAG Approach to Query-Focused Summarization

FOS: Computer and information sciences; Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); H.3.3; I.2.7
DOI: 10.48550/arxiv.2404.16130 Publication Date: 2024-04-24
ABSTRACT
The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables large language models (LLMs) to answer questions over private and/or previously unseen document collections. However, RAG fails on global questions directed at an entire text corpus, such as "What are the main themes in the dataset?", since this is inherently a query-focused summarization (QFS) task, rather than an explicit retrieval task. Prior QFS methods, meanwhile, fail to scale to the quantities of text indexed by typical RAG systems. To combine the strengths of these contrasting methods, we propose a Graph RAG approach to question answering over private text corpora that scales with both the generality of user questions and the quantity of source text to be indexed. Our approach uses an LLM to build a graph-based text index in two stages: first to derive an entity knowledge graph from the source documents, then to pregenerate community summaries for all groups of closely-related entities. Given a question, each community summary is used to generate a partial response, before all partial responses are again summarized in a final response to the user. For a class of global sensemaking questions over datasets in the 1 million token range, we show that Graph RAG leads to substantial improvements over a naïve RAG baseline for both the comprehensiveness and diversity of generated answers. An open-source, Python-based implementation of both global and local Graph RAG approaches is forthcoming at https://aka.ms/graphrag.
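The two indexing stages and the two-step answering scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. All function names here are hypothetical. The `LLM` type stands for any text-in, text-out callable. Connected components are used as a deliberately simple stand-in for whatever community detection method a real system would use. The toy extractor and toy LLM exist only so the sketch runs without external services.

```python
from collections import defaultdict
from typing import Callable, Iterable

# Assumption: an "LLM" is any function mapping a prompt string to a response string.
LLM = Callable[[str], str]
PairExtractor = Callable[[str], Iterable[tuple[str, str]]]


def build_entity_graph(documents: list[str], extract_pairs: PairExtractor) -> dict[str, set[str]]:
    """Indexing stage 1: build an undirected entity graph from the source documents."""
    graph: dict[str, set[str]] = defaultdict(set)
    for doc in documents:
        for a, b in extract_pairs(doc):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def find_communities(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group related entities. Connected components stand in for real community detection."""
    seen: set[str] = set()
    communities: list[list[str]] = []
    for start in sorted(graph):
        if start in seen:
            continue
        component: set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        communities.append(sorted(component))
    return communities


def summarize_communities(communities: list[list[str]], llm: LLM) -> list[str]:
    """Indexing stage 2: pregenerate one summary per community."""
    return [llm("Summarize these related entities: " + ", ".join(c)) for c in communities]


def answer_global_question(question: str, summaries: list[str], llm: LLM) -> str:
    """Query time: one partial response per community summary, then a final summarization."""
    partials = [llm(f"Summary: {s}\nQuestion: {question}") for s in summaries]
    return llm("Combine these partial answers:\n" + "\n".join(partials))


def toy_extract_pairs(doc: str) -> list[tuple[str, str]]:
    # Toy stand-in for LLM-based entity extraction: link consecutive capitalized words.
    names = [w.strip(".,") for w in doc.split() if w[0].isupper()]
    return list(zip(names, names[1:]))


def toy_llm(prompt: str) -> str:
    # Toy stand-in for an LLM call: echo the first line of the prompt.
    return prompt.splitlines()[0]


docs = ["Alice met Bob.", "Bob hired Carol.", "Dave emailed Erin."]
graph = build_entity_graph(docs, toy_extract_pairs)
communities = find_communities(graph)
summaries = summarize_communities(communities, toy_llm)
final_answer = answer_global_question("What are the main themes?", summaries, toy_llm)
```

The point of the structure is that summaries are produced once at index time, so a global question costs one LLM call per community summary plus one final call, rather than a pass over the full corpus.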