NFDI4DS | UHH-SEMS - Publication Details

Dense X Retrieval: What Retrieval Granularity Should We Use?

FOS: Computer and information sciences; Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

DOI: 10.18653/v1/2024.emnlp-main.845

Publication Date: 2024-11-27T22:28:12Z

AUTHORS (8)

Tong Chen

Hongwei Wang

Sihao Chen

Wenhao Yu

Kaixin Ma

Xinran Zhao

Hongming Zhang

Dong Yu

ABSTRACT

Dense retrieval has become a prominent method to obtain relevant context or world knowledge in open-domain NLP tasks. When we use a learned dense retriever on a retrieval corpus at inference time, an often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g., document, passage, or sentence. We discover that the retrieval unit choice significantly impacts the performance of both retrieval and downstream tasks. Distinct from the typical approach of using passages or sentences, we introduce a novel retrieval unit, the proposition, for dense retrieval. Propositions are defined as atomic expressions within text, each encapsulating a distinct factoid and presented in a concise, self-contained natural language format. We conduct an empirical comparison of different retrieval granularities. Our experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks. Moreover, constructing prompts with fine-grained retrieved units for retrieval-augmented language models improves the performance of downstream QA tasks given a specific computation budget.
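
As a rough illustration of the idea in the abstract, the sketch below indexes the same text at three granularities and retrieves under a fixed word budget. Everything in it is an assumption made for illustration: the bag-of-words `embed` function is a toy stand-in for a learned dense encoder, the example passage and its hand-written propositions are hypothetical data, and `retrieve` and `word_budget` are names invented here. It does not reproduce the paper's models or results.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a dense encoder (assumption): bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, units, word_budget):
    """Rank units by similarity to the query, then keep the top-ranked
    units until the next one would exceed the word budget."""
    q = embed(query)
    ranked = sorted(units, key=lambda u: cosine(q, embed(u)), reverse=True)
    picked, used = [], 0
    for unit in ranked:
        n_words = len(unit.split())
        if used + n_words > word_budget:
            break
        picked.append(unit)
        used += n_words
    return picked


# Hypothetical example data: one passage split three ways.
passage = (
    "The Eiffel Tower is in Paris. It was completed in 1889. "
    "It is made of wrought iron."
)
index_by_unit = {
    "passage": [passage],
    # Sentences keep pronouns, so they are not self-contained.
    "sentence": [
        "The Eiffel Tower is in Paris.",
        "It was completed in 1889.",
        "It is made of wrought iron.",
    ],
    # Hand-written propositions: one fact each, pronouns resolved.
    "proposition": [
        "The Eiffel Tower is in Paris.",
        "The Eiffel Tower was completed in 1889.",
        "The Eiffel Tower is made of wrought iron.",
    ],
}

query = "When was the Eiffel Tower completed?"
for unit_name, units in index_by_unit.items():
    print(unit_name, "->", retrieve(query, units, word_budget=8))
```

On this toy data with a budget of 8 words, the passage does not fit, the sentence index returns the sentence about Paris because the relevant sentence only says "It", and the proposition index returns the unit that states the completion year. This mirrors, in miniature, why self-contained fine-grained units can help when the prompt budget is fixed.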
