Precise Zero-Shot Dense Retrieval without Relevance Labels

DOI: 10.18653/v1/2023.acl-long.99 Publication Date: 2023-08-05T00:57:42Z
ABSTRACT
While dense retrieval has been shown to be effective and efficient across tasks and languages, it remains difficult to create fully zero-shot dense retrieval systems when no relevance labels are available. In this paper, we recognize the difficulty of zero-shot learning and encoding relevance. Instead, we propose to pivot through Hypothetical Document Embeddings (HyDE). Given a query, HyDE first zero-shot prompts an instruction-following language model (e.g., InstructGPT) to generate a hypothetical document. The document captures relevance patterns but is "fake" and may contain hallucinations. Then, an unsupervised contrastively learned encoder (e.g., Contriever) encodes the document into an embedding vector. This vector identifies a neighborhood in the corpus embedding space, from which similar real documents are retrieved based on vector similarity. This second step grounds the generated document to the actual corpus, with the encoder's dense bottleneck filtering out the hallucinations. Our experiments show that HyDE significantly outperforms the state-of-the-art unsupervised dense retriever Contriever and shows strong performance comparable to fine-tuned retrievers across various tasks (e.g. web search, QA, fact verification) and in non-English languages (e.g. sw, ko, ja, bn).
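The two-step pipeline the abstract describes (generate a hypothetical document, then retrieve real documents near its embedding) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate_hypothetical_document` is a stub standing in for the instruction-following LM, and `embed` is a toy bag-of-words encoder standing in for a dense encoder such as Contriever.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an unsupervised dense encoder (e.g. Contriever):
    # a sparse bag-of-words vector. HyDE uses a learned dense encoder.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Vector similarity used to rank corpus documents.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def generate_hypothetical_document(query: str) -> str:
    # Stand-in for zero-shot prompting an instruction-following LM.
    # The real generation may contain hallucinated specifics; the
    # retrieval step below grounds it in the actual corpus.
    return f"{query} is answered by the following passage: ..."


def hyde_retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Step 1: generate a hypothetical (possibly "fake") document.
    hypo = generate_hypothetical_document(query)
    # Step 2: encode it and return the k nearest real documents,
    # so only actual corpus text is ever shown to the user.
    q_vec = embed(hypo)
    return sorted(corpus, key=lambda d: cosine(q_vec, embed(d)), reverse=True)[:k]
```

Note that the query itself is never embedded directly; only the generated document passes through the encoder, which is the pivot that sidesteps learning a query-to-document relevance mapping.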