Precise Zero-Shot Dense Retrieval without Relevance Labels
DOI: 10.18653/v1/2023.acl-long.99
Publication Date: 2023-08-05
AUTHORS (4)
ABSTRACT
While dense retrieval has been shown to be effective and efficient across tasks and languages, it remains difficult to create fully zero-shot dense retrieval systems when no relevance labels are available. In this paper, we recognize the difficulty of zero-shot learning and encoding relevance. Instead, we propose to pivot through Hypothetical Document Embeddings (HyDE). Given a query, HyDE first zero-shot prompts an instruction-following language model (e.g., InstructGPT) to generate a hypothetical document. The document captures relevance patterns but is "fake" and may contain hallucinations. Then, an unsupervised contrastively learned encoder (e.g., Contriever) encodes the document into an embedding vector. This vector identifies a neighborhood in the corpus embedding space, from which similar real documents are retrieved based on vector similarity. This second step grounds the generated document to the actual corpus, with the encoder's dense bottleneck filtering out the hallucinations. Our experiments show that HyDE significantly outperforms the state-of-the-art unsupervised dense retriever Contriever and shows strong performance comparable to fine-tuned retrievers across various tasks (e.g., web search, QA, fact verification) and in non-English languages (e.g., sw, ko, ja, bn).
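The pipeline in the abstract (prompt a language model for a hypothetical document, encode it, then retrieve real neighbors) can be sketched as follows. This is a minimal illustration only: `generate_hypothetical_document` and `encode` are toy stand-ins, not InstructGPT or Contriever, and only the control flow mirrors the paper's description.

```python
# Minimal sketch of the HyDE retrieval flow. The generator and encoder
# below are toy placeholders (assumptions for illustration), not the
# models used in the paper.
import math


def generate_hypothetical_document(query):
    # Stand-in for zero-shot prompting an instruction-following LM
    # (e.g., InstructGPT in the paper) to write a hypothetical answer.
    return f"A passage that answers the question: {query}"


def encode(text):
    # Toy bag-of-characters embedding; the paper uses an unsupervised
    # contrastively learned encoder such as Contriever instead.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)


def hyde_retrieve(query, corpus, k=2):
    # 1) Generate a hypothetical (possibly hallucinated) document.
    hypo = generate_hypothetical_document(query)
    # 2) Encode it into a dense vector.
    q_vec = encode(hypo)
    # 3) Retrieve the k real documents nearest to that vector;
    #    grounding in the actual corpus filters the hallucinations.
    ranked = sorted(corpus, key=lambda d: cosine(q_vec, encode(d)),
                    reverse=True)
    return ranked[:k]
```

A real system would swap in an instruction-following LM for step 1, a dense encoder for step 2, and an approximate nearest-neighbor index over precomputed corpus embeddings for step 3.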