
Remining Hard Negatives for Generative Pseudo Labeled Domain Adaptation

FOS: Computer and information sciences; Information Retrieval (cs.IR); Machine Learning (cs.LG)

DOI: 10.48550/arxiv.2501.14434

Publication Date: 2025-01-01


AUTHORS (3)

Yuksel, Goksenin

Rau, David

Kamps, Jaap

ABSTRACT

Dense retrievers have demonstrated significant potential for neural information retrieval; however, they lack robustness to domain shifts, which limits their efficacy in zero-shot settings across diverse domains. A state-of-the-art domain adaptation technique is Generative Pseudo Labeling (GPL). GPL uses synthetic query generation and initially mined hard negatives to distill knowledge from a cross-encoder to dense retrievers in the target domain. In this paper, we analyze the documents retrieved by the domain-adapted model and discover that these are more relevant to the target queries than those of the non-domain-adapted model. We then propose refreshing the hard-negative index during the knowledge distillation phase to mine better hard negatives. Our remining approach, R-GPL, boosts ranking performance on 13/14 BEIR datasets and 9/12 LoTTE datasets. Our contributions are (i) analyzing hard negatives returned by domain-adapted and non-domain-adapted models and (ii) applying GPL training with and without hard-negative remining on the LoTTE and BEIR datasets.
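The remining idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the dot-product scoring, the `refresh_every` schedule, and the `update` callback (a stand-in for one distillation step) are all assumptions made for illustration. The sketch only shows the control flow of re-mining hard negatives with the current model's embeddings instead of mining them once before training.

```python
import numpy as np

def mine_hard_negatives(query_vecs, doc_vecs, positive_ids, k=2):
    """For each query, return the k highest-scoring documents other than its positive.

    Scoring by dot product is an assumption of this sketch.
    """
    scores = query_vecs @ doc_vecs.T
    negatives = []
    for qi, pos in enumerate(positive_ids):
        order = np.argsort(-scores[qi], kind="stable")  # best first
        negatives.append([int(d) for d in order if d != pos][:k])
    return negatives

def train_with_remining(encode_queries, encode_docs, positive_ids,
                        update, steps, refresh_every, k=2):
    """Hypothetical training loop that refreshes hard negatives periodically.

    encode_queries / encode_docs: zero-argument callables returning the
    embeddings produced by the *current* model state.
    update: callable performing one distillation step on the given negatives.
    """
    negatives, refreshes = None, 0
    for step in range(steps):
        if step % refresh_every == 0:
            # Re-mine with the current (partially adapted) model.
            negatives = mine_hard_negatives(
                encode_queries(), encode_docs(), positive_ids, k
            )
            refreshes += 1
        update(negatives)
    return negatives, refreshes

# Toy data: one query, four documents, document 0 is the positive.
docs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]])
query_before = np.array([[1.0, 0.0]])  # embedding before adaptation
query_after = np.array([[0.0, 1.0]])   # embedding after adaptation (made up)

negs_before = mine_hard_negatives(query_before, docs, [0], k=2)
negs_after = mine_hard_negatives(query_after, docs, [0], k=2)
```

In the toy data the mined negatives change once the query embedding moves, which is the situation where a hard-negative set mined only once would become stale.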

