Fine-Grained Distillation for Long Document Retrieval

DOI: 10.1609/aaai.v38i17.29947 Publication Date: 2024-03-25T12:02:43Z
ABSTRACT
Long document retrieval aims to fetch query-relevant documents from a large-scale collection, where knowledge distillation has become the de facto approach to improving a retriever by mimicking a heterogeneous yet powerful cross-encoder. However, in contrast to passages or sentences, distillation on long documents suffers from the "scope hypothesis": a long document may cover multiple topics. This maximizes the structural heterogeneity between teacher and student and poses a granular-mismatch issue, leading to inferior efficacy. In this work, we propose a new learning framework, fine-grained distillation (FGD), for long-document retrievers. While preserving the conventional dense retrieval paradigm, it first produces globally consistent representations across different levels of granularity, then applies multi-granular aligned distillation merely during training. In experiments, we evaluate our framework on two benchmarks, which show state-of-the-art performance.
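The abstract describes distilling a powerful cross-encoder (teacher) into a dense bi-encoder retriever (student), with the distillation signal aligned across multiple granularities. A minimal sketch of such a listwise KL-divergence distillation objective is shown below; the function names, the temperature parameter, and the simple averaging across granularities are illustrative assumptions, not the paper's exact FGD loss.

```python
import math

def softmax(scores, temperature=1.0):
    """Convert raw relevance scores into a probability distribution."""
    exps = [math.exp(s / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kl_distillation_loss(teacher_scores, student_scores, temperature=2.0):
    """KL divergence between the teacher (cross-encoder) and student
    (bi-encoder) score distributions over one query's candidate
    documents -- a common listwise distillation objective."""
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def multi_granular_loss(teacher_by_gran, student_by_gran, temperature=2.0):
    """Hypothetical multi-granular variant: average the listwise KD loss
    over several granularities (e.g. document-, passage-, and
    sentence-level score lists)."""
    losses = [kl_distillation_loss(t, s, temperature)
              for t, s in zip(teacher_by_gran, student_by_gran)]
    return sum(losses) / len(losses)
```

The loss is zero when the student reproduces the teacher's ranking distribution exactly and grows as the two distributions diverge; applying it at several granularities during training is one way to address the granular mismatch described above.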