Pre-training with Aspect-Content Text Mutual Prediction for Multi-Aspect Dense Retrieval

DOI: 10.48550/arxiv.2308.11474 Publication Date: 2023-01-01
ABSTRACT
Grounded on pre-trained language models (PLMs), dense retrieval has been studied extensively on plain text. In contrast, there has been little research on retrieving data with multiple aspects using dense models. In scenarios such as product search, aspect information plays an essential role in relevance matching, e.g., category: Electronics, Computers, and Pet Supplies. A common way of leveraging aspect information for multi-aspect retrieval is to introduce an auxiliary classification objective, i.e., using item contents to predict the annotated value IDs of item aspects. However, by learning the value embeddings from scratch, this approach may not capture the various semantic similarities between values sufficiently. To address this limitation, we leverage the aspect information as text strings rather than class IDs during pre-training so that their semantic similarities can be naturally captured by the PLMs. To facilitate effective retrieval with the aspect strings, we propose mutual prediction objectives between the masked aspects and the text content. In this way, our model makes more sufficient use of aspect information than conducting undifferentiated masked language modeling (MLM) over the concatenated text of aspects and content. Extensive experiments on two real-world datasets (product and mini-program search) show that our approach outperforms competitive baselines that either treat aspect values as classes or conduct the same MLM for both aspect and content strings. Code and the related dataset will be available at https://github.com/sunxiaojie99/ATTEMPT.
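For concreteness, the sketch below shows one way the masked aspect-content mutual prediction described above could be set up with Hugging Face Transformers. It is a minimal sketch under stated assumptions, not the authors' released code: the model name (bert-base-uncased), the "key: value" aspect serialization, the build_mutual_prediction_pair helper, and the 15% masking rate are all illustrative assumptions.

```python
# Minimal sketch (not the authors' released code) of aspect-content
# mutual prediction: from one (aspects, content) pair, build two MLM
# examples: (1) aspect tokens masked, predicted from the content;
# (2) content tokens masked, predicted from the aspects.
import random
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed backbone

def build_mutual_prediction_pair(aspects: dict, content: str, mask_prob: float = 0.15):
    """Return two (input_ids, labels) MLM examples for the two objectives."""
    # Serialize aspects as text strings, e.g. "category: Electronics",
    # so the PLM sees value semantics instead of opaque class IDs.
    aspect_text = " ; ".join(f"{k}: {v}" for k, v in aspects.items())
    enc = tokenizer(aspect_text, content, return_tensors="pt")
    input_ids = enc["input_ids"][0]
    # token_type_ids distinguish the aspect segment (0) from the content segment (1).
    segment = enc["token_type_ids"][0]

    def mask_segment(seg_id: int):
        ids = input_ids.clone()
        labels = input_ids.clone()
        for i in range(len(ids)):
            in_target = segment[i].item() == seg_id
            is_special = ids[i].item() in tokenizer.all_special_ids
            if in_target and not is_special and random.random() < mask_prob:
                ids[i] = tokenizer.mask_token_id  # mask only the target segment
            else:
                labels[i] = -100  # position ignored by the MLM loss
        return ids, labels

    # Objective 1: predict masked aspects from content;
    # Objective 2: predict masked content from aspects.
    return mask_segment(0), mask_segment(1)

(aspect_ids, aspect_labels), (content_ids, content_labels) = build_mutual_prediction_pair(
    {"category": "Electronics", "brand": "Acme"},
    "Wireless over-ear headphones with noise cancellation.",
)
print(tokenizer.decode(aspect_ids))   # [MASK] tokens appear in the aspect segment
print(tokenizer.decode(content_ids))  # [MASK] tokens appear in the content segment
```

Serializing aspects as text is the key design choice the abstract argues for: the PLM's shared token embeddings can relate values like "Electronics" and "Computers", whereas class-ID embeddings learned from scratch start with no such structure.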