Pre-training with Aspect-Content Text Mutual Prediction for Multi-Aspect Dense Retrieval

DOI: 10.48550/arxiv.2308.11474 Publication Date: 2023-01-01
ABSTRACT
Grounded on pre-trained language models (PLMs), dense retrieval has been studied extensively on plain text. In contrast, there has been little research on retrieving data with multiple aspects using dense models. In scenarios such as product search, aspect information plays an essential role in relevance matching, e.g., category: Electronics, Computers, and Pet Supplies. A common way of leveraging aspect information for multi-aspect retrieval is to introduce an auxiliary classification objective, i.e., using item contents to predict the annotated value IDs of item aspects. However, by learning the value embeddings from scratch, this approach may not capture the various semantic similarities between values sufficiently. To address this limitation, we leverage the aspect information as text strings rather than class IDs during pre-training so that their semantic similarities can be naturally captured by the PLMs. To facilitate effective retrieval with the aspect strings, we propose mutual prediction objectives between the masked aspects and the text content. In this way, our model makes more sufficient use of aspect information than conducting undifferentiated masked language modeling (MLM) over the concatenated text of aspects and content. Extensive experiments on two real-world datasets (product and mini-program search) show that our approach outperforms competitive baselines that either treat aspect values as classes or conduct the same MLM for both aspect and content strings. Code and the related dataset will be available at https://github.com/sunxiaojie99/ATTEMPT.
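For concreteness, the sketch below shows one way the masked aspect-content mutual prediction described above could be set up with Hugging Face Transformers. It is a minimal sketch under stated assumptions, not the authors' released code: the model name (bert-base-uncased), the "key: value" aspect serialization, the build_mutual_prediction_pair helper, and the 15% masking rate are all illustrative assumptions.

```python
# Minimal sketch (not the authors' released code) of aspect-content
# mutual prediction: from one (aspects, content) pair, build two MLM
# examples: (1) aspect tokens masked, predicted from the content;
# (2) content tokens masked, predicted from the aspects.
import random
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed backbone

def build_mutual_prediction_pair(aspects: dict, content: str, mask_prob: float = 0.15):
    """Return two (input_ids, labels) MLM examples for the two objectives."""
    # Serialize aspects as text strings, e.g. "category: Electronics",
    # so the PLM sees value semantics instead of opaque class IDs.
    aspect_text = " ; ".join(f"{k}: {v}" for k, v in aspects.items())
    enc = tokenizer(aspect_text, content, return_tensors="pt")
    input_ids = enc["input_ids"][0]
    # token_type_ids distinguish the aspect segment (0) from the content segment (1).
    segment = enc["token_type_ids"][0]

    def mask_segment(seg_id: int):
        ids = input_ids.clone()
        labels = input_ids.clone()
        for i in range(len(ids)):
            in_target = segment[i].item() == seg_id
            is_special = ids[i].item() in tokenizer.all_special_ids
            if in_target and not is_special and random.random() < mask_prob:
                ids[i] = tokenizer.mask_token_id  # mask only the target segment
            else:
                labels[i] = -100  # position ignored by the MLM loss
        return ids, labels

    # Objective 1: predict masked aspects from content;
    # Objective 2: predict masked content from aspects.
    return mask_segment(0), mask_segment(1)

(aspect_ids, aspect_labels), (content_ids, content_labels) = build_mutual_prediction_pair(
    {"category": "Electronics", "brand": "Acme"},
    "Wireless over-ear headphones with noise cancellation.",
)
print(tokenizer.decode(aspect_ids))   # [MASK] tokens appear in the aspect segment
print(tokenizer.decode(content_ids))  # [MASK] tokens appear in the content segment
```

Serializing aspects as text is the key design choice the abstract argues for: the PLM's shared token embeddings can relate values like "Electronics" and "Computers", whereas class-ID embeddings learned from scratch start with no such structure.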