Using schema.org Annotations for Training and Maintaining Product Matchers

Schema matching; Schema (genetic algorithms)
DOI: 10.1145/3405962.3405964 Publication Date: 2020-08-25T04:24:02Z
ABSTRACT
Product matching is a central task within e-commerce applications such as price comparison portals and online marketplaces. State-of-the-art product matching methods achieve F1 scores above 0.90 using deep learning techniques combined with huge amounts of training data (e.g., > 100K pairs of offers). Gathering and maintaining such large training corpora is costly, as it implies labeling pairs of offers as matches or non-matches. Acquiring the ability to be good at product matching thus means a major investment for an e-commerce company. This paper shows that the manual labeling of training data for product matching can be replaced by relying exclusively on schema.org annotations gathered from the public Web. We show that, using only schema.org data for training, we are able to achieve F1 scores between 0.92 and 0.95 depending on the product category. As new products appear every day, it is important that matching models can be maintained with justifiable effort. In order to give practical advice on how to maintain matching models, we compare the performance of deep learning and traditional matching models on unseen products and experiment with different fine-tuning and re-training strategies for model maintenance, again using only schema.org annotations as training data. Finally, as schema.org annotations from the Web used for distant supervision carry inherent noise, we evaluate deep learning and traditional matching models with regard to their label-noise resistance and show that these methods can deal with the identifier noise found in schema.org annotations.
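The abstract does not spell out how schema.org annotations are turned into training pairs, so the following is only a minimal sketch of the general idea of distant supervision by shared identifiers, not the authors' pipeline. It assumes each offer carries a product identifier (for example a GTIN or MPN taken from its annotation). The offer records, the `product_id` field, and the helpers `label_pairs` and `f1_score` are hypothetical names introduced for illustration.

```python
from itertools import combinations

# Hypothetical offers: titles plus an identifier assumed to come from a
# schema.org annotation (e.g. gtin or mpn). Not data from the paper.
offers = [
    {"id": "a", "title": "Acme X100 camera black", "product_id": "0001"},
    {"id": "b", "title": "ACME X-100 digital camera", "product_id": "0001"},
    {"id": "c", "title": "Acme X200 camera", "product_id": "0002"},
    {"id": "d", "title": "Zed phone 64GB", "product_id": "0003"},
]


def label_pairs(offers):
    """Label every offer pair without manual work.

    A pair is a match (1) when both offers share the same identifier and
    a non-match (0) otherwise. Wrong identifiers on the Web would produce
    wrong labels, which is the noise the abstract refers to.
    """
    pairs = []
    for left, right in combinations(offers, 2):
        label = 1 if left["product_id"] == right["product_id"] else 0
        pairs.append((left["id"], right["id"], label))
    return pairs


def f1_score(gold, predicted):
    """F1 of the match class: harmonic mean of precision and recall."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for left_id, right_id, label in label_pairs(offers):
        print(left_id, right_id, label)
```

In practice the labeled pairs would feed a matcher (deep learning or traditional), and `f1_score` would compare its predictions against a held-out set of pairs; both steps are left out here because the abstract gives no detail on them.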