NFDI4DS | UHH-SEMS - Publication Details

Using schema.org Annotations for Training and Maintaining Product Matchers

Schema Matching Schema (genetic algorithms)

DOI: 10.1145/3405962.3405964
Publication Date: 2020-08-25T04:24:02Z


AUTHORS (4)

Ralph Peeters

Anna Primpeli

Benedikt Wichtlhuber

Christian Bizer

ABSTRACT

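Product matching is a central task within e-commerce applications such as price comparison portals and online marketplaces. State-of-the-art product matching methods achieve F1 scores above 0.90 using deep learning techniques combined with huge amounts of training data (e.g. > 100K pairs of offers). Gathering and maintaining such large training corpora is costly, as it implies labeling pairs of offers as matches or non-matches. Acquiring the ability to be good at product matching thus means a major investment for an e-commerce company. This paper shows that the manual labeling of training data for product matching can be replaced by relying exclusively on schema.org annotations gathered from the public Web. We show that, using only schema.org data for training, we are able to achieve F1 scores between 0.92 and 0.95 depending on the product category. As new products appear every day, it is important that matching models can be maintained with justifiable effort. In order to give practical advice on how to maintain matching models, we compare the performance of deep learning and traditional matching models on unseen products and experiment with different fine-tuning and re-training strategies for model maintenance, again using only schema.org annotations as training data. Finally, as using the Web as a source of distant supervision carries inherent noise, we evaluate deep learning and traditional matching models with regard to their label-noise resistance and show that deep learning is able to deal with the amount of identifier-noise found in schema.org annotations.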
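The abstract does not spell out how schema.org annotations become labeled training pairs. A common form of this distant supervision, and the assumption behind the sketch below, is that offers annotated with the same product identifier (here a GTIN) describe the same product, so pairs within an identifier cluster are labeled matches and pairs across clusters non-matches. The offer fields `id` and `gtin`, the helpers `normalize_gtin`, `build_training_pairs` and `flip_labels`, the uniform random negative sampling and the label-flipping noise model are all hypothetical illustrations, not the authors' pipeline or evaluation protocol.

```python
import random
import re
from collections import defaultdict
from itertools import combinations


def normalize_gtin(raw):
    """Hypothetical normalization: keep digits, accept GTIN-8/12/13/14, left-pad to 14."""
    if not raw:
        return None
    digits = re.sub(r"\D", "", str(raw))
    if len(digits) not in (8, 12, 13, 14):
        return None  # treat malformed identifiers as missing
    return digits.zfill(14)


def build_training_pairs(offers, negatives_per_positive=1, seed=0):
    """Label offer pairs via shared identifiers: same GTIN -> 1, different GTIN -> 0."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for offer in offers:
        gtin = normalize_gtin(offer.get("gtin"))
        if gtin:
            clusters[gtin].append(offer["id"])

    # Every pair of offers inside one identifier cluster is a positive (match).
    positives = []
    for ids in clusters.values():
        positives.extend((a, b, 1) for a, b in combinations(sorted(ids), 2))

    # Negatives: sample offers from two different clusters (uniform, no hard negatives).
    keys = sorted(clusters)
    negatives, seen = [], set()
    if len(keys) >= 2:
        target = negatives_per_positive * len(positives)
        attempts = 0
        while len(negatives) < target and attempts < 20 * max(target, 1):
            attempts += 1
            k1, k2 = rng.sample(keys, 2)
            a, b = sorted((rng.choice(clusters[k1]), rng.choice(clusters[k2])))
            if (a, b) not in seen:
                seen.add((a, b))
                negatives.append((a, b, 0))
    return positives + negatives


def flip_labels(pairs, rate, seed=0):
    """Simulate label noise by flipping each label independently with probability `rate`."""
    rng = random.Random(seed)
    return [(a, b, 1 - label if rng.random() < rate else label) for a, b, label in pairs]
```

For example, offers annotated with `012345678905` and `0012345678905` normalize to the same 14-digit GTIN and end up in one cluster. Uniform cross-cluster sampling keeps the sketch short; a real matcher would usually benefit from harder negatives such as offers with similar titles but different identifiers.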


REFERENCES (31)

CITATIONS (8)
