AntM$^{2}$C: A Large Scale Dataset For Multi-Scenario Multi-Modal CTR Prediction

DOI: 10.48550/arxiv.2308.16437 Publication Date: 2023-01-01
ABSTRACT
Click-through rate (CTR) prediction is a crucial issue in recommendation systems, and various public CTR datasets have emerged. However, existing datasets suffer from three main limitations. First, users generally click different types of items across multiple scenarios, and modeling multiple scenarios can provide a more comprehensive understanding of users; existing datasets only include data for a single type of item from a single scenario. Second, multi-modal features are essential in multi-scenario prediction because they address the inconsistent ID encoding between scenarios, yet existing datasets are based on ID features and lack multi-modal features. Third, a large-scale dataset enables more reliable evaluation of models and fully reflects the performance differences between them, but existing datasets have a scale of around 100 million, which is relatively small compared to real-world CTR prediction. To address these limitations, we propose AntM$^{2}$C, a Multi-Scenario Multi-Modal CTR dataset based on industrial data from Alipay. Specifically, AntM$^{2}$C provides the following advantages: 1) It covers CTR data for 5 different types of items, including advertisements, vouchers, mini-programs, contents, and videos, providing insights into user preferences across item types. 2) Apart from ID-based features, it also provides 2 multi-modal features, raw text and images, which can effectively establish connections between items with different IDs. 3) It contains 1 billion CTR records covering 200 million users and 6 million items, and is currently the largest-scale CTR dataset available. Based on AntM$^{2}$C, we construct several typical CTR tasks and provide comparisons with baseline methods. The dataset homepage is available at https://www.atecup.cn/home.
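The abstract argues that multi-modal features can connect items whose IDs are encoded inconsistently across scenarios. The toy sketch below illustrates that idea under stated assumptions: the `Item` record (scenario label, per-scenario integer ID, raw title), the helper names `bag_of_words`, `cosine` and `link_items`, and the threshold are all hypothetical, and bag-of-words cosine similarity is a crude stand-in text encoder. It is not the AntM$^{2}$C schema or the authors' method.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    scenario: str  # hypothetical scenario label, e.g. "voucher" or "video"
    item_id: int   # ID assigned independently inside each scenario
    title: str     # raw text feature attached to the item


def bag_of_words(text: str) -> Counter:
    """Lower-cased whitespace tokens with counts (a deliberately crude text encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_items(source, targets, threshold):
    """Map each source item to its most text-similar target item if the
    similarity reaches `threshold`. Keys and values carry the scenario, so
    colliding integer IDs from different scenarios are never confused."""
    target_vecs = [(t, bag_of_words(t.title)) for t in targets]
    links = {}
    for s in source:
        s_vec = bag_of_words(s.title)
        scored = [(cosine(s_vec, t_vec), t) for t, t_vec in target_vecs]
        if not scored:
            continue
        score, best = max(scored, key=lambda pair: pair[0])
        if score >= threshold:
            links[(s.scenario, s.item_id)] = (best.scenario, best.item_id, round(score, 3))
    return links


vouchers = [Item("voucher", 1, "coffee discount voucher"),
            Item("voucher", 2, "movie ticket voucher")]
videos = [Item("video", 1, "how to brew coffee at home"),
          Item("video", 7, "new movie ticket deals")]

# Same integer ID, unrelated items: joining the two scenarios on ID would be wrong.
assert vouchers[0].item_id == videos[0].item_id
print(link_items(vouchers, videos, threshold=0.2))
# {('voucher', 1): ('video', 1, 0.236), ('voucher', 2): ('video', 7, 0.577)}
```

In a realistic pipeline the bag-of-words encoder would be replaced by a learned text or image embedding, and the threshold here is arbitrary; the point is only that content features, unlike per-scenario IDs, live in a shared space.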