Multi-modal Graph Contrastive Learning for Micro-video Recommendation

Keywords: Modalities; Modality (human–computer interaction); Feature Learning; Representation
DOI: 10.1145/3477495.3532027
Publication Date: 2022-07-07
ABSTRACT
Recently, micro-videos have become increasingly popular on social media platforms such as TikTok and Instagram. Engagement with these micro-videos is facilitated by multi-modal recommendation systems. Indeed, multimedia content can involve diverse modalities, often represented as visual, acoustic, and textual features to the recommender model. Existing works on micro-video recommendation tend to unify the multi-modal channels, thereby treating each modality with equal importance. However, we argue that such approaches are not sufficient to encode item representations with multiple modalities, since the used methods cannot fully disentangle the users' tastes on different modalities. To tackle this problem, we propose a novel learning method named Multi-Modal Graph Contrastive Learning (MMGCL), which aims to explicitly enhance multi-modal representation learning in a self-supervised manner. In particular, we devise two augmentation techniques to generate multiple views of a user/item: modality edge dropout and modality masking. Furthermore, we introduce a novel negative sampling technique that allows the model to learn the correlation between modalities and ensures the effective contribution of each modality. Extensive experiments conducted on two micro-video datasets demonstrate the superiority of our proposed MMGCL over existing state-of-the-art approaches in terms of both recommendation performance and training convergence speed.
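The two graph augmentations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dropout/masking rates, and the toy edge list and feature dictionary below are all assumptions chosen for illustration; the sketch only shows the general idea of perturbing the interaction graph (edge dropout) and zeroing whole modality feature blocks (modality masking) to produce contrastive views.

```python
import numpy as np

rng = np.random.default_rng(0)

def edge_dropout(edges, drop_rate=0.2, rng=rng):
    """Randomly drop a fraction of user-item edges to create a perturbed graph view."""
    edges = np.asarray(edges)
    keep = rng.random(len(edges)) >= drop_rate
    return edges[keep]

def modality_masking(features, mask_prob=0.3, rng=rng):
    """Zero out entire modality feature blocks (visual/acoustic/textual) at random."""
    return {m: (np.zeros_like(f) if rng.random() < mask_prob else f)
            for m, f in features.items()}

# Toy example (hypothetical data): 5 user-item edges, item features in three modalities.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
item_feats = {"visual": np.ones(4), "acoustic": np.ones(3), "textual": np.ones(5)}

# Two stochastic views of the same user/item, to be pushed together by a contrastive loss.
view_edges = edge_dropout(edges, drop_rate=0.4)
view_feats = modality_masking(item_feats, mask_prob=0.5)
```

In a contrastive setup, two such independently augmented views of the same node serve as a positive pair, while views of other nodes act as negatives.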