Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation
DOI:
10.1609/aaai.v39i12.33426
Publication Date:
2025-04-11T12:10:00Z
AUTHORS (9)
ABSTRACT
Recent advances in Large Language Models (LLMs) have demonstrated significant potential in the field of Recommendation Systems (RSs). Most existing studies have focused on converting user behavior logs into textual prompts and leveraging techniques such as prompt tuning to enable LLMs for recommendation tasks. Meanwhile, research interest has recently grown in multimodal recommendation systems that integrate data from images, text, and other sources using modality fusion techniques. This introduces new challenges to the existing LLM-based recommendation paradigm, which relies solely on text modality information. Moreover, although Multimodal Large Language Models (MLLMs) capable of processing multi-modal inputs have emerged, how to equip MLLMs with multimodal recommendation capabilities remains largely unexplored. To this end, in this paper, we propose the Multimodal Large Language Model-enhanced Multimodal Sequential Recommendation (MLLM-MSR) model. To capture the dynamic user preference, we design a two-stage user preference summarization method. Specifically, we first utilize an MLLM-based item-summarizer to extract the image features of a given item and convert them into text. Then, we employ a recurrent generation paradigm to capture the dynamic changes in user preferences based on an LLM-based user-summarizer. Finally, to adapt the MLLM to the recommendation task, we fine-tune the recommender using Supervised Fine-Tuning (SFT) techniques. Extensive evaluations across various datasets validate the effectiveness of MLLM-MSR, showcasing its superior ability to capture and adapt to the evolving dynamics of user preferences.
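The abstract outlines a two-stage pipeline: an MLLM-based item-summarizer turns item images into text, and a recurrent, LLM-based user-summarizer updates a running preference summary as new interactions arrive. The sketch below illustrates this flow under stated assumptions; the `query_mllm` and `query_llm` callables, the prompts, the `Item` fields, and the block size are hypothetical placeholders, not the authors' implementation.

```python
# Illustrative sketch of the two-stage preference summarization described in
# the abstract. All names and prompts here are assumptions for exposition.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Item:
    item_id: str
    title: str
    image_path: str  # path to the item image shown to the MLLM


def summarize_item(item: Item, query_mllm: Callable[[str, str], str]) -> str:
    """Stage 1: MLLM-based item summarizer.
    The MLLM reads the item image (plus its title) and returns a textual
    description, so later steps can operate purely on text."""
    prompt = (
        f"Describe the visual content of the product image for item "
        f"'{item.title}' in one or two sentences."
    )
    return query_mllm(prompt, item.image_path)


def summarize_user(
    interactions: List[Item],
    query_mllm: Callable[[str, str], str],
    query_llm: Callable[[str], str],
    block_size: int = 5,
) -> str:
    """Stage 2: recurrent user preference summarization.
    Interactions are processed in chronological blocks; at each step the LLM
    is asked to update the running summary with the newest block, so the
    summary tracks how preferences evolve over time."""
    summary = "No prior preference information."
    for start in range(0, len(interactions), block_size):
        block = interactions[start:start + block_size]
        item_texts = "\n".join(
            f"- {it.title}: {summarize_item(it, query_mllm)}" for it in block
        )
        prompt = (
            "Previous preference summary:\n"
            f"{summary}\n\n"
            "Newly interacted items (in chronological order):\n"
            f"{item_texts}\n\n"
            "Update the user's preference summary, keeping it concise and "
            "reflecting any shift in interests."
        )
        summary = query_llm(prompt)
    return summary
```

In the paper's setting, the resulting item descriptions and user preference summaries would then be assembled into prompts for Supervised Fine-Tuning of the recommender; that final step is omitted here since it depends on the chosen MLLM backbone and training framework.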