Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation
DOI:
10.1609/aaai.v39i12.33426
Publication Date:
2025-04-11T12:10:00Z
AUTHORS (9)
ABSTRACT
Recent advances in Large Language Models (LLMs) have demonstrated significant potential in the field of Recommendation Systems (RSs). Most existing studies have focused on converting user behavior logs into textual prompts and leveraging techniques such as prompt tuning to enable LLMs for recommendation tasks. Meanwhile, research interest has recently grown in multimodal recommendation systems that integrate data from images, text, and other sources using modality fusion techniques. This introduces new challenges to the existing LLM-based recommendation paradigm, which relies solely on text modality information. Moreover, although Multimodal Large Language Models (MLLMs) capable of processing multi-modal inputs have emerged, how to equip MLLMs with multimodal recommendation capabilities remains largely unexplored. To this end, in this paper, we propose the Multimodal Large Language Model-enhanced Multimodal Sequential Recommendation (MLLM-MSR) model. To capture the dynamic user preference, we design a two-stage user preference summarization method. Specifically, we first utilize an MLLM-based item-summarizer to extract the image features of a given item and convert them into text. Then, we employ a recurrent generation paradigm to capture the dynamic changes in user preferences based on an LLM-based user-summarizer. Finally, to adapt the MLLM to the recommendation task, we fine-tune the recommender using Supervised Fine-Tuning (SFT) techniques. Extensive evaluations across various datasets validate the effectiveness of MLLM-MSR, showcasing its superior ability to capture and adapt to the evolving dynamics of user preferences.
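The abstract outlines a two-stage pipeline: an MLLM-based item-summarizer turns item images into text, and a recurrent, LLM-based user-summarizer updates a running preference summary as new interactions arrive. The sketch below illustrates this flow under stated assumptions; the `query_mllm` and `query_llm` callables, the prompts, the `Item` fields, and the block size are hypothetical placeholders, not the authors' implementation.

```python
# Illustrative sketch of the two-stage preference summarization described in
# the abstract. All names and prompts here are assumptions for exposition.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Item:
    item_id: str
    title: str
    image_path: str  # path to the item image shown to the MLLM


def summarize_item(item: Item, query_mllm: Callable[[str, str], str]) -> str:
    """Stage 1: MLLM-based item summarizer.
    The MLLM reads the item image (plus its title) and returns a textual
    description, so later steps can operate purely on text."""
    prompt = (
        f"Describe the visual content of the product image for item "
        f"'{item.title}' in one or two sentences."
    )
    return query_mllm(prompt, item.image_path)


def summarize_user(
    interactions: List[Item],
    query_mllm: Callable[[str, str], str],
    query_llm: Callable[[str], str],
    block_size: int = 5,
) -> str:
    """Stage 2: recurrent user preference summarization.
    Interactions are processed in chronological blocks; at each step the LLM
    is asked to update the running summary with the newest block, so the
    summary tracks how preferences evolve over time."""
    summary = "No prior preference information."
    for start in range(0, len(interactions), block_size):
        block = interactions[start:start + block_size]
        item_texts = "\n".join(
            f"- {it.title}: {summarize_item(it, query_mllm)}" for it in block
        )
        prompt = (
            "Previous preference summary:\n"
            f"{summary}\n\n"
            "Newly interacted items (in chronological order):\n"
            f"{item_texts}\n\n"
            "Update the user's preference summary, keeping it concise and "
            "reflecting any shift in interests."
        )
        summary = query_llm(prompt)
    return summary
```

In the paper's setting, the resulting item descriptions and user preference summaries would then be assembled into prompts for Supervised Fine-Tuning of the recommender; that final step is omitted here since it depends on the chosen MLLM backbone and training framework.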