Imagery as Inquiry: Exploring A Multimodal Dataset for Conversational Recommendation
FOS: Computer and information sciences
Artificial Intelligence (cs.AI)
Computer Vision and Pattern Recognition (cs.CV)
DOI:
10.48550/arxiv.2405.14142
Publication Date:
2024-05-22
AUTHORS (3)
ABSTRACT
We introduce a multimodal dataset where users express preferences through images. These images encompass a broad spectrum of visual expressions, ranging from landscapes to artistic depictions. Users request recommendations for books or music that evoke feelings similar to those captured in the images, and recommendations are endorsed by community upvotes. The dataset supports two recommendation tasks: title generation and multiple-choice selection. Our experiments with large foundation models reveal their limitations on these tasks. In particular, vision-language models show no significant advantage over language-only counterparts that use image descriptions, which we hypothesize is due to underutilized visual capabilities. To better harness these abilities, we propose chain-of-imagery prompting, which yields notable improvements. We release our code and datasets.
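The abstract names "chain-of-imagery prompting" without detailing its mechanism. Below is a minimal, hypothetical sketch of what such a two-stage prompt chain might look like: first elicit a description of the imagery and mood evoked by the picture, then condition the recommendation on that description. All function names, prompt wording, and the `model(prompt, image, context)` callable are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of chain-of-imagery prompting. The prompt wording
# and the model interface are assumptions, not the paper's exact method.

def imagery_prompts(task: str = "book") -> list[str]:
    """Build the two stages of the prompt chain for one recommendation."""
    stage1 = (
        "Describe the mood, imagery, and feelings evoked by the attached "
        "image in a few sentences."
    )
    stage2 = (
        f"Based on that description, recommend a {task} title that evokes "
        "similar feelings, and briefly explain why."
    )
    return [stage1, stage2]


def run_chain(model, image, task: str = "book") -> str:
    """Run the stages through a (stubbed) vision-language model in order.

    `model(prompt, image, context)` is a placeholder callable standing in
    for a real VLM API; only the first stage receives the image, so the
    second stage must rely on the generated description.
    """
    context = ""
    for i, prompt in enumerate(imagery_prompts(task)):
        context = model(prompt, image if i == 0 else None, context)
    return context


# Toy stand-in model that just records whether it saw the image.
def toy_model(prompt, image, context):
    mode = "image" if image is not None else "text-only"
    return f"[{mode}] {prompt}"


print(run_chain(toy_model, image=object(), task="music"))
```

The design point the sketch illustrates is that the recommendation stage works from the model's own verbalized imagery rather than the raw pixels, which is one plausible way to exploit otherwise underutilized visual capabilities.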