Imagery as Inquiry: Exploring A Multimodal Dataset for Conversational Recommendation
FOS: Computer and information sciences
Artificial Intelligence (cs.AI)
Computer Vision and Pattern Recognition (cs.CV)
DOI:
10.48550/arxiv.2405.14142
Publication Date:
2024-05-22
AUTHORS (3)
ABSTRACT
We introduce a multimodal dataset where users express preferences through images. These images encompass a broad spectrum of visual expressions, ranging from landscapes to artistic depictions. Users request recommendations for books or music that evoke feelings similar to those captured in the images, and recommendations are endorsed by community upvotes. The dataset supports two recommendation tasks: title generation and multiple-choice selection. Our experiments with large foundation models reveal their limitations on these tasks. In particular, vision-language models show no significant advantage over language-only counterparts that use image descriptions, which we hypothesize is due to underutilized visual capabilities. To better harness these abilities, we propose chain-of-imagery prompting, which yields notable improvements. We release our code and datasets.
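The abstract names "chain-of-imagery prompting" without detailing its mechanism. Below is a minimal, hypothetical sketch of what such a two-stage prompt chain might look like: first elicit a description of the imagery and mood evoked by the picture, then condition the recommendation on that description. All function names, prompt wording, and the `model(prompt, image, context)` callable are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of chain-of-imagery prompting. The prompt wording
# and the model interface are assumptions, not the paper's exact method.

def imagery_prompts(task: str = "book") -> list[str]:
    """Build the two stages of the prompt chain for one recommendation."""
    stage1 = (
        "Describe the mood, imagery, and feelings evoked by the attached "
        "image in a few sentences."
    )
    stage2 = (
        f"Based on that description, recommend a {task} title that evokes "
        "similar feelings, and briefly explain why."
    )
    return [stage1, stage2]


def run_chain(model, image, task: str = "book") -> str:
    """Run the stages through a (stubbed) vision-language model in order.

    `model(prompt, image, context)` is a placeholder callable standing in
    for a real VLM API; only the first stage receives the image, so the
    second stage must rely on the generated description.
    """
    context = ""
    for i, prompt in enumerate(imagery_prompts(task)):
        context = model(prompt, image if i == 0 else None, context)
    return context


# Toy stand-in model that just records whether it saw the image.
def toy_model(prompt, image, context):
    mode = "image" if image is not None else "text-only"
    return f"[{mode}] {prompt}"


print(run_chain(toy_model, image=object(), task="music"))
```

The design point the sketch illustrates is that the recommendation stage works from the model's own verbalized imagery rather than the raw pixels, which is one plausible way to exploit otherwise underutilized visual capabilities.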