Predicting Individual Food Valuation via Vision-Language Embedding Model

DOI: 10.31234/osf.io/vkduq_v1 · Publication Date: 2025-03-29T00:55:49Z
ABSTRACT
Food preferences differ among individuals, and these variations reflect underlying personality or mental tendencies. However, capturing and predicting these individual differences remains challenging. Here, we propose a method for predicting individual food preferences using CLIP (Contrastive Language-Image Pre-training), which captures both the visual and semantic features of food images. Applied to food-image rating data from human subjects, the method achieved better prediction scores than approaches using pixel-based embeddings or label-text-based embeddings. It can also characterize individual traits as characteristic vectors in the embedding space. By analyzing these individual trait vectors, we identified a distinctive bias in the trait vectors of the high-picky-eating group. In contrast, the group with relatively high levels of general psychopathology showed no bias in the distribution of trait vectors, but their preferences were significantly less well represented by a single trait vector per individual. Our results demonstrate that CLIP embeddings, which integrate visual and semantic features, not only predict food-image preferences effectively but also provide valuable representations of individual trait characteristics, suggesting potential applications for understanding and addressing food preference patterns in both research and clinical contexts.
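The abstract does not specify the exact prediction pipeline, but one natural reading is a linear model per subject: each food image is mapped to a CLIP embedding, and a single "trait vector" is fit so that the dot product with an image's embedding predicts that subject's rating. The sketch below illustrates this formulation under stated assumptions; it substitutes random unit vectors for real CLIP embeddings (no model download needed), simulates one subject's ratings, and recovers the trait vector with closed-form ridge regression. All names and parameters here are illustrative, not the authors' code.

```python
import numpy as np

# Hypothetical sketch of a trait-vector model: predicted_rating = embedding @ w.
# Random unit vectors stand in for CLIP image embeddings (CLIP's dimension is
# typically 512 or 768; reduced here for speed).
rng = np.random.default_rng(0)
d = 64
n_train, n_test = 200, 50

# Stand-ins for L2-normalised CLIP embeddings of food images.
E = rng.normal(size=(n_train + n_test, d))
E /= np.linalg.norm(E, axis=1, keepdims=True)

# Simulate one subject: a latent trait vector plus rating noise.
w_true = rng.normal(size=d)
ratings = E @ w_true + 0.1 * rng.normal(size=n_train + n_test)

E_tr, E_te = E[:n_train], E[n_train:]
y_tr, y_te = ratings[:n_train], ratings[n_train:]

# Closed-form ridge regression: w = (E^T E + lam*I)^-1 E^T y.
lam = 1.0
w_hat = np.linalg.solve(E_tr.T @ E_tr + lam * np.eye(d), E_tr.T @ y_tr)

# Evaluate on held-out images: correlation between predicted and true ratings.
pred = E_te @ w_hat
r = np.corrcoef(pred, y_te)[0, 1]
print(f"held-out rating correlation: {r:.3f}")
```

With real data, the random embeddings would be replaced by outputs of a pretrained CLIP image encoder, and how well a single `w_hat` fits a subject's ratings could serve as the per-individual representability measure the abstract alludes to.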