- Multimodal Machine Learning Applications
- Topic Modeling
- Consumer Market Behavior and Pricing
- Domain Adaptation and Few-Shot Learning
- Recommender Systems and Techniques
- Advanced Bandit Algorithms Research
Vision language pre-training aims to learn alignments between vision and from a large amount of data. Most existing methods only image-text alignments. Some others utilize pre-trained object detectors leverage at the level. In this paper, we propose multi-grained by unified framework that learns aligning localization simultaneously. Based on it, present X2-VLM, an all-in-one model with flexible modular architecture, in which further unify video-text one model. X2-VLM is able unlimited visual...
Accuracy and diversity have long been considered to be two conflicting goals for recommendations. We point out, however, that as the is typically measured by certain pre-selected item attributes, e.g., category most popularly employed one, improved can achieved without sacrificing recommendation accuracy, diversification respects user's preference about attributes. This calls a fine-grained understanding of preferences over items, where one needs recognize choice driven quality itself, or...