- Domain Adaptation and Few-Shot Learning
- Multimodal Machine Learning Applications
- Web Data Mining and Analysis
- Image Retrieval and Classification Techniques
- Topic Modeling
- Machine Learning and Data Classification
- Advanced Image and Video Retrieval Techniques
Microsoft (United States)
2023-2024
Knowledge distillation is an effective way to transfer knowledge from a strong teacher to an efficient student model. Ideally, we expect that the better the teacher is, the better the student performs. However, this expectation does not always come true. It is common that a stronger teacher model results in a bad student via distillation due to the nonnegligible gap between teacher and student. To bridge the gap, we propose PROD, a PROgressive Distillation method, for dense retrieval. PROD consists of a teacher progressive distillation and a data progressive distillation to gradually improve the student. To alleviate catastrophic forgetting, we introduce a regularization term at each...
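To make the teacher-student transfer concrete, here is a minimal sketch of score-based distillation for dense retrieval: the student's relevance distribution over candidate passages is pulled toward the teacher's via KL divergence. This is a generic illustration, not the PROD implementation; the `distillation_loss` helper, the example tensors, and the temperature value are all hypothetical.

```python
# Generic score-based distillation sketch for dense retrieval.
# Not the PROD method; names and values here are illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(student_scores: torch.Tensor,
                      teacher_scores: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """KL divergence between the teacher's and student's relevance
    distributions over the same candidate passages for one query."""
    teacher_probs = F.softmax(teacher_scores / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_scores / temperature, dim=-1)
    # Standard temperature-squared scaling keeps gradient magnitudes stable.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Example: one query scored against four candidate passages.
teacher = torch.tensor([[5.0, 2.0, 1.0, 0.5]])                      # strong teacher
student = torch.tensor([[3.0, 2.5, 0.5, 0.2]], requires_grad=True)  # weak student
loss = distillation_loss(student, teacher, temperature=2.0)
loss.backward()
```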
Knowledge distillation is often used to transfer knowledge from a strong teacher model to a relatively weak student model. Traditional methods include response-based methods and feature-based methods. Response-based methods are widely used but suffer from lower upper limits of performance due to their ignorance of intermediate signals, while feature-based methods have constraints on vocabularies, tokenizers and model architectures. In this paper, we propose a liberal feature-based distillation method (LEAD). LEAD aligns the distribution between intermediate layers of the teacher model and the student model, which is effective, extendable,...
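As a rough illustration of layer-wise distribution alignment, the sketch below derives relevance scores from one teacher layer and one student layer and matches the induced distributions with KL divergence, so no shared vocabulary or tokenizer is needed. This is a hedged approximation of the idea, not LEAD's actual objective; `layer_alignment_loss` and its inputs are hypothetical.

```python
# Hypothetical layer-wise distribution alignment in the spirit of LEAD.
# The real objective and layer-pairing scheme may differ.
import torch
import torch.nn.functional as F

def layer_alignment_loss(student_hidden: torch.Tensor,
                         teacher_hidden: torch.Tensor,
                         student_cands: torch.Tensor,
                         teacher_cands: torch.Tensor) -> torch.Tensor:
    """Align the relevance distributions that a student layer and a teacher
    layer induce over the same candidate pool, each in its own vector space,
    so the two models need not share vocabularies or architectures."""
    s_scores = student_hidden @ student_cands.T   # (1, num_candidates)
    t_scores = teacher_hidden @ teacher_cands.T   # (1, num_candidates)
    return F.kl_div(F.log_softmax(s_scores, dim=-1),
                    F.softmax(t_scores, dim=-1),
                    reduction="batchmean")

# Example: a 4-candidate pool; student width (128) differs from teacher width (768).
loss = layer_alignment_loss(torch.randn(1, 128), torch.randn(1, 768),
                            torch.randn(4, 128), torch.randn(4, 768))
```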
Recent breakthroughs in large models have highlighted the critical significance of data scale, labels and modals. In this paper, we introduce MS MARCO Web Search, the first large-scale information-rich web dataset, featuring millions of real clicked query-document labels. This dataset closely mimics real-world web document and query distribution, provides rich information for various kinds of downstream tasks, and encourages research in areas such as generic end-to-end neural indexer models, embedding models, and next...
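Clicked query-document labels of this kind are typically consumed as positives for contrastive training of a dual-encoder embedding model; below is a minimal sketch using in-batch negatives. The loss, temperature, and embedding shapes are illustrative assumptions, not details taken from the dataset paper.

```python
# Minimal sketch of contrastive training on clicked query-document pairs.
# All names and hyperparameters are illustrative placeholders.
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(q_emb: torch.Tensor,
                              d_emb: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """Each query's clicked document is its positive; the other documents
    in the batch serve as negatives."""
    q_emb = F.normalize(q_emb, dim=-1)
    d_emb = F.normalize(d_emb, dim=-1)
    logits = q_emb @ d_emb.T / temperature   # (batch, batch) similarity matrix
    labels = torch.arange(q_emb.size(0))     # diagonal entries are the positives
    return F.cross_entropy(logits, labels)

# Example: a batch of 8 query/document embedding pairs of width 256.
loss = in_batch_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```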