- Topic Modeling
- Information Retrieval and Search Behavior
- Recommender Systems and Techniques
- Artificial Intelligence in Law
- Natural Language Processing Techniques
- Advanced Image and Video Retrieval Techniques
- Image Retrieval and Classification Techniques
- Legal Education and Practice Innovations
- Expert finding and Q&A systems
- EEG and Brain-Computer Interfaces
- Domain Adaptation and Few-Shot Learning
- Text and Document Classification Technologies
- Machine Learning and Algorithms
- Advanced Graph Neural Networks
- Web Data Mining and Analysis
- Mobile Crowdsensing and Crowdsourcing
- Sentiment Analysis and Opinion Mining
- Speech and dialogue systems
- Advanced Bandit Algorithms Research
- Multimodal Machine Learning Applications
- Explainable Artificial Intelligence (XAI)
- Advanced Text Analysis Techniques
- Artificial Intelligence Applications
- Machine Learning and Data Classification
- Neural Networks and Applications
Tsinghua University
2022-2025
University of Copenhagen
2024
University of Amsterdam
2024
Huawei Technologies (China)
2023
University of Utah
2019-2022
Provo College
2021
University of Massachusetts Amherst
2015-2019
Amherst College
2017-2019
In recent years, deep neural networks have led to exciting breakthroughs in speech recognition, computer vision, and natural language processing (NLP) tasks. However, there have been few positive results of deep models on ad-hoc retrieval tasks. This is partially due to the fact that many important characteristics of the ad-hoc retrieval task have not been well addressed in deep models yet. Typically, the ad-hoc retrieval task is formalized as a matching problem between two pieces of text in existing work using deep models, and treated as equivalent to many NLP tasks such as paraphrase identification, question...
State-of-the-art recommendation algorithms -- especially the collaborative filtering (CF) based approaches with shallow or deep models -- usually work with various unstructured information sources for recommendation, such as textual reviews, visual images, and various implicit or explicit feedback. Though structured knowledge bases were considered in content-based approaches, they have been largely neglected recently due to the availability of vast amounts of data and the learning power of many complex models. However, they exhibit...
The Web has accumulated a rich source of information, such as text, images, ratings, etc., which represent different aspects of user preferences. However, the heterogeneous nature of this information makes it difficult for recommender systems to leverage it in a unified framework to boost performance. Recently, the rapid development of representation learning techniques has provided an approach to this problem. By translating the various information sources into a unified representation space, it becomes possible to integrate heterogeneous information for informed recommendation.
Conversational search and recommendation based on user-system dialogs exhibit major differences from conventional search and recommendation tasks in that 1) the user and the system can interact for multiple semantically coherent rounds on a task through natural language dialog, and 2) it becomes possible for the system to understand the user's needs, or to help users clarify their needs, by asking appropriate questions directly. We believe the ability to ask questions so as to actively clarify the user's needs is one of the most important advantages of conversational search and recommendation. In this paper, we propose and evaluate...
As an alternative to question answering methods based on feature engineering, deep learning approaches such as convolutional neural networks (CNNs) and Long Short-Term Memory Models (LSTMs) have recently been proposed for semantic matching of questions and answers. To achieve good results, however, these models have been combined with additional features such as word overlap or BM25 scores. Without this combination, these models perform significantly worse than methods based on linguistic feature engineering. In this paper, we propose an attention model...
Learning to rank has been intensively studied and widely applied in information retrieval. Typically, a global ranking function is learned from a set of labeled data, which can achieve good performance on average but may be suboptimal for individual queries by ignoring the fact that relevant documents for different queries may have different distributions in the feature space. Inspired by the idea of pseudo relevance feedback where top ranked documents, which we refer to as the \textit{local ranking context}, can provide important information about the query's...
The research field of Information Retrieval (IR) has evolved significantly, expanding beyond traditional search to meet diverse user information needs. Recently, Large Language Models (LLMs) have demonstrated exceptional capabilities in text understanding, generation, and knowledge inference, opening up exciting avenues for IR research. LLMs not only facilitate generative retrieval but also offer improved solutions for user understanding, model evaluation, and user-system interactions. More importantly, the synergistic...
Product search is an important part of online shopping. In contrast to many search tasks, the objectives of product search are not confined to retrieving relevant products. Instead, it focuses on finding items that satisfy the needs of individuals and lead to a user purchase. The unique characteristics of product search make search personalization essential for both customers and e-shopping companies. Purchase behavior is highly personal in online shopping, and users often provide rich feedback about their decisions (e.g. product reviews). However, the severe mismatch found...
Key frames play a very important role in many video applications, such as on-line movie preview and video information retrieval. Although a number of key frame selection methods have been proposed in the past, existing technologies mainly focus on how to precisely summarize the video content, but seldom take user preferences into consideration. However, in real scenarios, people may have diverse interests in the contents of even the same video, and thus they may be attracted by quite different key frames, which makes the selection of key frames an...
Learning to rank with biased click data is a well-known challenge. A variety of methods have been explored to debias click data for learning to rank, such as click models, result interleaving and, more recently, the unbiased learning-to-rank framework based on inverse propensity weighting. Despite their differences, most existing studies separate the estimation of click bias (namely the propensity model) from the learning of ranking algorithms. To estimate click propensities, they either conduct online result randomization, which can negatively affect the user experience, or...
While in a classification or a regression setting a label or a value is assigned to each individual document, in a ranking setting we determine the relevance ordering of the entire input document list. This difference leads to the notion of relative relevance between documents in ranking. The majority of the existing learning-to-rank algorithms model such relativity at the loss level using pairwise or listwise loss functions. However, they are restricted to univariate scoring functions, i.e., the relevance score of a document is computed based on the document itself, regardless of other documents in the list. To overcome this...
In learning-to-rank for information retrieval, a ranking model is automatically learned from the data and then utilized to rank the sets of retrieved documents. Therefore, an ideal ranking model would be a mapping from a document set to a permutation on the set, and should satisfy two critical requirements: (1) it should have the ability to model cross-document interactions so as to capture local context information in a query; (2) it should be permutation-invariant, which means that any permutation of the input documents would not change the output ranking. Previous studies on learning-to-rank either design uni-variate...
A common limitation of many information retrieval (IR) models is that relevance scores are solely based on exact (i.e., syntactic) matching of words in queries and documents under the simple Bag-of-Words (BoW) representation. This not only leads to the well-known vocabulary mismatch problem, but also does not allow semantically related words to contribute to the relevance score. Recent advances in word embedding have shown that semantic representations for words can be efficiently learned by distributional models. A natural generalization is then...
Previous studies have shown that semantically meaningful representations of words and text can be acquired through neural embedding models. In particular, paragraph vector (PV) models have shown impressive performance in some natural language processing tasks by estimating a document (topic) level language model. Integrating the PV models with traditional language model approaches to retrieval, however, produces unstable performance and limited improvements. In this paper, we formally discuss three intrinsic problems of the original PV model that restrict its...
Intelligent assistants change the way people interact with computers and make it possible for people to search for products through conversations when they have purchase needs. During the interactions, the system could ask questions on certain aspects of the ideal products to clarify the users' needs. For example, previous work proposed to ask users the exact characteristics of their ideal items before showing results. However, users may not have clear ideas about what an ideal item looks like, especially when they have not seen any item. So it is more feasible to facilitate the conversational search by...
Product search is one of the most popular methods for customers to discover products online. Most existing studies on product search focus on developing effective retrieval models that rank items by their likelihood to be purchased. However, they ignore the problem that there is a gap between how systems and customers perceive the relevance of items. Without explanations, users may not understand why product search engines retrieve certain items for them, which consequently leads to imperfect user experience and suboptimal system performance in practice. In...
How to obtain an unbiased ranking model by learning to rank with biased user feedback is an important research question for IR. Existing work on unbiased learning to rank (ULTR) can be broadly categorized into two groups: the studies on unbiased learning algorithms with logged data, namely, the offline unbiased learning, and the studies on unbiased parameter estimation with real-time user interactions, namely, the online learning to rank. While their definitions of unbiasedness are different, these two types of ULTR algorithms share the same goal: to find the best models that rank documents based on their intrinsic relevance or utility. However, most...
Legal case retrieval, which aims to find relevant cases for a query case, plays a core role in the intelligent legal system. Despite the success that pre-training has achieved in ad-hoc retrieval tasks, effective pre-training strategies for legal case retrieval remain to be explored. Compared with general documents, legal case documents are typically long text sequences with intrinsic logical structures. However, most existing language models have difficulty understanding the long-distance dependencies between different structures. Moreover, in contrast to general retrieval, the relevance in the legal domain is...
Product search is one of the most popular methods for people to discover and purchase products on e-commerce websites. Because personal preferences often have an important influence on the purchase decision of each customer, it is intuitive that personalization should be beneficial for product search engines. While synthetic experiments from previous studies show that purchase histories are useful for identifying the individual intent of each product search session, the effect of personalization on product search in practice, however, remains mostly unknown. In this paper, we formulate the problem of personalized...
A long-standing challenge for search and conversational assistants is query intention detection in ambiguous queries. Asking clarifying questions has been widely studied and considered an effective solution to resolve query ambiguity. Existing work has explored various approaches for clarifying question ranking and generation. However, due to the lack of real conversational search data, they have to use artificial datasets for training, which limits their generalizability to real-world search scenarios. As a result, the industry has shown reluctance to implement them in reality,...