- Topic Modeling
- Expert finding and Q&A systems
- Multimodal Machine Learning Applications
- Information Retrieval and Search Behavior
- Natural Language Processing Techniques
- Mobile Crowdsensing and Crowdsourcing
- FinTech, Crowdfunding, Digital Finance
- Domain Adaptation and Few-Shot Learning
- Web Data Mining and Analysis
- Speech and dialogue systems
- Radio, Podcasts, and Digital Media
- Recommender Systems and Techniques
- Caching and Content Delivery
- Advanced Data Storage Technologies
University of Massachusetts Amherst
2019-2023
Asking clarifying questions in response to ambiguous or faceted queries has been recognized as a useful technique for various information retrieval systems, especially conversational search systems with limited bandwidth interfaces. Analyzing and generating clarifying questions have been studied recently, but the accurate utilization of user responses to clarifying questions has been relatively less explored. In this paper, we enrich the representations learned by Transformer networks using a novel attention mechanism from external information sources that weights...
Podcasts are spoken documents across a wide range of genres and styles, with growing listenership across the world and a rapidly lowering barrier to entry for both listeners and creators. The great strides in search and recommendation in research and industry have yet to see impact in the podcast space, where recommendations are still largely driven by word of mouth. In this perspective paper, we highlight the many differences between podcasts and other media, and discuss our perspective on challenges and future research directions in the domain of podcast information access.
In information retrieval (IR), domain adaptation is the process of adapting a retrieval model to a new domain whose data distribution is different from the source domain. Existing methods in this area focus on unsupervised domain adaptation, where they have access to the target document collection, or supervised (often few-shot) domain adaptation, where they additionally have access to (limited) labeled data in the target domain. There also exists research on improving the zero-shot performance of retrieval models with no adaptation. This paper introduces a new category of domain adaptation in IR that is as-yet unexplored. Here, similar to the zero-shot setting, we assume the retrieval model does not...
Estimating the quality of a result list, often referred to as query performance prediction (QPP), is a challenging and important task in information retrieval. It can be used as feedback to users, search engines, and system administrators. Although predicting the performance of retrieval models has been extensively studied for the ad-hoc retrieval task, the effectiveness of performance prediction methods for question answering (QA) systems is relatively unstudied. The short length of answers, the dominance of neural models in QA, and the re-ranking nature of most QA systems make performance prediction for QA a unique, important, and technically...
Representation learning has always played an important role in information retrieval (IR) systems. Most retrieval models, including recent neural approaches, use representations to calculate similarities between queries and documents to find relevant information from a corpus. Recent models use large-scale pre-trained language models for query representation. The typical use of these models, however, has a major limitation in that they generate only a single representation for a query, which may have multiple intents or facets. The focus of this paper is to address...
This paper introduces a framework for the automated evaluation of natural language texts. A manually constructed rubric describes how to assess multiple dimensions of interest. To evaluate a text, a large language model (LLM) is prompted with each rubric question and produces a distribution over potential responses. The LLM predictions often fail to agree well with human judges; indeed, the humans do not fully agree with one another. However, the multiple LLM distributions can be $\textit{combined}$ to $\textit{predict}$ each human judge's annotations on all...
Learning multiple intent representations for queries has potential applications in facet generation, document ranking, search result diversification, and search explanation. The state-of-the-art model for this task assumes that there is a sequence of intent representations. In this paper, we argue that the model should not be penalized as long as it generates an accurate and complete set of intent representations. Based on this intuition, we propose a stochastic permutation invariant approach for optimizing such networks. We extrinsically evaluate the proposed approach on facet generation...
Over recent years, podcasts have emerged as a novel medium for sharing and broadcasting information over the Internet. Audio streaming platforms originally designed for music content, such as Amazon Music, Pandora, and Spotify, have reported rapid growth, with millions of users consuming podcasts every day. With podcasts emerging as a new medium for consuming information, the need to develop information access systems that enable efficient and effective discovery from a heterogeneous collection of podcasts is more important than ever. However, information access in such domains still remains understudied....
Considering the widespread use of mobile and voice search, answer passage retrieval for non-factoid questions plays a critical role in modern information retrieval systems. Despite the importance of the task, the community still feels the significant lack of large-scale non-factoid question answering collections with real questions and comprehensive relevance judgments. In this paper, we develop and release a collection of 2,626 open-domain non-factoid questions from a diverse set of categories. The dataset, called ANTIQUE, contains 34,011 manual relevance annotations. The questions were asked by real users...
The rise in popularity of mobile and voice search has led to a shift in focus from document retrieval to short answer passage retrieval for non-factoid questions. Some of the questions have multiple answers, and the aim is to retrieve a set of relevant answer passages that covers all these alternatives. Compared to documents, answers are more specific and typically form more defined types or groups. Grouping answer passages based on strong similarity measures may provide a means of identifying these types. Typically, kNN clustering in combination with term-based...