- Information Retrieval and Search Behavior
- Topic Modeling
- Recommender Systems and Techniques
- Advanced Image and Video Retrieval Techniques
- Expert finding and Q&A systems
- Domain Adaptation and Few-Shot Learning
- Image Retrieval and Classification Techniques
- Web Data Mining and Analysis
- Advanced Graph Neural Networks
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Mobile Crowdsensing and Crowdsourcing
- Advanced Text Analysis Techniques
- Artificial Intelligence in Law
- Text and Document Classification Technologies
- Speech and dialogue systems
- Machine Learning and Data Classification
- Advanced Bandit Algorithms Research
- Sentiment Analysis and Opinion Mining
- Misinformation and Its Impacts
- AI in Service Interactions
- Electrochemical Analysis and Applications
- Legal Education and Practice Innovations
- Advanced biosensing and bioanalysis techniques
- Digital Marketing and Social Media
Renmin University of China
2020-2024
Southern University of Science and Technology
2024
Ningbo University
2019-2024
Tianjin University
2024
Didi Chuxing (China)
2023
Northwest University
2023
Tsinghua University
2014-2022
Gannan Normal University
2019
University of Jinan
2016
Hohai University
2013
Ranking has always been one of the top concerns in information retrieval research. For decades, lexical matching signals dominated the ad-hoc retrieval process, but relying solely on them may cause the vocabulary mismatch problem. In recent years, with the development of representation learning techniques, many researchers have turned to Dense Retrieval (DR) models for better ranking performance. Although several existing DR models have already obtained promising results, their performance improvement heavily relies on sampling...
Legal case retrieval is a specialized IR task that involves retrieving supporting cases given a query case. Compared with traditional ad-hoc text retrieval, the legal case retrieval task is more challenging since the query case is much longer and more complex than common keyword queries. Besides that, the definition of relevance between a query case and a supporting case goes beyond general topical relevance, and it is therefore difficult to construct a large-scale dataset, especially one with accurate relevance judgments. To address these challenges, we propose BERT-PLI, a novel model that utilizes BERT to capture the semantic...
The progress of recommender systems is hampered mainly by evaluation, as it requires real-time interactions between humans and systems, which is too laborious and expensive. This issue is usually approached by utilizing the interaction history to conduct offline evaluation. However, existing datasets of user-item interactions are partially observed, leaving it unclear how and to what extent the missing interactions will influence the evaluation. To answer this question, we collect a fully-observed dataset from Kuaishou's online environment, where almost all 1,411...
Recently, the Information Retrieval community has witnessed fast-paced advances in Dense Retrieval (DR), which performs first-stage retrieval with embedding-based search. Despite the impressive ranking performance, previous studies usually adopt brute-force search to acquire candidates, which is prohibitive in practical Web search scenarios due to its tremendous memory usage and time cost. To overcome these problems, vector compression methods have been adopted in many applications. One of the most popular is Product Quantization...
Relevance is a fundamental concept in information retrieval (IR) studies. It is, however, often observed that relevance as annotated by secondary assessors may not necessarily mean usefulness and satisfaction as perceived by users. In this study, we confirm the difference through a laboratory study in which we collect relevance annotations from external assessors, and usefulness and satisfaction information from users, for a set of search tasks. We also find that a measure based on usefulness rather than relevance has a better correlation with user satisfaction. However, we show that external assessors are capable of annotating usefulness when provided...
Recent years have witnessed the success of deep neural networks in many research areas. The fundamental idea behind the design of most neural networks is to learn similarity patterns from data for prediction and inference, which lacks the ability of cognitive reasoning. However, the concrete ability of reasoning is critical to many theoretical and practical problems. On the other hand, traditional symbolic reasoning methods do well in making logical inference, but they are mostly hard rule-based reasoning, which limits their generalization ability to different tasks since different tasks may require...
Although exact term match between queries and documents is the dominant method to perform first-stage retrieval, we propose a different approach, called RepBERT, to represent documents and queries with fixed-length contextualized embeddings. The inner products of query and document embeddings are regarded as relevance scores. On the MS MARCO Passage Ranking task, RepBERT achieves state-of-the-art results among all initial retrieval techniques, and its efficiency is comparable to bag-of-words methods.
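The scoring rule described in the RepBERT abstract (the inner product of a query embedding and a document embedding used as the relevance score) can be illustrated with a minimal sketch. The vectors below are hand-written toy values standing in for encoder outputs, and the function names `inner_product` and `rank_documents` are hypothetical; this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def inner_product(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def rank_documents(
    query_emb: Sequence[float],
    doc_embs: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Score every document by inner product with the query and
    return (document index, score) pairs, highest score first."""
    scored = [(i, inner_product(query_emb, d)) for i, d in enumerate(doc_embs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy fixed-length embeddings (assumed values, not real encoder outputs).
query = [0.5, 1.0, 0.0]
docs = [
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 1.0],
    [0.5, 0.5, 0.5],
]

ranking = rank_documents(query, docs)
for doc_index, score in ranking:
    print(doc_index, score)
```

Because document embeddings do not depend on the query, they could be computed once offline, leaving only the inner products to evaluate at query time.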
How to obtain an unbiased ranking model by learning to rank with biased user feedback is an important research question for IR. Existing work on unbiased learning to rank (ULTR) can be broadly categorized into two groups: the studies on unbiased learning algorithms with logged data, namely, offline unbiased learning, and the studies on unbiased parameter estimation with real-time user interactions, namely, online learning to rank. While their definitions of unbiasedness are different, these two types of ULTR algorithms share the same goal: to find the best models that rank documents based on their intrinsic relevance or utility. However, most...
As queries submitted by users directly affect search experiences, how to organize queries has always been a research focus in Web search studies. As search requests become complex and exploratory, many search sessions contain more than a single query, and thus query reformulation becomes a necessity. To help users better formulate their queries in these tasks, modern search engines usually provide a series of reformulation entries on search engine result pages (SERPs), i.e., query suggestions and related entities. However, few existing works have thoroughly studied why users perform reformulations...
Dense Retrieval (DR) has achieved state-of-the-art first-stage ranking effectiveness. However, the efficiency of most existing DR models is limited by the large memory cost of storing dense vectors and the time-consuming nearest neighbor search (NNS) in vector space. Therefore, we present RepCONC, a novel retrieval model that learns discrete Representations via CONstrained Clustering. RepCONC jointly trains dual-encoders and the Product Quantization (PQ) method to learn discrete document representations and enables fast...
To better exploit search logs, various click models have been proposed to extract implicit relevance feedback from user clicks. Most traditional click models are based on probabilistic graphical models (PGMs) with manually designed dependencies. Recently, some researchers have also adopted neural-based methods to improve the accuracy of click prediction. However, most existing click models only model user behavior at the query level. As previous iterations within a session may have an impact on the current search round, we can leverage these signals to better model user behaviors. In this...
Document ranking is one of the most studied but challenging problems in information retrieval (IR) research. A number of existing document ranking models capture relevance signals at the whole-document level. Recently, more and more research has begun to address this problem through fine-grained document modeling. Several works have leveraged passage-level relevance signals in ranking models. However, these works focus on context-independent passage-level relevance signals and ignore context information, which may lead to inaccurate estimation of passage-level relevance. In this paper, we investigate how gain accumulates with...
Although BERT has shown its effectiveness in a number of IR-related tasks, especially document ranking, the understanding of its internal mechanism remains insufficient. To increase the explainability of the ranking process performed by BERT, we investigate a state-of-the-art BERT-based ranking model with a focus on its attention mechanism and interaction behavior. Firstly, we look into the evolving attention distribution. It shows that in each step, BERT dumps redundant attention weights on tokens with high document frequency (such as periods). This may lead to a potential threat...
Retrieval-augmented generation (RAG) is extensively utilized to incorporate external, current knowledge into large language models, thereby minimizing hallucinations. A standard RAG pipeline may comprise several components, such as query rewriting, document retrieval, document filtering, and answer generation. However, these components are typically optimized separately through supervised fine-tuning, which can lead to misalignments between the objectives of individual modules and the overarching aim of generating...
Machine Reading Comprehension (MRC) is one of the most challenging tasks in both NLP and IR research. Recently, a number of deep neural models have been successfully applied to some simplified MRC task settings, where their performance was close to or even better than that of human beings. However, these models still have large performance gaps with human beings in more practical settings, such as the MS MARCO and DuReader datasets. Although there are many works studying human reading behavior, the behavior patterns in complex reading comprehension scenarios remain...
Web search session data is precious for a wide range of Information Retrieval (IR) tasks, such as session search, query suggestion, click-through rate (CTR) prediction and so on. Numerous studies have shown the great potential of considering context information for search system optimization. The well-known TREC Session Tracks have enhanced the development in this domain to a great extent. However, they are mainly collected via user studies or crowdsourcing experiments and normally contain only tens to thousands of sessions, which are deficient...
Reading is a complex cognitive activity in many information retrieval related scenarios, such as relevance judgement and question answering. There exist plenty of works that model these processes as a matching problem, which focuses on how to estimate the relevance score between a document and a query. However, little is known about what happens during the reading process, i.e., how users allocate their attention while reading a document during a specific information retrieval task. We believe that a better understanding of this process can help us design better weighting functions inside...
People often conduct exploratory search to explore unfamiliar information space and learn new knowledge. While supporting the highly dynamic and interactive exploratory search is still challenging for the search system, we want to investigate which factors can make exploratory search successful and satisfying from the user’s perspective. Previous research suggests that domain experts have different search strategies and are more successful in finding domain-specific information, but how the domain expertise level will influence users’ interaction and search outcomes in exploratory search, especially knowledge...
Users' click-through behavior is considered a valuable yet noisy source of implicit relevance feedback for web search engines. A series of click models have therefore been proposed to extract accurate and unbiased relevance feedback from click logs. Previous works have shown that users' search behaviors in mobile and desktop scenarios are rather different in many aspects; therefore, the click models that were designed for desktop search may not be as effective in the mobile context. To address this problem, we propose a novel Mobile Click Model (MCM) that models how users examine and click search results on mobile SERPs....
User satisfaction is an important variable in Web search evaluation studies and has received more and more attention in recent years. Many studies regard user satisfaction as the ground truth for designing better evaluation metrics. However, most of the existing studies focus on designing Cranfield-like evaluation metrics to reflect user satisfaction at the query level. As information needs become more complex, users often need multiple queries and multi-round search interactions to complete a search task (e.g. exploratory search). In those cases, how to characterize the user's satisfaction during a search session still remains to be investigated....
Legal case retrieval, which aims to retrieve relevant cases given a query case, has drawn increasing research attention in recent years. While much research has worked on developing automatic retrieval models, how to characterize relevance in this specialized information retrieval (IR) task is still an open question. Towards an in-depth understanding of relevance judgments, we conduct a laboratory user study that involves 72 participants of different domain expertise. In the user study, we collect the relevance score along with detailed explanations for...
Pre-trained language models (PLMs) have achieved great success in the area of Information Retrieval. Studies show that applying these models to ad-hoc document ranking can achieve better retrieval effectiveness. However, on the Web, most information is organized in the form of HTML web pages. In addition to the pure text content, the structure of the content organized by HTML tags is also an important part of the information delivered on a web page. Currently, such structured information is totally ignored by pre-trained models, which are trained solely on text content. In this paper, we propose...