NFDI4DS | UHH-SEMS - Publication Details

Jiaxin Mao

ORCID: 0000-0002-9257-5498

About

Contact & Profiles

OpenAlex ID: A5072119199

Research Areas

Information Retrieval and Search Behavior
Topic Modeling
Recommender Systems and Techniques
Advanced Image and Video Retrieval Techniques
Expert finding and Q&A systems
Domain Adaptation and Few-Shot Learning
Image Retrieval and Classification Techniques
Web Data Mining and Analysis
Advanced Graph Neural Networks
Natural Language Processing Techniques
Multimodal Machine Learning Applications
Mobile Crowdsensing and Crowdsourcing
Advanced Text Analysis Techniques
Artificial Intelligence in Law
Text and Document Classification Technologies
Speech and dialogue systems
Machine Learning and Data Classification
Advanced Bandit Algorithms Research
Sentiment Analysis and Opinion Mining
Misinformation and Its Impacts
AI in Service Interactions
Electrochemical Analysis and Applications
Legal Education and Practice Innovations
Advanced biosensing and bioanalysis techniques
Digital Marketing and Social Media

Renmin University of China
2020-2024

Southern University of Science and Technology
2024

Ningbo University
2019-2024

Tianjin University
2024

Didi Chuxing (China)
2023

Northwest University
2023

Tsinghua University
2014-2022

Gannan Normal University
2019

University of Jinan
2016

Hohai University
2013

Optimizing Dense Retrieval Model Training with Hard Negatives

OPENALEX - Publications

Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Jiafeng Guo, Min Zhang, and 1 more

Ranking has always been one of the top concerns in information retrieval research. For decades, the lexical matching signal has dominated the ad-hoc retrieval process, but relying solely on this signal may cause the vocabulary mismatch problem. In recent years, with the development of representation learning techniques, many researchers have turned to Dense Retrieval (DR) models for better ranking performance. Although several existing DR models have already obtained promising results, their performance improvement heavily relies on sampling...

10.1145/3404835.3462880 article EN Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021-07-11

BERT-PLI: Modeling Paragraph-Level Interactions for Legal Case Retrieval

Yunqiu Shao, Jiaxin Mao, Yiqun Liu, Weizhi Ma, Ken Satoh, and 2 more

Legal case retrieval is a specialized IR task that involves retrieving supporting cases given a query case. Compared with traditional ad-hoc text retrieval, the legal case retrieval task is more challenging since the query case is much longer and more complex than common keyword queries. Besides that, the definition of relevance between a query case and a supporting case goes beyond general topical relevance, and it is therefore difficult to construct a large-scale case retrieval dataset, especially one with accurate relevance judgments. To address these challenges, we propose BERT-PLI, a novel model that utilizes BERT to capture the semantic...

10.24963/ijcai.2020/484 article EN 2020-07-01

KuaiRec

Chongming Gao, Shijun Li, Wenqiang Lei, Jiawei Chen, Biao Li, and 4 more

The progress of recommender systems is hampered mainly by evaluation, as it requires real-time interactions between humans and systems, which is too laborious and expensive. This issue is usually approached by utilizing the interaction history to conduct offline evaluation. However, existing datasets of user-item interactions are partially observed, leaving it unclear how and to what extent the missing interactions will influence the evaluation. To answer this question, we collect a fully-observed dataset from Kuaishou's online environment, where almost all 1,411...

10.1145/3511808.3557220 article EN Proceedings of the 31st ACM International Conference on Information & Knowledge Management 2022-10-16

Jointly Optimizing Query Encoder and Product Quantization to Improve Retrieval Performance

Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Jiafeng Guo, Min Zhang, and 1 more

Recently, the Information Retrieval community has witnessed fast-paced advances in Dense Retrieval (DR), which performs first-stage retrieval with embedding-based search. Despite the impressive ranking performance, previous studies usually adopt brute-force search to acquire candidates, which is prohibitive in practical Web search scenarios due to its tremendous memory usage and time cost. To overcome these problems, vector compression methods have been adopted in many applications. One of the most popular is Product Quantization...

10.1145/3459637.3482358 article EN 2021-10-26

When does Relevance Mean Usefulness and User Satisfaction in Web Search?

Jiaxin Mao, Yiqun Liu, Ke Zhou, Jian‐Yun Nie, Jingtao Song, and 4 more

Relevance is a fundamental concept in information retrieval (IR) studies. It is, however, often observed that relevance as annotated by secondary assessors may not necessarily mean usefulness and satisfaction as perceived by users. In this study, we confirm the difference through a laboratory study in which we collect relevance annotations by external assessors, and usefulness and user satisfaction information from users, for a set of search tasks. We also find that a measure based on usefulness rather than relevance has a better correlation with user satisfaction. However, we show that external assessors are capable of annotating usefulness when provided...

10.1145/2911451.2911507 article EN Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval 2016-07-07

Neural Logic Reasoning

Shaoyun Shi, Hanxiong Chen, Weizhi Ma, Jiaxin Mao, Min Zhang, and 1 more

Recent years have witnessed the success of deep neural networks in many research areas. The fundamental idea behind the design of most neural networks is to learn similarity patterns from data for prediction and inference, which lacks the ability of cognitive reasoning. However, the concrete ability of reasoning is critical to many theoretical and practical problems. On the other hand, traditional symbolic reasoning methods do well in making logical inferences, but they are mostly hard rule-based reasoning, which limits their generalization to different tasks since different tasks may require...

10.1145/3340531.3411949 article EN 2020-10-19

RepBERT: Contextualized Text Embeddings for First-Stage Retrieval

Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma

Although exact term match between queries and documents is the dominant method to perform first-stage retrieval, we propose a different approach, called RepBERT, to represent documents and queries with fixed-length contextualized embeddings. The inner products of query and document embeddings are regarded as relevance scores. On the MS MARCO Passage Ranking task, RepBERT achieves state-of-the-art results among all initial retrieval techniques, and its efficiency is comparable to bag-of-words methods.

10.48550/arxiv.2006.15498 preprint EN other-oa arXiv (Cornell University) 2020-01-01
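
The scoring step described in the RepBERT abstract above (inner products of fixed-length query and document embeddings used as relevance scores) can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the hand-written toy vectors stand in for encoder output, and the function names `inner_product` and `rank_by_inner_product` are hypothetical.

```python
def inner_product(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_by_inner_product(query_emb, doc_embs):
    """Score each document by its inner product with the query embedding.

    Returns (order, scores), where `order` lists document indices from the
    highest score to the lowest.
    """
    scores = [inner_product(query_emb, d) for d in doc_embs]
    order = sorted(range(len(doc_embs)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy fixed-length embeddings (assumed values, not real encoder output).
query = [1.0, 0.0, 2.0]
docs = [
    [0.5, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 2.0, 0.5],
]

order, scores = rank_by_inner_product(query, docs)
print(order, scores)
```

In a real first-stage retriever the document embeddings would be precomputed and searched with an index rather than scored one by one as in this sketch.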

Unbiased Learning to Rank

Qingyao Ai, Tao Yang, Huazheng Wang, Jiaxin Mao

How to obtain an unbiased ranking model by learning to rank with biased user feedback is an important research question for IR. Existing work on unbiased learning to rank (ULTR) can be broadly categorized into two groups: the studies on unbiased learning algorithms with logged data, namely, offline unbiased learning, and the studies on unbiased parameter estimation with real-time user interactions, namely, online learning to rank. While their definitions of unbiasedness are different, these two types of ULTR algorithms share the same goal: to find the best models that rank documents based on their intrinsic relevance or utility. However, most...

10.1145/3439861 article EN ACM Transactions on Information Systems 2021-02-17

Towards a Better Understanding of Query Reformulation Behavior in Web Search

Jia Chen, Jiaxin Mao, Yiqun Liu, Fan Zhang, Min Zhang, and 1 more

As queries submitted by users directly affect search experiences, how to organize queries has always been a research focus in Web search studies. As search requests become complex and exploratory, many search sessions contain more than a single query, and thus reformulation becomes a necessity. To help users better formulate their queries in these tasks, modern search engines usually provide a series of reformulation entries on search engine result pages (SERPs), i.e., query suggestions and related entities. However, few existing works have thoroughly studied why users perform reformulations...

10.1145/3442381.3450127 article EN 2021-04-19

Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval

Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Jiafeng Guo, Min Zhang, and 1 more

Dense Retrieval (DR) has achieved state-of-the-art first-stage ranking effectiveness. However, the efficiency of most existing DR models is limited by the large memory cost of storing dense vectors and the time-consuming nearest neighbor search (NNS) in vector space. Therefore, we present RepCONC, a novel retrieval model that learns discrete Representations via CONstrained Clustering. RepCONC jointly trains dual-encoders and the Product Quantization (PQ) method to learn discrete document representations and enables fast...

10.1145/3488560.3498443 article EN Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining 2022-02-11

A Context-Aware Click Model for Web Search

Jia Chen, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma

To better exploit the search logs, various click models have been proposed to extract implicit relevance feedback from user clicks. Most traditional click models are based on probabilistic graphical models (PGMs) with manually designed dependencies. Recently, some researchers have also adopted neural-based methods to improve the accuracy of click prediction. However, most existing click models only model user behavior at the query level. As previous iterations within the session may have an impact on the current search round, we can leverage these signals to model user behaviors. In this...

10.1145/3336191.3371819 article EN 2020-01-20

Leveraging Passage-level Cumulative Gain for Document Ranking

Zhijing Wu, Jiaxin Mao, Yiqun Liu, Jingtao Zhan, Yukun Zheng, and 2 more

Document ranking is one of the most studied but challenging problems in information retrieval (IR) research. A number of existing document ranking models capture relevance signals at the whole document level. Recently, more and more research has begun to address this problem from fine-grained document modeling. Several works leveraged passage-level relevance signals in ranking models. However, these works focus on context-independent passage-level relevance signals and ignore the context information, which may lead to inaccurate estimation of passage-level relevance. In this paper, we investigate how information gain accumulates with...

10.1145/3366423.3380305 article EN 2020-04-20

An Analysis of BERT in Document Ranking

Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma

Although BERT has shown its effectiveness in a number of IR-related tasks, especially document ranking, the understanding of its internal mechanism remains insufficient. To increase the explainability of the ranking process performed by BERT, we investigate a state-of-the-art BERT-based ranking model with a focus on its attention mechanism and interaction behavior. Firstly, we look into the evolving attention distribution. It shows that in each step, BERT dumps redundant attention weights on tokens with high document frequency (such as periods). This may lead to a potential threat...

10.1145/3397271.3401325 article EN 2020-07-25

Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning

Yiqun Chen, Lingyong Yan, Weiwei Sun, Xinyu Ma, Yi Zhang, and 4 more

Retrieval-augmented generation (RAG) is extensively utilized to incorporate external, current knowledge into large language models, thereby minimizing hallucinations. A standard RAG pipeline may comprise several components, such as query rewriting, document retrieval, document filtering, and answer generation. However, these components are typically optimized separately through supervised fine-tuning, which can lead to misalignments between the objectives of individual modules and the overarching aim of generating...

10.48550/arxiv.2501.15228 preprint EN arXiv (Cornell University) 2025-01-25

Human Behavior Inspired Machine Reading Comprehension

Yukun Zheng, Jiaxin Mao, Yiqun Liu, Zixin Ye, Min Zhang, and 1 more

Machine Reading Comprehension (MRC) is one of the most challenging tasks in both NLP and IR research. Recently, a number of deep neural models have been successfully applied to some simplified MRC task settings, where their performance was close to or even better than that of human beings. However, these models still have large performance gaps with human beings in more practical settings, such as the MS MARCO and DuReader datasets. Although there are many works studying human reading behavior, the behavior patterns in complex reading comprehension scenarios remain...

10.1145/3331184.3331231 article EN 2019-07-18

TianGong-ST

Jia Chen, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma

Web search session data is precious for a wide range of Information Retrieval (IR) tasks, such as session search, query suggestion, click-through rate (CTR) prediction and so on. Numerous studies have shown the great potential of considering context information for search system optimization. The well-known TREC Session Tracks have enhanced the development in this domain to a great extent. However, they are mainly collected via user studies or crowdsourcing experiments and normally contain only tens to thousands of sessions, which are deficient...

10.1145/3357384.3358158 article EN 2019-11-03

Understanding Reading Attention Distribution during Relevance Judgement

Xiangsheng Li, Yiqun Liu, Jiaxin Mao, Zexue He, Min Zhang, and 1 more

Reading is a complex cognitive activity in many information retrieval related scenarios, such as relevance judgement and question answering. There exist plenty of works which model these processes as a matching problem, which focuses on how to estimate the relevance score between a document and a query. However, little is known about what happens during the reading process, i.e., how users allocate their attention while reading a document during a specific information retrieval task. We believe that a better understanding of this process can help us design better weighting functions inside...

10.1145/3269206.3271764 article EN 2018-10-17

How Does Domain Expertise Affect Users’ Search Interaction and Outcome in Exploratory Search?

Jiaxin Mao, Yiqun Liu, Noriko Kando, Min Zhang, Shaoping Ma

People often conduct exploratory search to explore unfamiliar information space and learn new knowledge. While supporting the highly dynamic and interactive exploratory search is still challenging for the search system, we want to investigate which factors can make the exploratory search successful and satisfying from the user’s perspective. Previous research suggests that domain experts have different search strategies and are more successful in finding domain-specific information, but how domain expertise level will influence users’ interaction and search outcomes in exploratory search, especially knowledge...

10.1145/3223045 article EN ACM Transactions on Information Systems 2018-07-17

Constructing Click Models for Mobile Search

Jiaxin Mao, Cheng Luo, Min Zhang, Shaoping Ma

Users' click-through behavior is considered a valuable yet noisy source of implicit relevance feedback for web search engines. A series of click models have therefore been proposed to extract accurate and unbiased relevance feedback from click logs. Previous works have shown that users' search behaviors in mobile and desktop scenarios are rather different in many aspects; therefore, the click models that were designed for desktop search may not be as effective in the mobile context. To address this problem, we propose a novel Mobile Click Model (MCM) that models how users examine search results on mobile SERPs....

10.1145/3209978.3210060 article EN 2018-06-27

Investigating Cognitive Effects in Session-level Search User Satisfaction

Mengyang Liu, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma

User satisfaction is an important variable in Web search evaluation studies and has received more and more attention in recent years. Many studies regard user satisfaction as the ground truth for designing better evaluation metrics. However, most of the existing studies focus on Cranfield-like evaluation metrics to reflect user satisfaction at the query level. As information needs become more complex, users often need multiple queries and multi-round search interactions to complete a search task (e.g. exploratory search). In those cases, how to characterize the user's satisfaction during a search session still remains to be investigated....

10.1145/3292500.3330981 article EN 2019-07-25

Understanding Relevance Judgments in Legal Case Retrieval

Yunqiu Shao, Yueyue Wu, Yiqun Liu, Jiaxin Mao, Shaoping Ma

Legal case retrieval, which aims to retrieve relevant cases given a query case, has drawn increasing research attention in recent years. While much research has worked on developing automatic retrieval models, how to characterize relevance in this specialized information retrieval (IR) task is still an open question. Towards an in-depth understanding of relevance judgments, we conduct a laboratory user study that involves 72 participants of different domain expertise. In the user study, we collect the relevance score along with detailed explanations for...

10.1145/3569929 article EN ACM Transactions on Information Systems 2022-10-28

Webformer

Yu Guo, Zhengyi Ma, Jiaxin Mao, Hongjin Qian, Xinyu Zhang, and 3 more

Pre-trained language models (PLMs) have achieved great success in the area of Information Retrieval. Studies show that applying these models to ad-hoc document ranking can achieve better retrieval effectiveness. However, on the Web, most information is organized in the form of HTML web pages. In addition to the pure text content, the structure of the content organized by HTML tags is also an important part of the information delivered on a web page. Currently, such structured information is totally ignored by pre-trained models, which are trained solely based on text content. In this paper, we propose...

10.1145/3477495.3532086 article EN Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval 2022-07-06

CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models

Peiyuan Gong, J. Li, Jiaxin Mao

10.1145/3626772.3657672 article EN Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval 2024-07-10