NFDI4DS | UHH-SEMS - Publication Details

Xuanhui Wang

ORCID: 0009-0000-1388-1423


About

Contact & Profiles

OpenAlex ID: A5064608039

Research Areas

Information Retrieval and Search Behavior
Topic Modeling
Web Data Mining and Analysis
Recommender Systems and Techniques
Text and Document Classification Technologies
Domain Adaptation and Few-Shot Learning
Natural Language Processing Techniques
Machine Learning and Algorithms
Data Management and Algorithms
Machine Learning and Data Classification
Advanced Image and Video Retrieval Techniques
Advanced Bandit Algorithms Research
Expert finding and Q&A systems
Explainable Artificial Intelligence (XAI)
Image Retrieval and Classification Techniques
Mobile Crowdsensing and Crowdsourcing
Face and Expression Recognition
Multimodal Machine Learning Applications
Data Mining Algorithms and Applications
Imbalanced Data Classification Techniques
Personal Information Management and User Behavior
Data Quality and Management
Optimization and Search Problems
Complex Network Analysis Techniques
Neural Networks and Applications

Affiliations

The First Affiliated Hospital, Sun Yat-sen University
2024-2025

China University of Mining and Technology
2020-2025

China Coal Research Institute (China)
2025

China Coal Technology and Engineering Group Corp (China)
2025

Sun Yat-sen University
2024-2025

Google (United States)
2016-2024

University of Waterloo
2023-2024

University of Massachusetts Amherst
2023

Qingdao Agricultural University
2018-2022

Meta (United States)
2012-2013

Publications

Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms

OPENALEX - Publications

Lihong Li, Wei Chu, John Langford, Xuanhui Wang

Contextual bandit algorithms have become popular for online recommendation systems such as Digg, Yahoo! Buzz, and news recommendation in general. Offline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences but very challenging due to their "partial-label" nature. Common practice is to create a simulator which simulates the online environment for the problem at hand and then run an algorithm against this simulator. However, creating a simulator itself is often difficult, and modeling bias is usually...

10.1145/1935826.1935878 preprint EN 2011-02-01

Learning to Rank with Selection Bias in Personal Search

OPENALEX - Publications

Xuanhui Wang, Michael Bendersky, Donald Metzler, Marc Najork

Click-through data has proven to be a critical resource for improving search ranking quality. Though a large amount of click data can easily be collected by search engines, various biases make it difficult to fully leverage this type of data. In the past, many click models have been proposed and successfully used to estimate the relevance of individual query-document pairs in the context of web search. These click models typically require a large quantity of clicks for each individual pair, which makes them difficult to apply in systems where click data is highly sparse due to personalized corpora and information...

10.1145/2911451.2911537 article EN Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval 2016-07-07

Position Bias Estimation for Unbiased Learning to Rank in Personal Search

OPENALEX - Publications

Xuanhui Wang, Nadav Golbandi, Michael Bendersky, Donald Metzler, Marc Najork

A well-known challenge in learning from click data is its inherent bias, most notably position bias. Traditional click models aim to extract the ‹query, document› relevance, and the estimated bias is usually discarded after relevance is extracted. In contrast, recent work on unbiased learning-to-rank can effectively leverage the bias and thus focuses on estimating bias rather than relevance [20, 31]. Existing approaches use search result randomization over a small percentage of production traffic to estimate the position bias. This is not desired because result randomization can negatively impact users'...

10.1145/3159652.3159732 article EN 2018-02-02

Bug characteristics in open source software

OPENALEX - Publications

Lin Tan, Chen Liu, Zhenmin Li, Xuanhui Wang, Yuanyuan Zhou and 1 more

10.1007/s10664-013-9258-8 article EN Empirical Software Engineering 2013-06-06

Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting

OPENALEX - Publications

Zhen Qin, Rolf Jagerman, Kai Hui, Honglei Zhuang, Junru Wu and 7 more

10.18653/v1/2024.findings-naacl.97 article EN Findings of the Association for Computational Linguistics: NAACL 2024 2024-01-01

Have things changed now?

OPENALEX - Publications

Zhenmin Li, Lin Tan, Xuanhui Wang, Shan Lu, Yuanyuan Zhou and 1 more

Software errors are a major cause of system failures. Effectively designing tools and support for detecting and recovering from software failures requires a deep understanding of bug characteristics. Recently, software and its development process have changed significantly in many ways, including more help from bug detection tools, a shift towards multi-threading architecture, the open-source development paradigm, and increasing concerns about security and user-friendly interfaces. Therefore, results of previous studies may not be applicable to...

10.1145/1181309.1181314 article EN 2006-10-21

Mining correlated bursty topic patterns from coordinated text streams

OPENALEX - Publications

Xuanhui Wang, ChengXiang Zhai, Xiao Hu, Richard Sproat

Previous work on text mining has almost exclusively focused on a single stream. However, we often have available multiple text streams indexed by the same set of time points (called coordinated text streams), which offer new opportunities for text mining. For example, when a major event happens, all the news articles published by different agencies in different languages tend to cover the same event for a certain period, exhibiting a correlated bursty topic pattern in all the news article streams. In general, correlated bursty topic patterns from coordinated text streams can reveal interesting latent associations...

10.1145/1281192.1281276 article EN 2007-08-12

Learn from web search logs to organize search results

OPENALEX - Publications

Xuanhui Wang, ChengXiang Zhai

Effective organization of search results is critical for improving the utility of any search engine. Clustering search results is an effective way to organize them, which allows a user to navigate into relevant documents quickly. However, two deficiencies of this approach make it not always work well: (1) the clusters discovered do not necessarily correspond to the interesting aspects of a topic from the user's perspective; and (2) the cluster labels generated are not informative enough to allow a user to identify the right cluster. In this paper, we propose to address these two deficiencies by...

10.1145/1277741.1277759 article EN 2007-07-23

Probabilistic dyadic data analysis with local and global consistency

OPENALEX - Publications

Deng Cai, Xuanhui Wang, Xiaofei He

Dyadic data arises in many real-world applications such as social network analysis and information retrieval. In order to discover the underlying or hidden structure in dyadic data, many topic modeling techniques were proposed. The typical algorithms include Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA). The probability density functions obtained by both of these algorithms are supported on the Euclidean space. However, many previous studies have shown that naturally occurring data may reside on or close to an...

10.1145/1553374.1553388 article EN 2009-06-14

The LambdaLoss Framework for Ranking Metric Optimization

OPENALEX - Publications

Xuanhui Wang, Cheng Li, Nadav Golbandi, Michael Bendersky, Marc Najork

How to optimize ranking metrics such as Normalized Discounted Cumulative Gain (NDCG) is an important but challenging problem, because ranking metrics are either flat or discontinuous everywhere, which makes them hard to optimize directly. Among existing approaches, LambdaRank is a novel algorithm that incorporates ranking metrics into its learning procedure. Though empirically effective, it still lacks theoretical justification. For example, the underlying loss that LambdaRank optimizes for has remained unknown until now. Due to this, there is no...

10.1145/3269206.3271784 article EN 2018-10-17
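To make the metric named in this abstract concrete, the following minimal sketch computes NDCG for a ranked list of graded relevance labels. It assumes the common exponential gain 2^rel - 1 and a log2(rank + 1) discount; other gain and discount variants are in use, and the function names `dcg` and `ndcg` are illustrative, not taken from the paper.

```python
import math
from typing import Optional, Sequence


def dcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """Discounted cumulative gain of a ranked list (top-k if k is given).

    Assumes gain 2**rel - 1 and discount log2(rank + 1) for 1-based rank.
    """
    top = list(relevances)[:k]  # slicing with [:None] keeps the whole list
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(top))


def ndcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """DCG of the given order divided by DCG of the ideal (sorted) order."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Labels listed in the order a model ranked the documents.
print(ndcg([3, 2, 0]))  # ideal order
print(ndcg([0, 2, 3]))  # reversed order scores lower
```

Because NDCG sees a model's scores only through the order they induce, any score change that does not swap two documents leaves the value unchanged. This is the flatness that prevents direct gradient-based optimization.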

Short-term prediction of groundwater level using improved random forest regression with a combination of random features

OPENALEX - Publications

Xuanhui Wang, Tailian Liu, Xilai Zheng, Hui Peng, Jia Xin and 1 more

To solve the problem whereby the available on-site input data are too scarce to predict the level of groundwater, this paper proposes a prediction algorithm called canonical correlation forest with a combination of random features. To assess the effectiveness of the proposed algorithm, groundwater levels and meteorological data for the Daguhe River groundwater source field, in Qingdao, China, were used. First, the results of a comparison among three regressors showed that the proposed algorithm is superior in terms of forecasting variations in groundwater level. Second, experiments...

10.1007/s13201-018-0742-6 article EN cc-by Applied Water Science 2018-07-24

Learning Groupwise Multivariate Scoring Functions Using Deep Neural Networks

OPENALEX - Publications

Qingyao Ai, Xuanhui Wang, Sebastian Bruch, Nadav Golbandi, Michael Bendersky and 1 more

While in a classification or regression setting a label or a value is assigned to each individual document, in a ranking setting we determine the relevance ordering of the entire input document list. This difference leads to the notion of relative relevance between documents in ranking. The majority of existing learning-to-rank algorithms model such relativity at the loss level using pairwise or listwise loss functions. However, they are restricted to univariate scoring functions, i.e., the relevance score of a document is computed based on the document itself, regardless of the other documents in the list. To overcome this...

10.1145/3341981.3344218 preprint EN 2019-09-26
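The contrast between univariate and multivariate scoring described in this abstract can be illustrated with a small sketch. It assumes that groups are all ordered pairs of documents and that a document's final score is the sum of the scores it receives across groups. `toy_pair_scorer` is a hypothetical hand-written stand-in for the deep neural network the paper learns, so this sketch does not reproduce the paper's model.

```python
from itertools import permutations
from typing import Callable, List, Sequence

Doc = Sequence[float]  # a document's feature vector


def univariate_scores(docs: Sequence[Doc], f: Callable[[Doc], float]) -> List[float]:
    """Each score depends only on the document itself."""
    return [f(x) for x in docs]


def groupwise_scores(docs: Sequence[Doc],
                     g: Callable[[List[Doc]], List[float]],
                     group_size: int = 2) -> List[float]:
    """Sum the scores each document receives across all ordered groups.

    Enumerating every ordered group is at most O(n**group_size) calls to g;
    long lists would need sampled groups instead.
    """
    scores = [0.0] * len(docs)
    for group in permutations(range(len(docs)), group_size):
        outputs = g([docs[i] for i in group])
        for idx, s in zip(group, outputs):
            scores[idx] += s
    return scores


def toy_pair_scorer(group: List[Doc]) -> List[float]:
    """Hypothetical scorer: first feature relative to the group's mean."""
    mean = sum(x[0] for x in group) / len(group)
    return [x[0] - mean for x in group]


# The same document ([1.0]) gets a different score depending on its neighbour.
print(groupwise_scores([[1.0], [2.0]], toy_pair_scorer))
print(groupwise_scores([[1.0], [0.0]], toy_pair_scorer))
```

With a univariate function the first document would score the same in both lists. The groupwise version flips its score from negative to positive because its neighbour changed.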

Language model information retrieval with document expansion

OPENALEX - Publications

Tao Tao, Xuanhui Wang, Qiaozhu Mei, ChengXiang Zhai

Language model information retrieval depends on accurate estimation of document models. In this paper, we propose a document expansion technique to deal with the problem of insufficient sampling of documents. We construct a probabilistic neighborhood for each document, and expand the document with its neighborhood information. The expanded document provides a more accurate estimation of the document model, and thus improves retrieval accuracy. Moreover, since document expansion and pseudo feedback exploit different corpus structures, they can be combined to further improve performance. Experiment results on several data sets...

10.3115/1220835.1220887 article EN 2006-01-01

A study of methods for negative relevance feedback

OPENALEX - Publications

Xuanhui Wang, Hui Fang, ChengXiang Zhai

Negative relevance feedback is a special case of relevance feedback where we do not have any positive example; this often happens when the topic is difficult and the search results are poor. Although in principle any standard relevance feedback technique can be applied to negative relevance feedback, it may not perform well due to the lack of positive examples. In this paper, we conduct a systematic study of methods for negative relevance feedback. We compare a set of representative negative feedback methods, covering vector-space models and language models, as well as several special heuristics for negative feedback. Evaluating negative feedback methods requires a test set with sufficient difficult topics,...

10.1145/1390334.1390374 article EN 2008-07-20
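As a point of reference for the vector-space side of this abstract, here is a minimal sketch of one standard baseline: the negative component of Rocchio feedback, which moves the query away from the centroid of non-relevant documents. It is not necessarily any of the specific methods the paper evaluates. The weight `gamma`, the rule that drops terms whose weight falls to zero or below, and the function name are assumptions.

```python
from typing import Dict, List

Vector = Dict[str, float]  # sparse term -> weight


def negative_feedback_query(query: Vector, nonrelevant: List[Vector],
                            gamma: float = 0.5) -> Vector:
    """Rocchio-style update using only non-relevant documents.

    q' = q - gamma * centroid(nonrelevant); terms whose weight drops to
    zero or below are removed (a common convention, assumed here).
    """
    if not nonrelevant:
        return dict(query)
    n = len(nonrelevant)
    centroid: Dict[str, float] = {}
    for doc in nonrelevant:
        for term, w in doc.items():
            centroid[term] = centroid.get(term, 0.0) + w / n
    updated = dict(query)
    for term, w in centroid.items():
        updated[term] = updated.get(term, 0.0) - gamma * w
    return {t: w for t, w in updated.items() if w > 0}


q = {"jaguar": 1.0, "speed": 1.0}
bad = [{"jaguar": 1.0, "car": 1.0}, {"car": 1.0}]
print(negative_feedback_query(q, bad))  # {'jaguar': 0.75, 'speed': 1.0}
```

With no positive examples the update can only down-weight terms shared with the poor results. This is one reason the abstract notes that standard feedback may not perform well in this setting.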

Mining term association patterns from search logs for effective query reformulation

OPENALEX - Publications

Xuanhui Wang, ChengXiang Zhai

Search engine logs are an emerging new type of data that offers interesting opportunities for data mining. Existing work on mining such data has mostly attempted to discover knowledge at the level of queries (e.g., query clusters). In this paper, we propose to mine search engine logs for patterns at the level of terms through analyzing the relations of terms inside a query. We define two novel term association patterns (i.e., context-sensitive term substitutions and term additions) and propose new methods for mining such patterns from search engine logs. These patterns can be used to address the mis-specification and under-specification...

10.1145/1458082.1458147 article EN 2008-10-26

Learning to model relatedness for news recommendation

OPENALEX - Publications

Yuanhua Lv, Taesup Moon, Pranam Kolari, Zhaohui Zheng, Xuanhui Wang and 1 more

With the explosive growth of online news readership, recommending interesting news articles to users has become extremely important. While existing Web services such as Yahoo! and Digg attract users' initial clicks by leveraging various kinds of signals, how to engage such users algorithmically after their initial visit is largely under-explored. In this paper, we study the problem of post-click news recommendation. Given that a user has perused a current news article, our idea is to automatically identify "related" news articles which the user would like to read afterwards...

10.1145/1963405.1963417 article EN 2011-03-28

An Analysis of the Softmax Cross Entropy Loss for Learning-to-Rank with Binary Relevance

OPENALEX - Publications

Sebastian Bruch, Xuanhui Wang, Michael Bendersky, Marc Najork

One of the challenges of learning-to-rank for information retrieval is that ranking metrics are not smooth and as such cannot be optimized directly with gradient descent optimization methods. This gap has given rise to a large body of research that reformulates the problem to fit into existing machine learning frameworks or defines a surrogate, ranking-appropriate loss function. One such loss is ListNet's, which measures the cross entropy between a distribution over documents obtained from scores and another obtained from ground-truth labels. This loss was...

10.1145/3341981.3344221 article EN 2019-09-26
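The loss described in this abstract, cross entropy between a score-derived and a label-derived distribution over documents, can be sketched as follows. The sketch assumes the label distribution is obtained by normalizing the labels to sum to one (natural for binary relevance) and that a list with no relevant document contributes zero loss. Both are assumptions, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def log_softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable log of the softmax distribution over scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def softmax_cross_entropy(scores: Sequence[float], labels: Sequence[float]) -> float:
    """Cross entropy between label-derived and score-derived distributions.

    Assumption: the label distribution is labels / sum(labels); lists with no
    relevant document are treated as contributing zero loss.
    """
    total = sum(labels)
    if total <= 0:
        return 0.0
    log_p = log_softmax(scores)
    return -sum((y / total) * lp for y, lp in zip(labels, log_p))


# Binary relevance: only the first document is relevant.
print(softmax_cross_entropy([2.0, 0.0, -1.0], [1, 0, 0]))
print(softmax_cross_entropy([-1.0, 0.0, 2.0], [1, 0, 0]))  # larger loss
```

Unlike NDCG, this loss is smooth in the scores, so it can be minimized with gradient descent. Placing more score mass on the relevant document lowers it.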

Learning-to-Rank with BERT in TF-Ranking

OPENALEX - Publications

Shuguang Han, Xuanhui Wang, Mike Bendersky, Marc Najork

This paper describes a machine learning algorithm for document (re)ranking, in which queries and documents are first encoded using BERT [1], and on top of that a learning-to-rank (LTR) model constructed with TF-Ranking (TFR) [2] is applied to further optimize the ranking performance. This approach has proved effective on the public MS MARCO benchmark [3]. Our first two submissions achieve the best performance for the passage re-ranking task [4], and the second best performance for the passage full-ranking task as of April 10, 2020 [5]. To leverage the recent development...

10.48550/arxiv.2004.08476 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Estimating Position Bias without Intrusive Interventions

OPENALEX - Publications

Aman Agarwal, Ivan Zaitsev, Xuanhui Wang, Cheng Li, Marc Najork and 1 more

Presentation bias is one of the key challenges when learning from implicit feedback in search engines, as it confounds the relevance signal. While it was recently shown how counterfactual learning-to-rank (LTR) approaches (Joachims et al., 2017) can provably overcome presentation bias when observation propensities are known, it remains to show how to effectively estimate these propensities. In this paper, we propose the first method for producing consistent propensity estimates without manual relevance judgments, disruptive...

10.1145/3289600.3291017 preprint EN 2019-01-30

Situational Context for Ranking in Personal Search

OPENALEX - Publications

Hamed Zamani, Michael Bendersky, Xuanhui Wang, Mingyang Zhang

Modern search engines leverage a variety of sources, beyond the conventional query-document content similarity, to improve their ranking performance. Among them, query context has attracted attention in prior work. Previously, query context was mainly modeled by user search history, either long-term or short-term, to help the ranking of future queries. In this paper, we focus on situational context, i.e., the contextual features of the current search request that are independent from both query content and user history. As an example, situational context can depend on search request time and location. We...

10.1145/3038912.3052648 article EN 2017-04-03

Addressing Trust Bias for Unbiased Learning-to-Rank

OPENALEX - Publications

Aman Agarwal, Xuanhui Wang, Cheng Li, Michael Bendersky, Marc Najork

Existing unbiased learning-to-rank models use counterfactual inference, notably Inverse Propensity Scoring (IPS), to learn a ranking function from biased click data. They handle the click incompleteness bias, but usually assume that the clicks are noise-free, i.e., a clicked document is always assumed to be relevant. In this paper, we relax this unrealistic assumption and study click noise explicitly in the unbiased learning-to-rank setting. Specifically, we model the noise as position-dependent trust bias and propose a noise-aware Position-Based Model, named...

10.1145/3308558.3313697 article EN 2019-05-13
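The Inverse Propensity Scoring baseline that this abstract starts from can be sketched in a few lines. Each click is reweighted by the inverse of the examination probability at the rank where the document was shown. The sketch assumes a position-based model with known propensities, and the values in `theta` are hypothetical. It also keeps the noise-free click assumption that the paper relaxes, so it shows the baseline IPS setup, not the paper's trust-bias correction.

```python
from typing import List, Sequence


def ips_click_weights(clicks: Sequence[int], ranks: Sequence[int],
                      propensities: Sequence[float]) -> List[float]:
    """Weight each click by 1 / P(examined at the rank where it was shown).

    propensities[r] is the assumed examination probability at 0-based rank r
    under a position-based model; unclicked documents get weight 0.
    """
    return [c / propensities[r] for c, r in zip(clicks, ranks)]


def ips_loss(per_doc_loss: Sequence[float], clicks: Sequence[int],
             ranks: Sequence[int], propensities: Sequence[float]) -> float:
    """IPS-weighted sum of per-document losses, treating clicks as relevance."""
    weights = ips_click_weights(clicks, ranks, propensities)
    return sum(w * loss for w, loss in zip(weights, per_doc_loss))


theta = [1.0, 0.5, 0.25]  # hypothetical examination propensities
print(ips_click_weights([1, 0, 1], [0, 1, 2], theta))  # [1.0, 0.0, 4.0]
```

Under the position-based model, a relevant document at rank r is clicked with probability theta[r]. Its expected weight is therefore theta[r] * (1 / theta[r]) = 1 regardless of rank, which removes position bias in expectation when clicks are noise-free.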

TF-Ranking

OPENALEX - Publications

Rama Kumar Pasumarthi, Sebastian Bruch, Xuanhui Wang, Cheng Li, Michael Bendersky and 5 more

Learning-to-Rank deals with maximizing the utility of a list of examples presented to the user, with items of higher relevance being prioritized. It has several practical applications such as large-scale search, recommender systems, document summarization and question answering. While there is widespread support for classification and regression based learning, support for learning-to-rank in deep learning has been limited. We introduce TensorFlow Ranking, the first open source library for solving large-scale ranking problems in a deep learning framework. It is highly...

10.1145/3292500.3330677 preprint EN 2019-07-25

An improved Faster R-CNN model for multi-object tomato maturity detection in complex scenarios

OPENALEX - Publications

Zan Wang, Yi-Ming Ling, Xuanli Wang, Dezhang Meng, Lixiu Nie and 2 more

10.1016/j.ecoinf.2022.101886 article EN Ecological Informatics 2022-10-27

RankT5: Fine-Tuning T5 for Text Ranking with Ranking Losses

OPENALEX - Publications

Honglei Zhuang, Zhen Qin, Rolf Jagerman, Kai Hui, Ji Ma and 4 more

Pretrained language models such as BERT have been shown to be exceptionally effective for text ranking. However, there are limited studies on how to leverage more powerful sequence-to-sequence models such as T5. Existing attempts usually formulate text ranking as a classification problem and rely on postprocessing to obtain a ranked list. In this paper, we propose RankT5 and study two T5-based ranking model structures, an encoder-decoder and an encoder-only one, so that they not only can directly output ranking scores for each query-document pair, but...

10.1145/3539618.3592047 article EN Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval 2023-07-18
