- Advanced Image and Video Retrieval Techniques
- Image Retrieval and Classification Techniques
- Domain Adaptation and Few-Shot Learning
- Machine Learning and Algorithms
- Machine Learning and Data Classification
- Topic Modeling
- Anomaly Detection Techniques and Applications
- Information Retrieval and Search Behavior
- Text and Document Classification Technologies
- Data Management and Algorithms
- Adversarial Robustness in Machine Learning
- Neural Networks and Applications
- Stochastic Gradient Optimization Techniques
- Data Mining Algorithms and Applications
- Multi-Criteria Decision Making
- Optimization and Search Problems
- AI-based Problem Solving and Planning
- Caching and Content Delivery
- Rough Sets and Fuzzy Logic
- Algorithms and Data Compression
- Explainable Artificial Intelligence (XAI)
- Web Data Mining and Analysis
- Reinforcement Learning in Robotics
- Advanced Graph Neural Networks
- Expert finding and Q&A systems
Northeastern University
2025
University of Pisa
2024
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo"
2024
Pine Technical and Community College
2022-2024
University of Münster
2023
National Institute of Mental Health
2021
National Institutes of Health
2021
Google (United States)
2019-2020
While in a classification or regression setting a label or value is assigned to each individual document, in ranking we determine the relevance ordering of the entire input document list. This difference leads to the notion of relative relevance between documents in ranking. The majority of existing learning-to-rank algorithms model such relativity at the loss level using pairwise or listwise loss functions. However, they are restricted to univariate scoring functions, i.e., the score of a document is computed based on the document itself, regardless of the other documents in the list. To overcome this...
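The distinction above can be made concrete with a small sketch. The code below contrasts a univariate scorer, which sees one document at a time, with a simple cross-document scorer that conditions each score on the rest of the list (an illustrative toy model, not the paper's architecture; all weights and helper names are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

def univariate_scores(docs, w):
    # Each document is scored from its own features alone.
    return docs @ w

def multivariate_scores(docs, w, v):
    # Illustrative cross-document scorer: each document's score is
    # shifted by a comparison against the mean of the list, so the
    # score of one document depends on the others.
    base = docs @ w
    list_mean = docs.mean(axis=0, keepdims=True)
    return base + (docs - list_mean) @ v

docs = rng.normal(size=(5, 3))          # a list of 5 documents, 3 features each
w = np.array([1.0, -1.0, 0.5])          # toy weights
v = np.array([0.5, -0.2, 1.0])

# Perturbing a *different* document leaves a univariate score
# unchanged, but moves the multivariate score of document 0.
other = docs.copy()
other[4] += 1.0
assert np.allclose(univariate_scores(docs, w)[0], univariate_scores(other, w)[0])
assert not np.isclose(multivariate_scores(docs, w, v)[0], multivariate_scores(other, w, v)[0])
```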
One of the challenges of learning-to-rank for information retrieval is that ranking metrics are not smooth and as such cannot be optimized directly with gradient descent optimization methods. This gap has given rise to a large body of research that reformulates the problem to fit into existing machine learning frameworks or defines a surrogate, ranking-appropriate loss function. One such loss is ListNet's, which measures the cross entropy between a distribution over documents obtained from scores and another obtained from ground-truth labels. It was...
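The ListNet loss referenced above has a compact form: a softmax over labels, a softmax over scores, and the cross entropy between the two. A minimal NumPy sketch (illustrative only, not the paper's implementation):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax.
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

def listnet_loss(scores, labels):
    """Cross entropy between the distribution over documents induced
    by model scores and the one induced by ground-truth labels."""
    p_labels = softmax(labels.astype(float))
    p_scores = softmax(scores.astype(float))
    return float(-np.sum(p_labels * np.log(p_scores)))

# For a list of 3 documents, scores that agree with the labels yield a
# lower loss than scores that invert the ordering.
labels = np.array([2.0, 1.0, 0.0])
good = listnet_loss(np.array([3.0, 1.0, -1.0]), labels)
bad = listnet_loss(np.array([-1.0, 1.0, 3.0]), labels)
assert good < bad
```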
Learning-to-Rank is a branch of supervised machine learning that seeks to produce an ordering of a list of items such that the utility of the ranked list is maximized. Unlike most machine learning techniques, however, its objective cannot be directly optimized using gradient descent methods as it is either discontinuous or flat everywhere. As such, learning-to-rank methods often optimize a loss function that is loosely related to or upper-bounds the ranking objective instead. A notable exception is the approximation framework originally proposed by Qin et al., which facilitates a more direct...
Learning-to-Rank deals with maximizing the utility of a list of examples presented to a user, with items of higher relevance being prioritized. It has several practical applications such as large-scale search, recommender systems, document summarization, and question answering. While there is widespread support for classification and regression based learning, support for learning-to-rank in deep learning frameworks has been limited. We introduce TensorFlow Ranking, the first open source library for solving ranking problems in a deep learning framework. It is highly...
Learning to Rank, a central problem in information retrieval, is a class of machine learning algorithms that formulate ranking as an optimization task. The objective is to learn a function that produces an ordering of a set of documents in such a way that the utility of the entire ordered list is maximized. Learning-to-rank methods do so by learning a function that computes a score for each document in the set. A ranked list is then compiled by sorting documents according to their scores. While a deterministic mapping of scores to permutations makes sense during inference, where stability of ranked lists...
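A standard way to make the score-to-permutation mapping stochastic rather than deterministic, which this line of work builds on, is Plackett-Luce sampling: perturb each score with Gumbel noise and sort. The sketch below is a generic illustration of that idea under the assumption of a Plackett-Luce model, not the paper's exact treatment:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ranking(scores, rng):
    # Adding Gumbel(0, 1) noise to scores and sorting descending
    # draws a permutation from the Plackett-Luce distribution
    # parameterized by exp(scores).
    gumbel = -np.log(-np.log(rng.uniform(size=scores.shape)))
    return np.argsort(-(scores + gumbel))

scores = np.array([2.0, 0.0, -2.0])
draws = [tuple(sample_ranking(scores, rng)) for _ in range(2000)]

# The highest-scoring document wins the top position most of the
# time, but not always: rankings are now stochastic.
top_counts = sum(1 for d in draws if d[0] == 0)
assert 0.5 * len(draws) < top_counts < len(draws)
```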
Learned sparse representations form an attractive class of contextual embeddings for text retrieval. That is so because they are effective models of relevance and are interpretable by design. Despite their apparent compatibility with inverted indexes, however, retrieval over learned sparse embeddings remains challenging. That is due to the distributional differences between learned embeddings and term frequency-based lexical models of relevance such as BM25. Recognizing this challenge, a great deal of research has gone into, among other things, designing retrieval algorithms...
This monograph takes a step towards promoting the study of efficiency in the era of neural information retrieval by offering a comprehensive survey of the literature on the efficiency and effectiveness of ranking and, to a limited extent, retrieval. It was inspired by the parallels that exist between the challenges facing neural network-based ranking solutions and those faced by their predecessors, decision forest-based learning-to-rank models, as well as by the connections to the solutions the literature to date has to offer. We believe that understanding the fundamentals underpinning these algorithmic and data structure solutions for containing...
We study hybrid search in text retrieval, where lexical and semantic retrieval are fused together with the intuition that the two are complementary in how they model relevance. In particular, we examine fusion by a convex combination (CC) of scores, as well as the Reciprocal Rank Fusion (RRF) method, and identify their advantages and potential pitfalls. Contrary to existing studies, we find RRF to be sensitive to its parameters; that learning a CC fusion is generally agnostic to the choice of score normalization; and that CC outperforms RRF in in-domain and out-of-domain settings;...
Maximum inner product search (MIPS) over dense and sparse vectors has progressed independently in a bifurcated literature for decades; the latter is better known as top-\(k\) retrieval in Information Retrieval. This duality exists because the two serve different end goals, despite the fact that they are manifestations of the same mathematical problem. In this work, we ask whether algorithms designed for dense vectors could be applied effectively to sparse vectors, particularly those that violate the assumptions underlying top-\(k\) retrieval methods. We study...
Learned sparse text embeddings have gained popularity due to their effectiveness in top-k retrieval and their inherent interpretability. Their distributional idiosyncrasies, however, have long hindered their use in real-world retrieval systems. That changed with the recent development of approximate algorithms that leverage the distributional properties of learned sparse embeddings to speed up retrieval. Nonetheless, in much of the existing literature, evaluation has been limited to datasets with only a few million documents, such as MSMARCO. It remains unclear how these systems behave on...
Maximum inner product search, or top-k retrieval on sparse vectors, is well understood in information retrieval, with a number of mature algorithms that solve it exactly. However, all existing algorithms are tailored to text and frequency-based similarity measures. To achieve an optimal memory footprint and query latency, they rely on the near stationarity of documents and on laws governing natural languages. We consider, instead, a setup in which collections are streaming, necessitating dynamic indexing, and where indexing must work...
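The exact top-k problem over sparse vectors that the work above starts from can be sketched with a coordinate-wise inverted index supporting dynamic insertion. This is a bare-bones illustration (real systems add compression and dynamic pruning; the class and method names are invented for the example):

```python
from collections import defaultdict
import heapq

class InvertedIndex:
    """Minimal inverted index for exact top-k retrieval by inner
    product over sparse vectors, with dynamic insertion."""

    def __init__(self):
        self.postings = defaultdict(list)  # dim -> [(doc_id, value)]

    def insert(self, doc_id, vector):
        # Streaming-friendly: append to the postings list of each
        # nonzero coordinate as documents arrive.
        for dim, value in vector.items():
            self.postings[dim].append((doc_id, value))

    def search(self, query, k):
        # Term-at-a-time scoring with a score accumulator.
        scores = defaultdict(float)
        for dim, qval in query.items():
            for doc_id, dval in self.postings.get(dim, []):
                scores[doc_id] += qval * dval
        return heapq.nlargest(k, scores.items(), key=lambda x: x[1])

index = InvertedIndex()
index.insert("a", {0: 1.0, 3: 2.0})
index.insert("b", {3: 1.0, 7: 4.0})
results = index.search({3: 1.0, 7: 1.0}, k=1)
assert results[0][0] == "b"  # 1*1 + 4*1 = 5 beats 2*1 = 2
```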
A critical piece of the modern information retrieval puzzle is approximate nearest neighbor search. Its objective is to return a set of data points that are closest to a query point, with its accuracy measured by the proportion of exact nearest neighbors captured in the returned set. One popular approach to this question is clustering: the indexing algorithm partitions data points into non-overlapping subsets and represents each partition by a point such as its centroid. The query processing algorithm first identifies the nearest clusters, a process known as routing, then performs...
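The index-then-route-then-scan pipeline described above can be sketched with k-means shards and centroid routing. A minimal NumPy sketch under generic assumptions (Lloyd's k-means for partitioning, squared Euclidean distance, centroids as representatives), not any particular system's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def build_index(points, num_clusters, iters=10):
    # Lloyd's k-means: partition points into shards, each represented
    # by its centroid.
    centroids = points[rng.choice(len(points), num_clusters, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((points[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for c in range(num_clusters):
            if (assign == c).any():
                centroids[c] = points[assign == c].mean(axis=0)
    return centroids, assign

def search(query, points, centroids, assign, nprobe, k=1):
    # Routing: pick the nprobe shards whose centroids are nearest to
    # the query, then scan only those shards exhaustively.
    nearest = np.argsort(((centroids - query) ** 2).sum(-1))[:nprobe]
    candidates = np.flatnonzero(np.isin(assign, nearest))
    dists = ((points[candidates] - query) ** 2).sum(-1)
    return candidates[np.argsort(dists)[:k]]

points = rng.normal(size=(500, 8))
query = points[42] + 0.01 * rng.normal(size=8)
centroids, assign = build_index(points, num_clusters=16)

approx = search(query, points, centroids, assign, nprobe=4)  # probes 4 of 16 shards
exact = np.argmin(((points - query) ** 2).sum(-1))
# Probing every shard degenerates to exact search, so the result must match.
assert search(query, points, centroids, assign, nprobe=16)[0] == exact
```

With `nprobe` below the number of shards, accuracy depends entirely on whether routing sends the query to the shard holding its true neighbors, which is the routing problem studied here.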
Yggdrasil Decision Forests is a library for the training, serving, and interpretation of decision forest models, targeted at both research and production work, implemented in C++, and available through a command line interface, Python (under the name TensorFlow Decision Forests), JavaScript, Go, and Google Sheets (as Simple ML for Sheets). The library has been developed organically since 2018, following a set of four design principles applicable to machine learning libraries and frameworks: simplicity of use, safety of use, modularity and high-level abstraction,...
Perhaps the applied nature of information retrieval research goes some way to explain the community's rich history of evaluating machine learning models holistically, with an understanding that efficacy matters, but so does the computational cost incurred to achieve it. This is evidenced, for example, by more than a decade of work on the efficient training and inference of large decision forest models in learning-to-rank. As the community adopts even more complex, neural network-based models in a wide range of applications, questions on efficiency have once...
This tutorial aims to weave together diverse strands of modern Learning to Rank (LtR) research and present them in a unified full-day tutorial. First, we will introduce the fundamentals of LtR and give an overview of its various sub-fields. Then, we will discuss some recent advances in gradient boosting methods such as LambdaMART, focusing on their efficiency/effectiveness trade-offs and optimizations. Subsequently, we will present TF-Ranking, a new open source TensorFlow package for neural LtR models, and show how it can be used for modeling...
Multifaceted, empirical evaluation of algorithmic ideas is one of the central pillars of Information Retrieval (IR) research. The IR community has a rich history of studying the effectiveness of indexes, retrieval algorithms, and complex machine learning rankers and, at the same time, of quantifying their computational costs, from index creation and model training to application and inference. As the community moves towards ever more complex deep learning models, questions on efficiency have once again become relevant with renewed urgency. Indeed, efficiency is no longer...
Listwise learning-to-rank methods form a powerful class of ranking algorithms that are widely adopted in applications such as information retrieval. These algorithms learn to rank a set of items by optimizing a loss that is a function of the entire set, as a surrogate for a typically non-differentiable ranking metric. Despite their empirical success, existing listwise losses are based on heuristics and remain theoretically ill-understood. In particular, none of the empirically successful loss functions are related to ranking metrics. In this work, we propose a cross...
Clustering-based nearest neighbor search is a simple yet effective method in which data points are partitioned into geometric shards to form an index, and only a few shards are searched during query processing to find an approximate set of top-$k$ vectors. Even though search efficacy is heavily influenced by the algorithm that identifies the set of shards to probe, routing has received little attention in the literature. This work attempts to bridge that gap by studying the problem of routing in clustering-based maximum inner product search (MIPS). We begin by unpacking existing...