- Advanced Image and Video Retrieval Techniques
- Image Retrieval and Classification Techniques
- Domain Adaptation and Few-Shot Learning
- Machine Learning and Algorithms
- Machine Learning and Data Classification
- Topic Modeling
- Anomaly Detection Techniques and Applications
- Information Retrieval and Search Behavior
- Text and Document Classification Technologies
- Data Management and Algorithms
- Adversarial Robustness in Machine Learning
- Neural Networks and Applications
- Stochastic Gradient Optimization Techniques
- Data Mining Algorithms and Applications
- Multi-Criteria Decision Making
- Optimization and Search Problems
- AI-based Problem Solving and Planning
- Caching and Content Delivery
- Rough Sets and Fuzzy Logic
- Algorithms and Data Compression
- Explainable Artificial Intelligence (XAI)
- Web Data Mining and Analysis
- Reinforcement Learning in Robotics
- Advanced Graph Neural Networks
- Expert finding and Q&A systems
Northeastern University
2025
University of Pisa
2024
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo"
2024
Pine Technical and Community College
2022-2024
University of Münster
2023
National Institute of Mental Health
2021
National Institutes of Health
2021
Google (United States)
2019-2020
While in a classification or regression setting a label or value is assigned to each individual document, in ranking we determine the relevance ordering of the entire input document list. This difference leads to the notion of relative relevance between documents in ranking. The majority of existing learning-to-rank algorithms model such relativity at the loss level using pairwise or listwise loss functions. However, they are restricted to univariate scoring functions, i.e., the score of a document is computed based on the document itself, regardless of the other documents in the list. To overcome this...
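The distinction above can be made concrete with a small sketch. The code below contrasts a univariate scorer, which sees one document at a time, with a simple cross-document scorer that conditions each score on the rest of the list (an illustrative toy model, not the paper's architecture; all weights and helper names are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

def univariate_scores(docs, w):
    # Each document is scored from its own features alone.
    return docs @ w

def multivariate_scores(docs, w, v):
    # Illustrative cross-document scorer: each document's score is
    # shifted by a comparison against the mean of the list, so the
    # score of one document depends on the others.
    base = docs @ w
    list_mean = docs.mean(axis=0, keepdims=True)
    return base + (docs - list_mean) @ v

docs = rng.normal(size=(5, 3))          # a list of 5 documents, 3 features each
w = np.array([1.0, -1.0, 0.5])          # toy weights
v = np.array([0.5, -0.2, 1.0])

# Perturbing a *different* document leaves a univariate score
# unchanged, but moves the multivariate score of document 0.
other = docs.copy()
other[4] += 1.0
assert np.allclose(univariate_scores(docs, w)[0], univariate_scores(other, w)[0])
assert not np.isclose(multivariate_scores(docs, w, v)[0], multivariate_scores(other, w, v)[0])
```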
One of the challenges of learning-to-rank for information retrieval is that ranking metrics are not smooth and as such cannot be optimized directly with gradient descent optimization methods. This gap has given rise to a large body of research that reformulates the problem to fit into existing machine learning frameworks or defines a surrogate, ranking-appropriate loss function. One such loss is ListNet's, which measures the cross entropy between a distribution over documents obtained from scores and another obtained from ground-truth labels. It was...
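The ListNet loss referenced above has a compact form: a softmax over labels, a softmax over scores, and the cross entropy between the two. A minimal NumPy sketch (illustrative only, not the paper's implementation):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax.
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

def listnet_loss(scores, labels):
    """Cross entropy between the distribution over documents induced
    by model scores and the one induced by ground-truth labels."""
    p_labels = softmax(labels.astype(float))
    p_scores = softmax(scores.astype(float))
    return float(-np.sum(p_labels * np.log(p_scores)))

# For a list of 3 documents, scores that agree with the labels yield a
# lower loss than scores that invert the ordering.
labels = np.array([2.0, 1.0, 0.0])
good = listnet_loss(np.array([3.0, 1.0, -1.0]), labels)
bad = listnet_loss(np.array([-1.0, 1.0, 3.0]), labels)
assert good < bad
```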
Learning-to-Rank is a branch of supervised machine learning that seeks to produce an ordering of a list of items such that the utility of the ranked list is maximized. Unlike most machine learning techniques, however, its objective cannot be directly optimized using gradient descent methods as it is either discontinuous or flat everywhere. As such, learning-to-rank methods often optimize a loss function that is loosely related to or upper-bounds the ranking objective instead. A notable exception is the approximation framework originally proposed by Qin et al., which facilitates a more direct...
Learning-to-Rank deals with maximizing the utility of a list of examples presented to a user, with items of higher relevance being prioritized. It has several practical applications such as large-scale search, recommender systems, document summarization, and question answering. While there is widespread support for classification and regression based learning, support for learning-to-rank in deep learning frameworks has been limited. We introduce TensorFlow Ranking, the first open source library for solving ranking problems in a deep learning framework. It is highly...
Learning to Rank, a central problem in information retrieval, is a class of machine learning algorithms that formulate ranking as an optimization task. The objective is to learn a function that produces an ordering of a set of documents in such a way that the utility of the entire ordered list is maximized. Learning-to-rank methods do so by learning a function that computes a score for each document in the set. A ranked list is then compiled by sorting documents according to their scores. While a deterministic mapping of scores to permutations makes sense during inference, where stability of ranked lists...
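A standard way to make the score-to-permutation mapping stochastic rather than deterministic, which this line of work builds on, is Plackett-Luce sampling: perturb each score with Gumbel noise and sort. The sketch below is a generic illustration of that idea under the assumption of a Plackett-Luce model, not the paper's exact treatment:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ranking(scores, rng):
    # Adding Gumbel(0, 1) noise to scores and sorting descending
    # draws a permutation from the Plackett-Luce distribution
    # parameterized by exp(scores).
    gumbel = -np.log(-np.log(rng.uniform(size=scores.shape)))
    return np.argsort(-(scores + gumbel))

scores = np.array([2.0, 0.0, -2.0])
draws = [tuple(sample_ranking(scores, rng)) for _ in range(2000)]

# The highest-scoring document wins the top position most of the
# time, but not always: rankings are now stochastic.
top_counts = sum(1 for d in draws if d[0] == 0)
assert 0.5 * len(draws) < top_counts < len(draws)
```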
Learned sparse representations form an attractive class of contextual embeddings for text retrieval. That is so because they are effective models of relevance and are interpretable by design. Despite their apparent compatibility with inverted indexes, however, retrieval over learned sparse embeddings remains challenging. That is due to the distributional differences between learned embeddings and term frequency-based lexical models of relevance such as BM25. Recognizing this challenge, a great deal of research has gone into, among other things, designing retrieval algorithms...
This monograph takes a step towards promoting the study of efficiency in the era of neural information retrieval by offering a comprehensive survey of the literature on the efficiency and effectiveness of ranking and, to a limited extent, retrieval. It was inspired by the parallels that exist between the challenges facing neural network-based ranking solutions and those faced by their predecessors, decision forest-based learning-to-rank models, as well as by the connections to the solutions the literature to date has to offer. We believe that understanding the fundamentals underpinning these algorithmic and data structure solutions for containing...
We study hybrid search in text retrieval, where lexical and semantic retrieval are fused together with the intuition that the two are complementary in how they model relevance. In particular, we examine fusion by a convex combination (CC) of scores, as well as the Reciprocal Rank Fusion (RRF) method, and identify their advantages and potential pitfalls. Contrary to existing studies, we find RRF to be sensitive to its parameters; that learning a CC fusion is generally agnostic to the choice of score normalization; and that CC outperforms RRF in in-domain and out-of-domain settings;...
Maximum inner product search (MIPS) over dense and sparse vectors has progressed independently in a bifurcated literature for decades; the latter is better known as top-\(k\) retrieval in Information Retrieval. This duality exists because the two serve different end goals, despite the fact that they are manifestations of the same mathematical problem. In this work, we ask whether algorithms designed for dense vectors could be applied effectively to sparse vectors, particularly those that violate the assumptions underlying top-\(k\) retrieval methods. We study...
Learned sparse text embeddings have gained popularity due to their effectiveness in top-k retrieval and their inherent interpretability. Their distributional idiosyncrasies, however, have long hindered their use in real-world retrieval systems. That changed with the recent development of approximate algorithms that leverage the distributional properties of learned sparse embeddings to speed up retrieval. Nonetheless, in much of the existing literature, evaluation has been limited to datasets with only a few million documents, such as MSMARCO. It remains unclear how these systems behave on...
Maximum inner product search, or top-k retrieval on sparse vectors, is well understood in information retrieval, with a number of mature algorithms that solve it exactly. However, all existing algorithms are tailored to text and frequency-based similarity measures. To achieve an optimal memory footprint and query latency, they rely on the near stationarity of documents and on laws governing natural languages. We consider, instead, a setup in which collections are streaming, necessitating dynamic indexing, and where indexing must work...
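The exact top-k problem over sparse vectors that the work above starts from can be sketched with a coordinate-wise inverted index supporting dynamic insertion. This is a bare-bones illustration (real systems add compression and dynamic pruning; the class and method names are invented for the example):

```python
from collections import defaultdict
import heapq

class InvertedIndex:
    """Minimal inverted index for exact top-k retrieval by inner
    product over sparse vectors, with dynamic insertion."""

    def __init__(self):
        self.postings = defaultdict(list)  # dim -> [(doc_id, value)]

    def insert(self, doc_id, vector):
        # Streaming-friendly: append to the postings list of each
        # nonzero coordinate as documents arrive.
        for dim, value in vector.items():
            self.postings[dim].append((doc_id, value))

    def search(self, query, k):
        # Term-at-a-time scoring with a score accumulator.
        scores = defaultdict(float)
        for dim, qval in query.items():
            for doc_id, dval in self.postings.get(dim, []):
                scores[doc_id] += qval * dval
        return heapq.nlargest(k, scores.items(), key=lambda x: x[1])

index = InvertedIndex()
index.insert("a", {0: 1.0, 3: 2.0})
index.insert("b", {3: 1.0, 7: 4.0})
results = index.search({3: 1.0, 7: 1.0}, k=1)
assert results[0][0] == "b"  # 1*1 + 4*1 = 5 beats 2*1 = 2
```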
A critical piece of the modern information retrieval puzzle is approximate nearest neighbor search. Its objective is to return a set of data points that are closest to a query point, with its accuracy measured by the proportion of exact nearest neighbors captured in the returned set. One popular approach to this question is clustering: the indexing algorithm partitions data points into non-overlapping subsets and represents each partition by a point such as its centroid. The query processing algorithm first identifies the nearest clusters, a process known as routing, then performs...
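The index-then-route-then-scan pipeline described above can be sketched with k-means shards and centroid routing. A minimal NumPy sketch under generic assumptions (Lloyd's k-means for partitioning, squared Euclidean distance, centroids as representatives), not any particular system's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def build_index(points, num_clusters, iters=10):
    # Lloyd's k-means: partition points into shards, each represented
    # by its centroid.
    centroids = points[rng.choice(len(points), num_clusters, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((points[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for c in range(num_clusters):
            if (assign == c).any():
                centroids[c] = points[assign == c].mean(axis=0)
    return centroids, assign

def search(query, points, centroids, assign, nprobe, k=1):
    # Routing: pick the nprobe shards whose centroids are nearest to
    # the query, then scan only those shards exhaustively.
    nearest = np.argsort(((centroids - query) ** 2).sum(-1))[:nprobe]
    candidates = np.flatnonzero(np.isin(assign, nearest))
    dists = ((points[candidates] - query) ** 2).sum(-1)
    return candidates[np.argsort(dists)[:k]]

points = rng.normal(size=(500, 8))
query = points[42] + 0.01 * rng.normal(size=8)
centroids, assign = build_index(points, num_clusters=16)

approx = search(query, points, centroids, assign, nprobe=4)  # probes 4 of 16 shards
exact = np.argmin(((points - query) ** 2).sum(-1))
# Probing every shard degenerates to exact search, so the result must match.
assert search(query, points, centroids, assign, nprobe=16)[0] == exact
```

With `nprobe` below the number of shards, accuracy depends entirely on whether routing sends the query to the shard holding its true neighbors, which is the routing problem studied here.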
Yggdrasil Decision Forests is a library for the training, serving, and interpretation of decision forest models, targeted at both research and production work, implemented in C++, and available through a command line interface, Python (under the name TensorFlow Decision Forests), JavaScript, Go, and Google Sheets (as Simple ML for Sheets). The library has been developed organically since 2018, following a set of four design principles applicable to machine learning libraries and frameworks: simplicity of use, safety of use, modularity and high-level abstraction,...
Perhaps the applied nature of information retrieval research goes some way to explain the community's rich history of evaluating machine learning models holistically, with an understanding that efficacy matters, but so does the computational cost incurred to achieve it. This is evidenced, for example, by more than a decade of work on the efficient training and inference of large decision forest models in learning-to-rank. As the community adopts even more complex, neural network-based models in a wide range of applications, questions on efficiency have once...
This tutorial aims to weave together diverse strands of modern Learning to Rank (LtR) research and present them in a unified full-day tutorial. First, we will introduce the fundamentals of LtR and give an overview of its various sub-fields. Then, we will discuss some recent advances in gradient boosting methods such as LambdaMART, focusing on their efficiency/effectiveness trade-offs and optimizations. Subsequently, we will present TF-Ranking, a new open source TensorFlow package for neural LtR models, and show how it can be used for modeling...
Multifaceted, empirical evaluation of algorithmic ideas is one of the central pillars of Information Retrieval (IR) research. The IR community has a rich history of studying the effectiveness of indexes, retrieval algorithms, and complex machine learning rankers and, at the same time, of quantifying their computational costs, from index creation and model training to application and inference. As the community moves towards ever more complex deep learning models, questions on efficiency have once again become relevant with renewed urgency. Indeed, efficiency is no longer...
Listwise learning-to-rank methods form a powerful class of ranking algorithms that are widely adopted in applications such as information retrieval. These algorithms learn to rank a set of items by optimizing a loss that is a function of the entire set, as a surrogate for a typically non-differentiable ranking metric. Despite their empirical success, existing listwise losses are based on heuristics and remain theoretically ill-understood. In particular, none of the empirically successful loss functions are related to ranking metrics. In this work, we propose a cross...
Clustering-based nearest neighbor search is a simple yet effective method in which data points are partitioned into geometric shards to form an index, and only a few shards are searched during query processing to find an approximate set of top-$k$ vectors. Even though search efficacy is heavily influenced by the algorithm that identifies the set of shards to probe, routing has received little attention in the literature. This work attempts to bridge that gap by studying the problem of routing in clustering-based maximum inner product search (MIPS). We begin by unpacking existing...