- Data Management and Algorithms
- Topic Modeling
- Advanced Image and Video Retrieval Techniques
- Recommender Systems and Techniques
- Information Retrieval and Search Behavior
- Human Mobility and Location-Based Analysis
- Web Data Mining and Analysis
- Image Retrieval and Classification Techniques
- Machine Learning and Data Classification
- Machine Learning and Algorithms
- Neural Networks and Applications
- Text and Document Classification Technologies
- Data Mining Algorithms and Applications
- Data Quality and Management
- Domain Adaptation and Few-Shot Learning
- Natural Language Processing Techniques
- Advanced Graph Neural Networks
- Data Stream Mining Techniques
- Algorithms and Data Compression
- Advanced Database Systems and Queries
- Speech and dialogue systems
- Caching and Content Delivery
- Complex Network Analysis Techniques
- Imbalanced Data Classification Techniques
- Explainable Artificial Intelligence (XAI)
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo"
2016-2025
University of Pisa
2024
Institute of Scientific and Technical Information of China
2011-2023
National Research Council
2014-2023
University of Bologna
2001-2023
Lancaster University
2023
University of Sannio
2023
Università degli Studi eCampus
2023
Institute of Informatics and Telematics
2023
Software (Spain)
2023
Deep pretrained transformer networks are effective at various ranking tasks, such as question answering and ad-hoc document ranking. However, their computational expenses make them cost-prohibitive in practice. Our proposed approach, called PreTTR (Precomputing Transformer Term Representations), considerably reduces the query-time latency of deep transformer networks (up to a 42x speedup on web document ranking), making these networks more practical to use in a real-time ranking scenario. Specifically, we precompute part of the document term representations...
The identification of relevance with little textual context is a primary challenge in passage retrieval. We address this problem with a representation-based ranking approach that: (1) explicitly models the importance of each term using a contextualized language model; (2) performs passage expansion by propagating the importance to similar terms; and (3) grounds the representations in the lexicon, making them interpretable. Passage representations can be pre-computed at index time to reduce query-time latency. We call our approach EPIC (Expansion via Prediction...
In this paper we propose TripBuilder, a new framework for personalized touristic tour planning. We mine from Flickr the information about the actual itineraries followed by a multitude of different tourists, and we match these itineraries on the Points of Interest available from Wikipedia. The task of planning personalized tours is then modeled as an instance of the Generalized Maximum Coverage problem. Wisdom-of-the-crowds information allows us to derive tour plans that maximize a measure of interest for the tourist, given her preferences and visiting time-budget. Experimental...
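The TripBuilder abstract models tour planning as Generalized Maximum Coverage but is cut off before saying how the problem is solved. As a hedged illustration only, the sketch below applies a standard cost-effectiveness greedy heuristic to a simplified budgeted coverage variant: each candidate itinerary covers a set of POIs, each POI has a fixed interest weight, and each itinerary costs some visiting time. The function name, the data layout, and the fixed per-POI weights are assumptions for illustration, not TripBuilder's actual formulation or solver.

```python
from typing import Dict, List, Optional, Set


def greedy_budgeted_coverage(
    candidates: Dict[str, Set[str]],  # hypothetical: itinerary id -> POIs it visits
    cost: Dict[str, float],           # hypothetical: visiting time of each itinerary (> 0)
    interest: Dict[str, float],       # hypothetical: interest weight of each POI
    budget: float,
) -> List[str]:
    """Repeatedly pick the affordable itinerary with the best new interest per unit of time."""
    chosen: List[str] = []
    covered: Set[str] = set()
    spent = 0.0
    remaining = set(candidates)
    while remaining:
        best: Optional[str] = None
        best_ratio = 0.0
        for c in sorted(remaining):  # sorted for deterministic tie-breaking
            if spent + cost[c] > budget:
                continue  # would exceed the time budget
            # Only POIs not covered yet contribute to the marginal gain.
            gain = sum(interest[p] for p in candidates[c] - covered)
            if gain / cost[c] > best_ratio:
                best, best_ratio = c, gain / cost[c]
        if best is None:  # nothing affordable adds new interest
            break
        chosen.append(best)
        covered |= candidates[best]
        spent += cost[best]
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    candidates = {"t1": {"colosseum", "forum"}, "t2": {"forum", "pantheon"}, "t3": {"vatican"}}
    cost = {"t1": 3.0, "t2": 2.0, "t3": 4.0}
    interest = {"colosseum": 5.0, "forum": 3.0, "pantheon": 4.0, "vatican": 6.0}
    print(greedy_budgeted_coverage(candidates, cost, interest, budget=6.0))  # ['t2', 't1']
```

This plain greedy carries no approximation guarantee by itself; known guarantees for budgeted coverage require extra steps, such as also comparing against the best single affordable set.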
Learning-to-Rank models based on additive ensembles of regression trees have been proven to be very effective for scoring query results returned by large-scale Web search engines. Unfortunately, the computational cost of scoring thousands of candidate documents by traversing large ensembles of trees is high. Thus, several works have investigated solutions aimed at improving the efficiency of document scoring by exploiting advanced features of modern CPUs and memory hierarchies. In this article, we present QuickScorer, a new algorithm that adopts...
Learning-to-Rank models based on additive ensembles of regression trees have proven to be very effective for ranking query results returned by Web search engines, a scenario where quality and efficiency requirements are demanding. Unfortunately, the computational cost of these ranking models is high. Thus, several works have already proposed solutions aimed at improving the efficiency of the scoring process by dealing with features and peculiarities of modern CPUs and memory hierarchies. In this paper, we present QuickScorer, a new algorithm that...
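Both QuickScorer abstracts stop before describing the algorithm. As a hedged sketch of the bitvector idea it is known for: leaves of each tree are numbered left to right, every internal node whose test `x[feature] <= threshold` is false clears the bits of the leaves in its left subtree, and the exit leaf is the leftmost leaf whose bit survives. The names `Node`, `compile_tree`, and `score` are hypothetical. For clarity this version checks every node of every tree; the published algorithm instead scans nodes feature by feature in threshold order, stops early, and relies on cache-friendly memory layouts, none of which is attempted here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass
class Node:
    feature: int
    threshold: float
    left: "Tree"   # followed when x[feature] <= threshold
    right: "Tree"  # followed otherwise


Tree = Union[Node, float]  # a bare float is a leaf value
CompiledTree = Tuple[List[Tuple[int, float, int]], List[float]]


def compile_tree(tree: Tree) -> CompiledTree:
    """Flatten a tree into (feature, threshold, mask) triples plus its leaf values.

    Leaves are numbered left to right and leaf i maps to bit i. Each node's
    mask clears the bits of the leaves in its left subtree.
    """
    nodes: List[Tuple[int, float, int]] = []
    leaves: List[float] = []

    def visit(t: Tree) -> Tuple[int, int]:
        # Returns the half-open range [first, end) of leaf indices under t.
        if not isinstance(t, Node):
            leaves.append(t)
            return len(leaves) - 1, len(leaves)
        first, mid = visit(t.left)
        _, end = visit(t.right)
        left_bits = ((1 << (mid - first)) - 1) << first
        nodes.append((t.feature, t.threshold, ~left_bits))
        return first, end

    visit(tree)
    return nodes, leaves


def score(ensemble: Sequence[CompiledTree], x: Sequence[float]) -> float:
    total = 0.0
    for nodes, leaves in ensemble:
        v = (1 << len(leaves)) - 1  # every leaf starts as a candidate exit leaf
        for feature, threshold, mask in nodes:
            if x[feature] > threshold:  # "false" node: its left subtree is unreachable
                v &= mask
        exit_leaf = (v & -v).bit_length() - 1  # lowest set bit = leftmost surviving leaf
        total += leaves[exit_leaf]
    return total


if __name__ == "__main__":
    tree_a = Node(0, 0.5, Node(1, 1.0, 1.0, 2.0), 3.0)
    tree_b = Node(1, 0.5, -1.0, 1.0)
    ensemble = [compile_tree(tree_a), compile_tree(tree_b)]
    print(score(ensemble, [0.3, 2.0]))  # 3.0: leaf 2.0 in tree_a plus leaf 1.0 in tree_b
    print(score(ensemble, [0.9, 0.0]))  # 2.0: leaf 3.0 in tree_a plus leaf -1.0 in tree_b
```

Because the AND of masks is order-independent, the false nodes can be found in any order, which is what makes a feature-wise scan possible.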
In this paper we analyze the efficiency of various search results diversification methods. While the efficacy of diversification approaches has been deeply investigated in the past, response time and scalability issues have rarely been addressed. A unified framework for studying the performance and feasibility of result diversification solutions is thus proposed. First, we define a new methodology for detecting when, and how, query results need to be diversified. To this purpose, we rely on the concept of "query refinement" to estimate the probability of a query being ambiguous. Then, relying on this novel...
Learning to Rank (LtR) is the machine learning method of choice for producing high-quality document ranking functions from a ground truth of training examples. In practice, efficiency and effectiveness are intertwined concepts, and trading off effectiveness to meet the efficiency constraints typical of large-scale systems is one of the most urgent issues. In this paper we propose a new framework, named CLEaVER, for optimizing machine-learned ranking models based on ensembles of regression trees. The goal is to improve efficiency at document scoring time without affecting...
Learned sparse representations form an attractive class of contextual embeddings for text retrieval. That is so because they are effective models of relevance and are interpretable by design. Despite their apparent compatibility with inverted indexes, however, retrieval over sparse embeddings remains challenging. That is due to the distributional differences between learned embeddings and term frequency-based lexical models of relevance such as BM25. Recognizing this challenge, a great deal of research has gone into, among other things, designing retrieval algorithms...
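For context on why learned sparse embeddings fit inverted indexes at all, here is a minimal sketch of exact top-k retrieval by inner product over sparse vectors, accumulating scores term at a time. It is a generic baseline under assumed data shapes (dicts of term weights), not the approximate algorithms the abstract refers to; `build_index` and `top_k` are hypothetical names.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

SparseVec = Dict[str, float]                 # term -> learned weight
Index = Dict[str, List[Tuple[str, float]]]   # term -> postings of (doc id, weight)


def build_index(docs: Dict[str, SparseVec]) -> Index:
    """Invert document vectors into per-term posting lists."""
    index: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for doc_id, vec in docs.items():
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return dict(index)


def top_k(index: Index, query: SparseVec, k: int) -> List[Tuple[str, float]]:
    """Exact top-k by inner product, accumulating scores term at a time."""
    scores: Dict[str, float] = defaultdict(float)
    for term, q_weight in query.items():
        # Only documents sharing at least one term with the query are touched.
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    docs = {
        "d1": {"cat": 1.2, "pet": 0.5},
        "d2": {"dog": 1.0, "pet": 0.8},
        "d3": {"cat": 0.3, "feline": 1.5},
    }
    index = build_index(docs)
    print(top_k(index, {"cat": 1.0, "pet": 0.5}, k=2))  # d1 ranks first, then d2
```

This exhaustive scan visits every posting of every query term; with learned weights, query and document vectors tend to have many non-zero terms with non-trivial weights, which is the kind of distributional difference from BM25 that makes such a scan expensive.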
This monograph takes a step towards promoting the study of efficiency in the era of neural information retrieval by offering a comprehensive survey of the literature on efficiency and effectiveness in ranking and, to a limited extent, retrieval. It was inspired by the parallels that exist between the challenges in neural network-based ranking solutions and their predecessors, decision forest-based learning to rank models, as well as the connections between the solutions the literature to date has to offer. We believe that by understanding the fundamentals underpinning these algorithmic and data structure solutions for containing...
In this article, we tackle the problem of predicting the “next” geographical position of a tourist, given her history (i.e., the prediction is made according to the tourist’s current trail), by means of supervised learning techniques, namely Gradient Boosted Regression Trees and Ranking SVM. The learning is done on the basis of an object space represented by a 68-dimension feature vector specifically designed for tourism-related data. Furthermore, we propose a thorough comparison of several methods that are considered state-of-the-art in...
Maximum inner product search (MIPS) over dense and sparse vectors has progressed independently in a bifurcated literature for decades; the latter is better known as top-k retrieval in Information Retrieval. This duality exists because sparse and dense vectors serve different end goals. That is despite the fact that they are manifestations of the same mathematical problem. In this work, we ask if algorithms for dense vectors could be applied effectively to sparse vectors, particularly those that violate the assumptions underlying top-k retrieval methods. We study...
In this paper, we tackle the problem of predicting the "next" geographical position of a tourist, given her history (i.e., the prediction is made according to the tourist's current trail), by means of supervised learning techniques, namely Gradient Boosted Regression Trees and Ranking SVM. The learning is done on the basis of an object space represented by a 68-dimension feature vector, specifically designed for tourism-related data. Furthermore, we propose a thorough comparison of several methods that are considered state-of-the-art in...
Scoring documents with learning-to-rank (LtR) models based on large ensembles of regression trees is currently deemed one of the best solutions to effectively rank query results to be returned by large-scale Information Retrieval systems. This paper investigates the opportunities given by the SIMD capabilities of modern CPUs for efficiently evaluating regression tree ensembles. We propose V-QuickScorer (vQS), which exploits SIMD extensions to vectorize the document scoring, i.e., to perform the ensemble traversal by evaluating multiple documents simultaneously. We provide a...
In a conversational context, a user expresses her multi-faceted information need as a sequence of natural-language questions, i.e., utterances. Starting from a given topic, the conversation evolves through user utterances and system replies. The retrieval of documents relevant to a given utterance in a conversation is challenging due to the ambiguity of natural language and to the difficulty of detecting possible topic shifts and semantic relationships among utterances. We adopt the 2019 TREC Conversational Assistant Track (CAsT) framework to experiment with a modular...
Recent studies in Learning to Rank have shown that it is possible to effectively distill a neural network from an ensemble of regression trees. This result leads neural networks to become a natural competitor of tree-based ensembles on the ranking task. Nevertheless, ensembles of regression trees outperform neural models both in terms of efficiency and effectiveness, particularly when scoring on CPU. In this paper, we propose an approach for speeding up neural scoring time by applying a combination of Distillation, Pruning and Fast Matrix multiplication. We employ knowledge...
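The abstract names pruning and fast matrix multiplication but is truncated before the specifics. As a generic illustration, not the paper's pipeline, the sketch below applies unstructured magnitude pruning to one dense layer and then multiplies through a compressed sparse row (CSR) layout that skips the zeroed weights. All function names are hypothetical, and pure Python stands in for the optimized kernels a real CPU scorer would use.

```python
from typing import List, Tuple

Matrix = List[List[float]]
CSR = Tuple[List[float], List[int], List[int]]  # (values, column indices, row pointers)


def magnitude_prune(weights: Matrix, sparsity: float) -> Matrix:
    """Zero the `sparsity` fraction of smallest-magnitude weights (ties at the cutoff are zeroed too)."""
    flat = sorted(abs(w) for row in weights for w in row)
    n_prune = int(sparsity * len(flat))
    if n_prune == 0:
        return [row[:] for row in weights]
    cutoff = flat[n_prune - 1]
    return [[0.0 if abs(w) <= cutoff else w for w in row] for row in weights]


def to_csr(weights: Matrix) -> CSR:
    """Keep only non-zero weights, row by row."""
    values: List[float] = []
    cols: List[int] = []
    row_ptr: List[int] = [0]
    for row in weights:
        for j, w in enumerate(row):
            if w != 0.0:
                values.append(w)
                cols.append(j)
        row_ptr.append(len(values))  # row i spans values[row_ptr[i]:row_ptr[i + 1]]
    return values, cols, row_ptr


def csr_matvec(csr: CSR, x: List[float]) -> List[float]:
    """Compute y = W x, touching only the weights that survived pruning."""
    values, cols, row_ptr = csr
    return [
        sum((values[k] * x[cols[k]] for k in range(row_ptr[i], row_ptr[i + 1])), 0.0)
        for i in range(len(row_ptr) - 1)
    ]


if __name__ == "__main__":
    W = [[0.9, -0.05, 0.2], [0.01, -0.7, 0.3]]
    pruned = magnitude_prune(W, sparsity=0.5)  # [[0.9, 0.0, 0.0], [0.0, -0.7, 0.3]]
    print(csr_matvec(to_csr(pruned), [1.0, 2.0, 3.0]))  # approximately [0.9, -0.5]
```

Sparse layouts only pay off at fairly high sparsity, since each stored weight also carries a column index; where that break-even point lies depends on the hardware and kernel.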
Approximate Nearest Neighbors (ANN) search is a crucial task in several applications like recommender systems and information retrieval. Current state-of-the-art ANN libraries, although performance-oriented, often lack modularity and ease of use. This makes them not fully suitable for easy prototyping and testing of research ideas, an important feature to enable. We address these limitations by introducing kANNolo, a novel research-oriented ANN library written in Rust and explicitly designed to combine...
Learned sparse text embeddings have gained popularity due to their effectiveness in top-k retrieval and inherent interpretability. Their distributional idiosyncrasies, however, have long hindered their use in real-world retrieval systems. That changed with the recent development of approximate algorithms that leverage the distributional properties of sparse embeddings to speed up retrieval. Nonetheless, in much of the existing literature, evaluation has been limited to datasets with only a few million documents, such as MSMARCO. It remains unclear how these systems behave on...