- Data Management and Algorithms
- Information Retrieval and Search Behavior
- Rough Sets and Fuzzy Logic
- Web Data Mining and Analysis
- Semantic Web and Ontologies
- Advanced Image and Video Retrieval Techniques
- Topic Modeling
- Data Mining Algorithms and Applications
- Advanced Text Analysis Techniques
- Text and Document Classification Technologies
- Advanced Database Systems and Queries
- Privacy-Preserving Technologies in Data
- Complex Network Analysis Techniques
- Particle Detector Development and Performance
- Cryptography and Data Security
- Machine Learning and Algorithms
- Algorithms and Data Compression
- Spam and Phishing Detection
- Internet Traffic Analysis and Secure E-voting
- Diverse academic and cultural studies
- Image Retrieval and Classification Techniques
- FinTech, Crowdfunding, Digital Finance
- Advanced Clustering Algorithms Research
- Risk and Safety Analysis
- Sharing Economy and Platforms
Politecnico di Milano
2024
Fondazione "Ugo Bordoni"
2009-2023
Sapienza University of Rome
1985-2013
University of Perugia
2012
University of Siena
2012
Fondazione Lombardia per l’Ambiente
2010
University of Naples Federico II
1993
Techniques for automatic query expansion from top retrieved documents have shown promise for improving retrieval effectiveness on large collections; however, they often rely on an empirical ground, and there is a shortage of cross-system comparisons. Using ideas from Information Theory, we present a computationally simple and theoretically justified method for assigning scores to candidate expansion terms. Such scores are used to select and weight expansion terms within Rocchio's framework for query reweighting. We compare ranking with...
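The idea in the abstract above can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so the Kullback-Leibler-style score used here (probability of a term in the top retrieved documents against its probability in the whole collection) is an assumption, as are the function names `kl_term_scores` and `expand_query`, the parameters `k`, `alpha`, `beta`, and the toy data. It is not presented as the authors' exact method.

```python
import math
from collections import Counter


def kl_term_scores(feedback_docs, collection_docs):
    """Score each feedback term by p_fb(t) * log(p_fb(t) / p_coll(t)).

    Assumption: documents are lists of tokens, and the feedback documents
    are drawn from the collection.
    """
    fb = Counter(t for doc in feedback_docs for t in doc)
    coll = Counter(t for doc in collection_docs for t in doc)
    fb_total = sum(fb.values())
    coll_total = sum(coll.values())
    scores = {}
    for term, count in fb.items():
        p_fb = count / fb_total
        p_coll = coll.get(term, 0) / coll_total
        if p_coll > 0:  # skip terms unseen in the collection
            scores[term] = p_fb * math.log(p_fb / p_coll)
    return scores


def expand_query(query_weights, scores, k=2, alpha=1.0, beta=0.5):
    """Rocchio-style reweighting: alpha * original + beta * normalized score."""
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    max_score = max((s for _, s in top), default=0.0)
    expanded = {t: alpha * w for t, w in query_weights.items()}
    if max_score > 0:
        for term, score in top:
            if score > 0:
                expanded[term] = expanded.get(term, 0.0) + beta * score / max_score
    return expanded


# Toy collection (hypothetical data); the first and third documents play
# the role of the top retrieved documents.
collection = [
    ["jaguar", "car", "speed"],
    ["jaguar", "cat", "jungle"],
    ["car", "engine"],
    ["cat", "pet"],
]
feedback = [collection[0], collection[2]]
scores = kl_term_scores(feedback, collection)
expanded = expand_query({"jaguar": 1.0}, scores, k=1)
```

In this toy run, "car" is over-represented in the feedback documents relative to the collection, so it is the term added to the query, while "jaguar" keeps its original weight.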
We introduce a probabilistic version of the well-known Rand Index (RI) for measuring the similarity between two partitions, called the Probabilistic Rand Index (PRI), in which agreements and disagreements at the object-pair level are weighted according to the probability of their occurring by chance. We then cast consensus clustering as an optimization problem of the PRI value between a target partition and a set of given partitions, experimenting with a simple and very efficient stochastic optimization algorithm. Remarkable performance gains over input partitions as well as over existing...
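For reference, the classical (unweighted) Rand Index that the abstract above builds on can be sketched as follows. This is only the standard RI: the chance-based weighting that defines the PRI is not specified in the abstract and is not reproduced here. The function name `rand_index` and the label-list representation of partitions are illustrative assumptions.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two objects
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same objects")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two objects")
    agreements = 0
    total = 0
    for i, j in combinations(range(n), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agreements += 1
        total += 1
    return agreements / total


# Identical partitions under different label names agree on every pair.
identical = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# These two partitions agree on 2 of the 6 pairs.
crossed = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The pairwise loop is quadratic in the number of objects, which is fine for illustration; a contingency-table formulation would be preferable for large inputs.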
Current best-match ranking (BMR) systems perform well but cannot handle word mismatch between a query and a document. The best known alternative ranking method, hierarchical clustering-based ranking (HCR), seems to be more robust than BMR with respect to this problem, but it is hampered by theoretical and practical limitations. We present an approach to document ranking that explicitly addresses the word mismatch problem by exploiting interdocument similarity information in a novel way. Document ranking is seen as a query-document transformation driven...
In this article we consider methods for automatic query expansion from top retrieved documents (i.e., retrieval feedback) that make use of various functions for scoring expansion terms within Rocchio's classical reweighting scheme. An analytical comparison shows that the retrieval performance of methods based on distinct term-scoring functions is comparable on the whole query set but differs considerably on single queries, consistent with the fact that the ordered sets of expansion terms suggested for each query by the different functions are largely uncorrelated. Motivated by these findings, we argue that the results...
Web searches from mobile devices such as PDAs and cell phones are becoming increasingly popular. However, the traditional list-based search interface paradigm does not scale well to mobile devices due to their inherent limitations. In this article, we investigate the application of search results clustering, used with some success for desktop computer searches, to the mobile scenario. Building on CREDO (Conceptual Reorganization of Documents), a search results clustering engine based on concept lattices, we present its mobile versions Credino and SmartCREDO, ...
By analogy with merging document rankings, the outputs from multiple search results clustering algorithms can be combined into a single output. In this paper we study the feasibility of meta search results clustering, which has unique features compared to the general meta clustering problem. After showing that the combination of multiple search results clusterings is empirically justified, we cast meta clustering as an optimization problem of an objective function measuring the probabilistic concordance between the clustering combination and the given clusterings. We then show, using an easily computable upper bound on such...
When searching for a brand name in search engines, it is very likely that one will come across websites that sell fake versions of the brand's products. In this paper, we study how to tackle and measure this problem automatically. Our solution consists of a pipeline with two learning stages. We first detect the ecommerce websites (including shopbots) present in the list of search results and then discriminate between legitimate and fake ecommerce websites. We identify suitable learning features for each stage and show through a prototype system termed RI.SI.CO. that this approach is feasible, fast, and highly...