Matteo Riondato

ORCID: 0000-0003-2523-4420
Publications
Citations
Views
---
Saved
---
About
Contact & Profiles
Research Areas
  • Data Mining Algorithms and Applications
  • Rough Sets and Fuzzy Logic
  • Complex Network Analysis Techniques
  • Data Management and Algorithms
  • Advanced Database Systems and Queries
  • Imbalanced Data Classification Techniques
  • Advanced Graph Neural Networks
  • Machine Learning and Algorithms
  • Data Quality and Management
  • Algorithms and Data Compression
  • Graph Theory and Algorithms
  • Opinion Dynamics and Social Influence
  • Data Stream Mining Techniques
  • Complexity and Algorithms in Graphs
  • Social Media and Politics
  • Network Security and Intrusion Detection
  • Markov Chains and Monte Carlo Methods
  • Advanced Clustering Algorithms Research
  • Neural Networks and Applications
  • DNA and Biological Computing
  • Optimization and Variational Analysis
  • Topological and Geometric Data Analysis
  • Spam and Phishing Detection
  • Advanced Statistical Process Monitoring
  • Internet Traffic Analysis and Secure E-voting

Amherst College
2019-2025

Two Sigma Investments (United States)
2014-2018

Sigma Labs (United States)
2018

Brown University
2010-2015

Stanford University
2015

John Brown University
2011-2014

Accurate query performance prediction (QPP) is central to effective resource management, optimization and scheduling. Analytical cost models, used in current generation of optimizers, have been successful comparing the costs alternative plans, but they are poor predictors execution latency. As a more promising approach QPP, this paper studies practicality utility sophisticated learning-based which recently applied variety predictive tasks with great success, both static (i.e., fixed) dynamic...

10.1109/icde.2012.64 article EN 2012-04-01

10.1007/s10618-015-0423-0 article EN Data Mining and Knowledge Discovery 2015-06-29

Frequent Itemsets and Association Rules Mining (FIM) is a key task in knowledge discovery from data. As the dataset grows, cost of solving this dominated by component that depends on number transactions dataset. We address issue proposing PARMA, parallel algorithm for MapReduce framework, which scales well with size (as transactions) while minimizing data replication communication cost. PARMA cuts down dataset-size-dependent part using random sampling approach to FIM. Each machine mines...

10.1145/2396761.2396776 article EN 2012-10-29

We present TRIEST, a suite of one-pass streaming algorithms to compute unbiased, low-variance, high-quality approximations the global and local (i.e., incident each vertex) number triangles in fully-dynamic graph represented as an adversarial stream edge insertions deletions.

10.1145/2939672.2939771 article EN 2016-08-08

Betweenness centrality is a fundamental measure in social network analysis, expressing the importance or influence of individual vertices terms fraction shortest paths that pass through them. Exact computation large networks prohibitively expensive and fast approximation algorithms are required these cases. We present two efficient randomized for betweenness estimation. The based on random sampling offer probabilistic guarantees quality approximation. first algorithm estimates all vertices:...

10.1145/2556195.2556224 article EN 2014-02-18

ABPA Ξ A Σ ( ABRAXAS ): Gnostic word of mystic meaning . We present ABRA, a suite algorithms to compute and maintain probabilistically guaranteed high-quality approximations the betweenness centrality all nodes (or edges) on both static fully dynamic graphs. Our use progressive random sampling their analysis rely Rademacher averages pseudodimension, fundamental concepts from statistical learning theory. To our knowledge, ABRA is first application these field graph analysis. experimental...

10.1145/3208351 article EN ACM Transactions on Knowledge Discovery from Data 2018-07-20

10.1007/s10618-016-0468-8 article EN Data Mining and Knowledge Discovery 2016-06-06

This work explores fundamental modeling and algorithmic issues arising in the well-established MapReduce framework. First, we formally specify a computational model for which captures functional flavor of paradigm by allowing flexible use parallelism. Indeed, diverges from traditional processor-centric view featuring parameters embody only global local memory constraints, thus favoring more data-centric view. Second, apply to computation task matrix multiplication presenting upper lower...

10.1145/2304576.2304607 preprint EN 2012-06-25

“Ogni lassada xe persa.” 1 -- Proverb from Trieste, Italy. We present trièst , a suite of one-pass streaming algorithms to compute unbiased, low-variance, high-quality approximations the global and local (i.e., incident each vertex) number triangles in fully dynamic graph represented as an adversarial stream edge insertions deletions. Our use reservoir sampling its variants exploit user-specified memory space at all times. This is contrast with previous approaches, which require...

10.1145/3059194 article EN ACM Transactions on Knowledge Discovery from Data 2017-06-29

We present an algorithm to extract high-quality approximation of the (top-k) Frequent itemsets (FIs) from random samples a transactional dataset. With high probability is superset FIs, and no itemset with frequency much lower than threshold included in it. The employs progressive sampling, stopping condition based on bounds empirical Rademacher average, key concept statistical learning theory. computation uses characteristic quantities that can be obtained efficiently single scan sample....

10.1145/2783258.2783265 article EN 2015-08-07

The tasks of extracting (top- K ) Frequent Itemsets (FIs) and Association Rules (ARs) are fundamental primitives in data mining database applications. Exact algorithms for these problems exist widely used, but their running time is hindered by the need scanning entire dataset, possibly multiple times. High-quality approximations FIs ARs sufficient most practical uses. Sampling techniques can be used fast discovery approximate solutions, works exploring this technique did not provide...

10.1145/2629586 article EN ACM Transactions on Knowledge Discovery from Data 2014-08-29

We introduce Polaris, a network null model for colored multigraphs that preserves the Joint Color Matrix.Polaris is specifically designed studying polarization, where vertices belong to side in debate or partisan group, represented by vertex color, and relations have different strengths, an integer-valued edge multiplicity.The key feature of Polaris preserving Matrix (JCM) multigraph, which specifies number edges connecting any two given colors.The JCM basic property determines color...

10.1145/3701551.3703560 article EN 2025-02-26

We study the problem of graph summarization. Given a large we aim at producing concise lossy representation that can be stored in main memory and used to approximately answer queries about original much faster than by using exact representation. In this paper very natural type summary: set vertices is partitioned into small number super nodes connected edges form complete weighted graph. The edge weights are densities between corresponding nodes. goal produce summary minimizes reconstruction...

10.1109/icdm.2014.56 article EN 2014-12-01

We present ABRA, a suite of algorithms to compute and maintain probabilistically-guaranteed, high-quality, approximations the betweenness centrality all nodes (or edges) on both static fully dynamic graphs. Our use progressive random sampling their analysis rely Rademacher averages pseudodimension, fundamental concepts from statistical learning theory. To our knowledge, this is first application these field graph analysis. experimental results show that ABRA much faster than exact methods,...

10.1145/2939672.2939770 article EN 2016-08-08

We present SPuManTE, an efficient algorithm for mining significant patterns from a transactional dataset. SPuManTE controls the Family-wise Error Rate: it ensures that probability of reporting one or more false discoveries is less than user-specified threshold. A key ingredient UT, our novel unconditional statistical test evaluating significance pattern, requires fewer assumptions on data generation process and appropriate knowledge discovery setting classical conditional tests, such as...

10.1145/3292500.3330978 article EN 2019-07-25

The topology of the hyperlink graph among pages expressing different opinions may influence exposure readers to diverse content. Structural bias trap a reader in 'polarized' bubble with no access other opinions. We model readers' behavior as random walks. A node is if expected length walk from it page opinion large. structural sum radii highly-polarized bubbles. study problem decreasing through edge insertions. 'Healing' all nodes high polarized radius hard approximate within logarithmic...

10.1145/3437963.3441825 preprint EN 2021-03-06

We present MaNIACS , a sampling-based randomized algorithm for computing high-quality approximations of the collection subgraph patterns that are frequent in single, large, vertex-labeled graph, according to Minimum Node Image-based (MNI) frequency measure. The output comes with strong probabilistic guarantees, obtained by using empirical Vapnik–Chervonenkis (VC) dimension, key concept from statistical learning theory, together tail bounds on difference between pattern sample and its exact...

10.1145/3587254 article EN ACM Transactions on Intelligent Systems and Technology 2023-03-10

Centrality measures allow to measure the relative importance of a node or an edge in graph w.r.t.~other nodes edges. Several centrality have been developed literature capture different aspects informal concept importance, and algorithms for these proposed. In this tutorial, we survey definitions compute them. We start from most common measures, such as closeness betweenness centrality, move more complex ones spanning-edge centrality. our presentation, begin exact then progress approximation...

10.1145/2872518.2891063 article EN 2016-01-01

The availability of massive datasets has highlighted the need computationally efficient and statistically-sound methods to extracts patterns while providing rigorous guarantees on quality results, in particular with respect false discoveries. In this tutorial we survey recent that properly combine computational statistical considerations efficiently mine statistically reliable from large datasets. We start by introducing fundamental concepts hypothesis testing, including conditional...

10.1145/3292500.3332286 article EN 2019-07-25

Previous chapter Next Full AccessProceedings Proceedings of the 2014 SIAM International Conference on Data Mining (SDM)Finding True Frequent ItemsetsMatteo Riondato and Fabio VandinMatteo Vandinpp.497 - 505Chapter DOI:https://doi.org/10.1137/1.9781611973440.57PDFBibTexSections ToolsAdd to favoritesExport CitationTrack CitationsEmail SectionsAboutAbstract Itemsets (FIs) mining is a fundamental primitive in data mining. It requires identify all itemsets appearing at least fraction θ...

10.1137/1.9781611973440.57 article EN 2014-04-28

“I’m an MC still as honest” – Eminem, Rap God We present MCRapper , algorithm for efficient computation of Monte-Carlo Empirical Rademacher Averages (MCERA) families functions exhibiting poset (e.g., lattice) structure, such those that arise in many pattern mining tasks. The MCERA allows us to compute upper bounds the maximum deviation sample means from their expectations, thus it can be used find both (1) statistically-significant (i.e., patterns) when available data is seen a unknown...

10.1145/3532187 article EN ACM Transactions on Knowledge Discovery from Data 2022-04-25
Coming Soon ...