NFDI4DS | UHH-SEMS - Publication Details

Paolo Boldi

ORCID: 0000-0002-8297-6255

Publications

Citations

Views

---

Saved

---

About

Contact & Profiles

A5038842358

Research Areas

Complex Network Analysis Techniques
Algorithms and Data Compression
Web Data Mining and Analysis
Graph Theory and Algorithms
Advanced Graph Neural Networks
Bioinformatics and Genomic Networks
Data Management and Algorithms
Advanced Proteomics Techniques and Applications
Opinion Dynamics and Social Influence
Advanced Graph Theory Research
Caching and Content Delivery
DNA and Biological Computing
Advanced Image and Video Retrieval Techniques
Peer-to-Peer Network Technologies
Graph theory and applications
Distributed systems and fault tolerance
Computability, Logic, AI Algorithms
Semantic Web and Ontologies
Topic Modeling
Internet Traffic Analysis and Secure E-voting
Information Retrieval and Search Behavior
Health, Environment, Cognitive Aging
Advanced Database Systems and Queries
Computational Geometry and Mesh Generation
Network Packet Processing and Optimization

University of Milan
2014-2024

Yahoo (Spain)
2014

Informa (Italy)
2004

The webgraph framework I

OPENALEX - Publications

Paolo Boldi Sebastiano Vigna

Studying web graphs is often difficult due to their large size. Recently,several proposals have been published about various techniques that allow tostore a graph in memory limited space, exploiting the inner redundancies of web. The WebGraph framework suite codes, algorithms and tools aims at making it easy manipulate graphs. This papers presents compression used WebGraph, which are centred around referentiation intervalisation (which turn dual each other). can compress WebBase (118 Mnodes,...

10.1145/988672.988752 article EN 2004-05-17

Layered label propagation

OPENALEX - Publications

Paolo Boldi Marco Rosa Massimo Santini Sebastiano Vigna

We continue the line of research on graph compression started with WebGraph, but we move our focus to social networks in a proper sense (e.g., LiveJournal): approaches that have been used for long time compress web graphs rely specific ordering nodes (lexicographical URL ordering) whose extension general is not trivial. In this paper, propose solution mixes clusterings and orders, devise new algorithm, called Layered Label Propagation, builds previous work scalable clustering can be reorder...

10.1145/1963405.1963488 article EN 2011-03-28

UbiCrawler: a scalable fully distributed Web crawler

OPENALEX - Publications

Paolo Boldi Bruno Codenotti Massimo Santini Sebastiano Vigna

Abstract We report our experience in implementing UbiCrawler, a scalable distributed Web crawler, using the Java programming language. The main features of UbiCrawler are platform independence, linear scalability, graceful degradation presence faults, very effective assignment function (based on consistent hashing) for partitioning domain to crawl, and more general complete decentralization every task. necessity handling large sets data has highlighted some limitations APIs, which prompted...

10.1002/spe.587 article EN Software Practice and Experience 2004-03-24

Four degrees of separation

OPENALEX - Publications

Lars Bäckström Paolo Boldi Marco Rosa Johan Ugander Sebastiano Vigna

Frigyes Karinthy, in his 1929 short story "Láncszemek" (in English, "Chains") suggested that any two persons are distanced by at most six friendship links.1 Stanley Milgram famous experiments challenged people to route postcards a fixed recipient passing them only through direct acquaintances. found the average number of intermediaries on path lay between 4:4 and 5:7, depending sample chosen. We report results first world-scale social-network graph-distance computations, using entire...

10.1145/2380718.2380723 article EN 2012-06-22

Axioms for Centrality

OPENALEX - Publications

Paolo Boldi Sebastiano Vigna

Given a social network, which of its nodes are more central? This question has been asked many times in sociology, psychology, and computer science, whole plethora _centrality measures_ (a.k.a. indices_, or _rankings_) were proposed to account for the importance network. In this study, we try provide mathematically sound survey most important classic centrality measures known from literature propose an _axiomatic_ approach establish whether they actually doing what have designed do. Our...

10.1080/15427951.2013.865686 article EN Internet Mathematics 2014-04-04

Efficient semi-streaming algorithms for local triangle counting in massive graphs

OPENALEX - Publications

Luca Becchetti Paolo Boldi Carlos Castillo Aristides Gionis

In this paper we study the problem of local triangle counting in large graphs. Namely, given a graph G = (V;E) want to estimate as accurately possible number triangles incident every node υ ∈ V graph. The computing global has been considered before, but our knowledge is first that addresses with focus on efficiency issues arising massive distribution and related clustering coefficient can be used many interesting applications. For example, show measures compute help detect presence spamming...

10.1145/1401890.1401898 article EN 2008-08-24

The query-flow graph

OPENALEX - Publications

Paolo Boldi Francesco Bonchi Carlos Castillo Debora Donato Aristides Gionis and 1 more

Query logs record the queries and actions of users search engines, as such they contain valuable information about interests, preferences, behavior users, well their implicit feedback to engine results. Mining wealth available in query has many important applications including query-log analysis, user profiling personalization, advertising, recommendation, more.In this paper we introduce query-flow graph, a graph representation interesting knowledge latent querying behavior. Intuitively,...

10.1145/1458082.1458163 article EN 2008-10-26

A reference collection for web spam

OPENALEX - Publications

Carlos Castillo Debora Donato Luca Becchetti Paolo Boldi Stefano Leonardi and 2 more

We describe the WEBSPAM-UK2006 collection, a large set of Web pages that have been manually annotated with labels indicating if hosts are include spam aspects or not. This is first publicly available collection includes page contents and links, has labelled by diverse judges.

10.1145/1189702.1189703 article EN ACM SIGIR Forum 2006-12-01

PageRank as a function of the damping factor

OPENALEX - Publications

Paolo Boldi Massimo Santini Sebastiano Vigna

PageRank is defined as the stationary state of a Markov chain. The chain obtained by perturbing transition matrix induced web graph with damping factor α that spreads uniformly part rank. choice eminently empirical, and in most cases original suggestion = 0.85 Brin Page still used. Recently, however, behaviour respect to changes was discovered be useful link-spam detection[21]. Moreover, an analytical justification value chosen for missing. In this paper, we give first mathematical analysis...

10.1145/1060745.1060827 article EN 2005-01-01

A large time-aware web graph

OPENALEX - Publications

Paolo Boldi Massimo Santini Sebastiano Vigna

We describe the techniques developed to gather and distribute in a highly compressed, yet accessible, form series of twelve snapshot .uk web domain. Ad hoc compression made it possible store snapshots using just 1:9 bits per link, with constant-time access temporal information. Our collection makes study evolution link-based scores (e.g., PageRank), growth online communities, general time-dependent phenomena related link structure.

10.1145/1480506.1480511 article EN ACM SIGIR Forum 2008-11-30

Query suggestions using query-flow graphs

OPENALEX - Publications

Paolo Boldi Francesco Bonchi Carlos Castillo Debora Donato Sebastiano Vigna

The query-flow graph [Boldi et al., CIKM 2008] is an aggregated representation of the latent querying behavior contained in a query log. Intuitively, directed edge from qi to qj means that two queries are likely be part same search mission. Any path over may seen as possible task, whose likelihood given by strength edges along path. An (qi, qj) also labelled with some information: e.g., probability user moves qj, or type transition, for instance, fact specialization qi.

10.1145/1507509.1507518 article EN 2009-02-09

HyperANF

OPENALEX - Publications

Paolo Boldi Marco Rosa Sebastiano Vigna

The neighbourhood function NG(t) of a graph G gives, for each t ∈ N, the number pairs nodes x, y such that is reachable from x in less hops. provides wealth information about [10] (e.g., it easily allows one to compute its diameter), but very expensive exactly. Recently, ANF algorithm (approximate function) has been proposed with purpose approximating on large graphs. We describe breakthrough improvement over terms speed and scalability. Our algorithm, called HyperANF, uses new HyperLogLog...

10.1145/1963405.1963493 article EN 2011-03-28

Injecting uncertainty in graphs for identity obfuscation

OPENALEX - Publications

Paolo Boldi Francesco Bonchi Aristides Gionis Tamir Tassa

Data collected nowadays by social-networking applications create fascinating opportunities for building novel services, as well expanding our understanding about social structures and their dynamics. Unfortunately, publishing social-network graphs is considered an ill-advised practice due to privacy concerns. To alleviate this problem, several anonymization methods have been proposed, aiming at reducing the risk of a breach on published data, while still allowing analyze them draw relevant...

10.14778/2350229.2350254 article EN Proceedings of the VLDB Endowment 2012-07-01

Fibrations of graphs

OPENALEX - Publications

Paolo Boldi Sebastiano Vigna

10.1016/s0012-365x(00)00455-6 article EN Discrete Mathematics 2002-01-01

PageRank

OPENALEX - Publications

Paolo Boldi Massimo Santini Sebastiano Vigna

research-article Share on PageRank: Functional dependencies Authors: Paolo Boldi Università degli Studi di Milano, MI, Italy ItalyView Profile , Massimo Santini Sebastiano Vigna Authors Info & Claims ACM Transactions Information SystemsVolume 27Issue 4Article No.: 19pp 1–23https://doi.org/10.1145/1629096.1629097Published:30 November 2009Publication History 71citation1,468DownloadsMetricsTotal Citations71Total Downloads1,468Last 12 Months42Last 6 weeks2 Get Citation AlertsNew Alert added!This...

10.1145/1629096.1629097 article EN ACM transactions on office information systems 2009-11-01

Monotone minimal perfect hashing: searching a sorted table with O(1) accesses

OPENALEX - Publications

Djamal Belazzougui Paolo Boldi Rasmus Pagh Sebastiano Vigna

A minimal perfect hash function maps a set S of n keys into the {0, 1,..., -- 1} bijectively. Classical results state that hashing is possible in constant time using structure occupying space close to lower bound log e bits per element. Here we consider problem monotone hashing, which bijection required preserve lexicographical ordering keys. can be seen as very weak form index provides ranking just on (and answers randomly outside S). Our goal minimise description size function: show that,...

10.5555/1496770.1496856 article EN Symposium on Discrete Algorithms 2009-01-04

BUbiNG

OPENALEX - Publications

Paolo Boldi Andrea Marino Massimo Santini Sebastiano Vigna

Although web crawlers have been around for twenty years by now, there is virtually no freely available, open-source crawling software that guarantees high throughput, overcomes the limits of single-machine tools and at same time scales linearly with amount resources available. This paper aims filling this gap.

10.1145/2567948.2577304 article EN 2014-04-07

Generalizing PageRank

OPENALEX - Publications

Ricardo Baeza‐Yates Paolo Boldi Carlos Castillo

This paper introduces a family of link-based ranking algorithms that propagate page importance through links. In these there is damping function decreases with distance, so direct link implies more endorsement than long path. PageRank the most widely known this family.The main objective to determine whether techniques has some interest per se, and how different choices for impact on rank quality convergence speed. Even though our results suggest can be approximated other simpler forms...

10.1145/1148170.1148225 article EN 2006-08-06

Efficient algorithms for large-scale local triangle counting

OPENALEX - Publications

Luca Becchetti Paolo Boldi Carlos Castillo Aristides Gionis

In this article, we study the problem of approximate local triangle counting in large graphs. Namely, given a graph G =( V,E ) want to estimate as accurately possible number triangles incident every node v ∈ V graph. We consider question both for undirected and directed The computing global has been considered before, but our knowledge is first contribution that addresses with focus on efficiency issues arising massive graphs also considers case. distribution related clustering coefficient...

10.1145/1839490.1839494 article EN ACM Transactions on Knowledge Discovery from Data 2010-10-01

Computing anonymously with arbitrary knowledge

OPENALEX - Publications

Paolo Boldi Sebastiano Vigna

Article Free AccessComputing anonymously with arbitrary knowledge Share on Authors: Paolo Boldi Dipartimento di Scienze dell'Informazione, Università Milano, Italy ItalyView Profile , Sebastiano Vigna Authors Info & Claims PODC '99: Proceedings of the eighteenth annual ACM symposium Principles distributed computingMay 1999 Pages 181–188https://doi.org/10.1145/301308.301355Online:01 May 1999Publication History 60citation252DownloadsMetricsTotal Citations60Total Downloads252Last 12 Months4Last...

10.1145/301308.301355 article EN 1999-05-01

From "Dango" to "Japanese Cakes": Query Reformulation Models and Patterns

OPENALEX - Publications

Paolo Boldi Francesco Bonchi Carlos Castillo Sebastiano Vigna

Understanding query reformulation patterns is a key step towards next generation web search engines: it can help improving users' web-search experience by predicting their intent, and thus helping them to locate information more effectively. As in this direction, we build an accurate model for classifying user reformulations into broad classes (generalization, specialization, error correction or parallel move), achieving 92\% accuracy. We apply the automatically label two large logs,...

10.1109/wi-iat.2009.34 article EN 2009-01-01

Viscous democracy for social networks

OPENALEX - Publications

Paolo Boldi Francesco Bonchi Carlos Castillo Sebastiano Vigna

Decision-making procedures in online social networks should reflect participants' political influence within the network.

10.1145/1953122.1953154 article EN Communications of the ACM 2011-06-01

Permuting Web and Social Graphs

OPENALEX - Publications

Paolo Boldi Massimo Santini Sebastiano Vigna

Since the first investigations on web-graph compression, it has been clear that ordering of nodes a web graph fundamental influence compression rate (usually expressed as number bits per link). The authors LINK database [Randall et al. 02], for instance, investigated three different approaches: an _extrinsic_ (URL ordering) and two _intrinsic_ orderings based rows adjacency matrix (lexicographic Gray code); they concluded URL many advantages in spite small penalty compression. In this paper...

10.1080/15427951.2009.10390641 article EN Internet Mathematics 2009-01-01

Coming Soon ...