- Data Management and Algorithms
- Complexity and Algorithms in Graphs
- Advanced Database Systems and Queries
- Caching and Content Delivery
- Optimization and Search Problems
- Machine Learning and Algorithms
- Algorithms and Data Compression
- Peer-to-Peer Network Technologies
- Advanced Graph Theory Research
- Complex Network Analysis Techniques
- Network Traffic and Congestion Control
- Distributed systems and fault tolerance
- Data Stream Mining Techniques
- Advanced Graph Neural Networks
- Machine Learning and Data Classification
- Internet Traffic Analysis and Secure E-voting
- Privacy-Preserving Technologies in Data
- Data Visualization and Analytics
- Graph Theory and Algorithms
- Software-Defined Networks and 5G
- Advanced Data Storage Technologies
- Network Security and Intrusion Detection
- Advanced Optical Network Technologies
- Cryptography and Data Security
- Bayesian Methods and Mixture Models
Tel Aviv University
2014-2024
Google (United States)
2015-2023
Google (Israel)
2013-2023
AT&T (United States)
2006-2018
Stanford University
1989-2016
Microsoft (United States)
2012-2015
Microsoft Research (United Kingdom)
2014
Nokia (United States)
1992-2006
DePaul University
1994
IBM (United States)
1991
Decentralized and unstructured peer-to-peer networks such as Gnutella are attractive for certain applications because they require no centralized directories precise control over network topology or data placement. However, the flooding-based query algorithm used in does not scale; each generates a large amount of traffic systems quickly become overwhelmed by query-induced load. This paper explores, through simulation, various alternatives to Gnutella's algorithm, replication strategy,...
The Peer-to-Peer (P2P) architectures that are most prevalent in today's Internet decentralized and unstructured. Search is blind it independent of the query thus not more effective than probing randomly chosen peers. One technique to improve effectiveness search proactively replicate data. We evaluate compare different replication strategies reveal interesting structure: Two very common but - uniform proportional yield same average performance on successful queries, fact worse any strategy...
Reachability and distance queries in graphs are fundamental to numerous applications, ranging from geographic navigation systems Internet routing. Some of these applications involve huge yet require fast query answering. We propose a new data structure for representing all distances graph. The is distributed the sense that it may be viewed as assigning labels vertices, such involving vertices u v answered using only v. Our based on 2-hop covers shortest paths, or For cover collection S paths...
Propagation of contagion through networks is a fundamental process. It used to model the spread information, influence, or viral infection. Diffusion patterns can be specified by probabilistic model, such as Independent Cascade (IC), captured set representative traces.
Association-rule mining has heretofore relied on the condition of high support to do its work efficiently. In particular, well-known a priori algorithm is only effective when rules interest are relationships that occur very frequently. However, there number applications, such as data mining, identification similar Web documents, clustering, and collaborative filtering, where have comparatively few instances in data. these cases, we must look for highly correlated items, or possibly even...
Intra-domain traffic engineering can significantly enhance the performance of large IP backbone networks. Two important components are understanding demandsand configuring routing protocols. These two inter-linked, as it is widely believed that an accurate view for optimizing configuration protocols and through that, utilization network.This basic premise, however, never seems to have been quantified --How knowledge demands obtaining good network? Since demand values dynamic illusive,...
We present algorithms to label the nodes of an XML tree which is subject insertions and deletions nodes. The labeling done such that (1) we each node immediately when it inserted this remains unchanged, (2) from a pair labels alone, can decide whether one ancestor other. This problem arises in context databases support queries on structure documents as well us changes made over time. prove our assign shortest possible (up constant factor) satisfy these requirements.We also consider same...
A recent seminal result of Racke is that for any network there an oblivious routing algorithm with a polylog competitive ratio respect to congestion. Unfortunately, Racke's construction not polynomial time. We give time guarantee's bounds, and more generally gives the true optimal network.
A Bottom-sketch is a summary of set items with nonnegative weights that supports approximate query processing. sketch obtained by associating each item in ground an independent random rank drawn from probability distribution depends on the weight and including k smallest value.
The distance between two vertices in a weighted graph is the weight of minimum-weight path them (where sum weights edges path). A has stretcht if its at most t times end points. We present algorithms that compute paths stretch $2\leq t\leq\log n$ on undirected graphs G=(V,E) with nonnegative weights. form $t=\beta(2+\epsilon')$, where $\beta$ integral and $\epsilon'>0$ least as large some fixed $\epsilon>0$. an $\tilde{O}((m+k)n^{(2+\epsilon)/t})$ time randomized algorithm finds k specified...
The success of a P2P file-sharing network highly depends on the scalability and versatility its search mechanism. Two particularly desirable features are scope (ability to find infrequent items) support for partial-match queries (queries that contain typos or include subset keywords). While centralized-index architectures (such as Napster) can both these features, existing decentralized seem at most one: prevailing unstructured protocols Gnutella FastTrack) deploy "blind" mechanism where set...
Reachability and distance queries in graphs are fundamental to numerous applications, ranging from geographic navigation systems Internet routing. Some of these applications involve huge yet require fast query answering. We propose a new data structure for representing all distances graph. The is distributed the sense that it may be viewed as assigning labels vertices, such involving vertices u v answered using only v.Our based on 2-hop covers shortest paths, or For cover collection S paths...
Decentralized and unstructured peer-to-peer networks such as Gnutella are attractive for certain applications because they require no centralized directories precise control over network topology or data placement. However, the flooding-based query algorithm used in does not scale; each individual generates a large amount of traffic systems quickly become overwhelmed by query-induced load. This paper explores various alternatives to Gnutella's replication strategy. We propose based on...
The rapid growth of the World Wide Web has caused serious performance degradation on Internet. This paper offers an end-to-end approach to improving by collectively examining components --- clients, proxies, servers, and network. Our goal is reduce user-perceived latency number TCP connections, improve cache coherency replacement, enable prefetching resources that are likely be accessed in near future. In our scheme, server response messages include piggybacked information customized...
We formalize the problem of maintaining time-decaying aggregates and statistics a data stream: relative contribution each item to aggregate is scaled down by factor that depends on, non-decreasing with, elapsed time. Time-decaying are used in applications where significance items decreases over develop storage-efficient algorithms, establish upper lower bounds. Surprisingly, even though decayed have become widely-used tool, our work seems be first both explore it formally provide algorithms...
The Peer-to-Peer (P2P) architectures that are most prevalent in today's Internet decentralized and unstructured. Search is blind it independent of the query thus not more effective than probing randomly chosen peers. One technique to improve effectiveness search proactively replicate data. We evaluate compare different replication strategies reveal interesting structure: Two very common but - uniform proportional yield same average performance on successful queries, fact worse any strategy...
Shortest paths computations constitute one of the most fundamental network problems. Nonetheless, known parallel shortest-paths algorithms are generally inefficient: they perform significantly more work (product time and processors) than their sequential counterparts. This gap, in literature as “transitive closure bottleneck,” poses a long-standing open problem. Our main result is an O(mn ϵ 0 +s( m+n 1+ϵ )) polylog-time randomized algorithm that computes within (1 + O (1/polylog n ) shortest...
Association rule mining has heretofore relied on the condition of high support to do its work efficiently. In particular, well-known a-priori algorithm is only effective when rules interest are relationships that occur very frequently. However, there a number applications, such as data mining, identification similar Web documents, clustering and collaborative filtering, where have comparatively few instances in data. these cases, we must look for highly correlated items, or possibly even...
Intra-domain traffic engineering can significantly enhance the performance of large IP backbone networks. Two important components are understanding demands and configuring routing protocols. These two inter-linked, as it is widely believed that an accurate view for optimizing configuration protocols, through that, utilization network. This basic premise, however, seems never to have been quantified. How knowledge obtaining good network? Since demand values dynamic illusive, possible obtain...
Summaries of massive data sets support approximate query processing over the original data. A basic aggregate a set records is weight subpopulations specified as predicate records' attributes. Bottom-k sketches are powerful summarization format weighted items that includes priority sampling [22], and classic without replacement. They can be computed efficiently for many representations including distributed databases streams coordinated all-distances sketches. We derive novel unbiased...
Graph datasets with billions of edges, such as social and web graphs, are prevalent, scalability is critical. All-distances sketches (ADS) [Cohen 1997], a powerful tool for scalable approximation statistics. The sketch small size sample the distance relation node which emphasizes closer nodes. Sketches all nodes computed using nearly linear computation estimators applied to estimate their properties. We provide, first time, unified exposition ADS algorithms applications. present historic...
Closeness centrality, first considered by Bavelas (1948), is an importance measure of a node in network which based on the distances from to all other nodes. The classic definition, proposed (1950), Beauchamp (1965), and Sabidussi (1966), (the inverse of) average distance