Edith Cohen

ORCID: 0000-0002-3926-8237
Research Areas
  • Data Management and Algorithms
  • Complexity and Algorithms in Graphs
  • Advanced Database Systems and Queries
  • Caching and Content Delivery
  • Optimization and Search Problems
  • Machine Learning and Algorithms
  • Algorithms and Data Compression
  • Peer-to-Peer Network Technologies
  • Advanced Graph Theory Research
  • Complex Network Analysis Techniques
  • Network Traffic and Congestion Control
  • Distributed Systems and Fault Tolerance
  • Data Stream Mining Techniques
  • Advanced Graph Neural Networks
  • Machine Learning and Data Classification
  • Internet Traffic Analysis and Secure E-voting
  • Privacy-Preserving Technologies in Data
  • Data Visualization and Analytics
  • Graph Theory and Algorithms
  • Software-Defined Networks and 5G
  • Advanced Data Storage Technologies
  • Network Security and Intrusion Detection
  • Advanced Optical Network Technologies
  • Cryptography and Data Security
  • Bayesian Methods and Mixture Models

Tel Aviv University
2014-2024

Google (United States)
2015-2023

Google (Israel)
2013-2023

AT&T (United States)
2006-2018

Stanford University
1989-2016

Microsoft (United States)
2012-2015

Microsoft Research (United Kingdom)
2014

Nokia (United States)
1992-2006

DePaul University
1994

IBM (United States)
1991

Decentralized and unstructured peer-to-peer networks such as Gnutella are attractive for certain applications because they require no centralized directories and no precise control over network topology or data placement. However, the flooding-based query algorithm used in Gnutella does not scale; each individual query generates a large amount of traffic, and large systems quickly become overwhelmed by the query-induced load. This paper explores, through simulation, various alternatives to Gnutella's query algorithm, data replication strategy,...

10.1145/2591635.2667182 article EN 2014-01-01

The Peer-to-Peer (P2P) architectures that are most prevalent in today's Internet are decentralized and unstructured. Search is blind in that it is independent of the query and is thus no more effective than probing randomly chosen peers. One technique to improve the effectiveness of blind search is to proactively replicate data. We evaluate and compare different replication strategies and reveal an interesting structure: Two very common but very different strategies -- uniform and proportional -- yield the same average performance on successful queries, and are in fact worse than any strategy...

10.1145/633025.633043 article EN 2002-08-19
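
The uniform/proportional equivalence follows from a short calculation: with random probing, a query for item i inspects about n/r_i peers in expectation, so the expected search size is the sum over items of q_i * n / r_i. A minimal numeric check of this claim (the peer count, budget, query rates, and function name below are illustrative, not from the paper):

```python
import math

def expected_search_size(query_rates, replicas, n):
    """Expected number of random probes until a hit, averaged over queries.

    With r_i copies of item i among n peers, a uniformly random probe
    succeeds with probability r_i / n, so a query for item i inspects
    n / r_i peers in expectation.
    """
    return sum(q * n / r for q, r in zip(query_rates, replicas))

n = 10_000   # peers (illustrative)
R = 1_000    # total replica budget (illustrative)
q = [0.5, 0.3, 0.15, 0.05]   # normalized query rates for m = 4 items
m = len(q)

uniform      = [R / m] * m              # same number of copies per item
proportional = [R * qi for qi in q]     # copies proportional to query rate
sqrt_total   = sum(math.sqrt(qi) for qi in q)
square_root  = [R * math.sqrt(qi) / sqrt_total for qi in q]

for name, alloc in [("uniform", uniform), ("proportional", proportional),
                    ("square-root", square_root)]:
    print(f"{name:12s} expected search size: {expected_search_size(q, alloc, n):.1f}")
```

Both the uniform and the proportional allocation print the same value, n*m/R, while the square-root allocation, which the paper identifies as optimal in this model, comes out strictly lower.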

Reachability and distance queries in graphs are fundamental to numerous applications, ranging from geographic navigation systems to Internet routing. Some of these applications involve huge graphs and yet require fast query answering. We propose a new data structure for representing all distances in a graph. The data structure is distributed in the sense that it may be viewed as assigning labels to the vertices, such that a query involving vertices u and v may be answered using only the labels of u and v. Our labels are based on 2-hop covers of the shortest paths, or of all paths, in a graph. For shortest paths, such a cover is a collection S of shortest paths...

10.1137/s0097539702403098 article EN SIAM Journal on Computing 2003-01-01
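
A minimal sketch of the query side of such 2-hop distance labels, assuming each label stores distances to its hop vertices (the toy graph and labels are hand-built for illustration; the paper's contribution is constructing small covers, which this snippet does not do):

```python
def dist_query(label_u, label_v):
    """Answer a distance query from the two labels alone.

    Each label maps hop vertex w -> d(x, w). The estimate is the
    minimum over common hops w of d(u, w) + d(w, v); with a valid
    2-hop cover this equals the true distance.
    """
    common = label_u.keys() & label_v.keys()
    return min(label_u[w] + label_v[w] for w in common) if common else float("inf")

# Hand-built labels for a toy path graph a - b - c - d (unit weights),
# using b and c as hop vertices (illustrative, not an optimal cover).
labels = {
    "a": {"b": 1, "c": 2},
    "b": {"b": 0, "c": 1},
    "c": {"b": 1, "c": 0},
    "d": {"b": 2, "c": 1},
}
print(dist_query(labels["a"], labels["d"]))  # -> 3
```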

10.1006/jcss.1997.1534 article EN publisher-specific-oa Journal of Computer and System Sciences 1997-12-01

Propagation of contagion through networks is a fundamental process. It is used to model the spread of information, influence, or a viral infection. Diffusion patterns can be specified by a probabilistic model, such as Independent Cascade (IC), or captured by a set of representative traces.

10.1145/2661829.2662077 preprint EN 2014-11-03
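
For concreteness, a minimal Monte Carlo simulation of the Independent Cascade model on a toy directed graph (the graph, probability, and function name are illustrative; the sketch-based approach of this line of work is aimed precisely at avoiding the cost of such repeated simulation):

```python
import random

def independent_cascade(graph, seeds, p=0.1, rng=random):
    """One simulated IC diffusion: each newly activated node gets a single
    chance to activate each inactive out-neighbor, independently with
    probability p. Returns the set of activated nodes."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active

# Toy directed graph as adjacency lists (illustrative data).
g = {"a": ["b", "c"], "b": ["c", "d"], "c": ["d"], "d": []}
spread = sum(len(independent_cascade(g, {"a"}, p=0.5)) for _ in range(1000)) / 1000
print(f"estimated expected spread of seed set {{a}}: {spread:.2f}")
```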

Association-rule mining has heretofore relied on the condition of high support to do its work efficiently. In particular, the well-known a priori algorithm is only effective when the only rules of interest are relationships that occur very frequently. However, there are a number of applications, such as data mining, identification of similar Web documents, clustering, and collaborative filtering, where the rules of interest have comparatively few instances in the data. In these cases, we must look for highly correlated items, or possibly even...

10.1109/69.908981 article EN IEEE Transactions on Knowledge and Data Engineering 2001-01-01
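
The low-support regime calls for similarity-based candidate generation rather than frequency thresholds. A generic MinHash sketch in that spirit, under the assumption that each item is represented as the set of row ids it occurs in (an illustration, not necessarily the paper's exact scheme):

```python
import random

def minhash_signature(item_rows, num_hashes=100, universe=10_000, seed=0):
    """MinHash signature of a set of row ids: for each random linear hash,
    keep the minimum hash value over the set. The fraction of agreeing
    coordinates estimates Jaccard similarity, with no support threshold."""
    rng = random.Random(seed)
    hashes = [(rng.randrange(1, universe), rng.randrange(universe))
              for _ in range(num_hashes)]
    p = 1_000_003  # a prime larger than the universe of row ids
    return [min((a * r + b) % p for r in item_rows) for a, b in hashes]

def estimated_jaccard(sig1, sig2):
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)

rows_x = set(range(0, 60))    # toy occurrence lists:
rows_y = set(range(30, 90))   # true Jaccard = 30 / 90 = 1/3
print(estimated_jaccard(minhash_signature(rows_x), minhash_signature(rows_y)))
```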

Intra-domain traffic engineering can significantly enhance the performance of large IP backbone networks. Two important components of traffic engineering are understanding the traffic demands and configuring the routing protocols. These two components are inter-linked, as it is widely believed that an accurate view of the traffic demands is important for optimizing the configuration of the routing protocols and, through that, the utilization of the network. This basic premise, however, never seems to have been quantified -- How important is accurate knowledge of the traffic demands for obtaining good utilization of the network? Since demand values are dynamic and illusive,...

10.1145/863955.863991 article EN 2003-08-25

We present algorithms to label the nodes of an XML tree which is subject to insertions and deletions of nodes. The labeling is done such that (1) we label each node immediately when it is inserted, and this label remains unchanged, and (2) from a pair of labels alone, we can decide whether one node is an ancestor of the other. This problem arises in the context of XML databases that support queries on the structure of documents as well as changes made to them over time. We prove that our algorithms assign the shortest possible labels (up to a constant factor) that satisfy these requirements. We also consider the same...

10.1145/543613.543648 article EN 2002-06-03
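
To make requirement (2) concrete, the simplest labeling with that property is a Dewey-style path label, where ancestry is a prefix test (illustrative only: naive Dewey labels are not the paper's scheme and are not length-optimal under insertions):

```python
def is_ancestor(label_a, label_b):
    """With Dewey-style labels (the sequence of child indices on the path
    from the root), a is an ancestor of b iff a's label is a proper
    prefix of b's."""
    return len(label_a) < len(label_b) and label_b[:len(label_a)] == label_a

root   = ()
child1 = (0,)       # first child of the root
child2 = (1,)       # second child of the root
grand  = (0, 2)     # third child of child1
print(is_ancestor(child1, grand))   # True
print(is_ancestor(child2, grand))   # False
```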

A recent seminal result of Räcke is that for any network there is an oblivious routing algorithm with a polylog competitive ratio with respect to congestion. Unfortunately, Räcke's construction is not polynomial time. We give a polynomial-time construction that guarantees Räcke's bounds, and more generally gives the true optimal ratio for any network.

10.1145/780542.780599 article EN 2003-06-09

A bottom-k sketch is a summary of a set of items with nonnegative weights that supports approximate query processing. A sketch is obtained by associating with each item in the ground set an independent random rank drawn from a probability distribution that depends on the item's weight, and including the k items with smallest rank value.

10.1145/1281100.1281133 article EN 2007-08-12
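
A minimal construction sketch, assuming exponential ranks r_i = -ln(u)/w_i, one common choice of weight-dependent rank distribution (the item data and names are illustrative):

```python
import heapq
import math
import random

def bottom_k_sketch(weighted_items, k, rng=random):
    """Bottom-k sketch with exponential ranks: item i with weight w_i gets
    rank r_i = -ln(u)/w_i for u uniform in (0,1), so heavier items tend
    to get smaller ranks; the k items of smallest rank are kept."""
    ranked = [(-math.log(1.0 - rng.random()) / w, x) for x, w in weighted_items]
    return heapq.nsmallest(k, ranked)

items = [("a", 5.0), ("b", 1.0), ("c", 0.5), ("d", 8.0), ("e", 2.0)]
for rank, item in bottom_k_sketch(items, k=3):
    print(f"{item}: rank {rank:.3f}")
```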

The distance between two vertices in a weighted graph is the weight of a minimum-weight path between them (where the weight of a path is the sum of the weights of its edges). A path has stretch $t$ if its weight is at most $t$ times the distance between its end points. We present algorithms that compute paths of stretch $2\leq t\leq\log n$ on undirected graphs $G=(V,E)$ with nonnegative edge weights. The stretch is of the form $t=\beta(2+\epsilon')$, where $\beta$ is integral and $\epsilon'>0$ is at least as large as some fixed $\epsilon>0$. We present an $\tilde{O}((m+k)n^{(2+\epsilon)/t})$ time randomized algorithm that finds paths between $k$ specified...

10.1137/s0097539794261295 article EN SIAM Journal on Computing 1998-01-01
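
To make the stretch notion concrete, here is the classic greedy t-spanner construction of Althöfer et al., which keeps an edge only when the current spanner distance between its endpoints exceeds t times its weight (a well-known baseline for illustration, not the paper's faster randomized algorithms):

```python
import heapq

def greedy_spanner(n, weighted_edges, t):
    """Greedy t-spanner: scan edges by increasing weight and keep (u, v, w)
    only if the distance between u and v in the spanner built so far
    exceeds t * w. Every distance in the result has stretch at most t."""
    adj = {v: [] for v in range(n)}

    def spanner_dist(src, dst):
        # Dijkstra over the partial spanner.
        dist, heap = {src: 0.0}, [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if u == dst:
                return d
            if d > dist.get(u, float("inf")):
                continue
            for v, w in adj[u]:
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return float("inf")

    spanner = []
    for w, u, v in sorted((w, u, v) for u, v, w in weighted_edges):
        if spanner_dist(u, v) > t * w:
            spanner.append((u, v, w))
            adj[u].append((v, w))
            adj[v].append((u, w))
    return spanner

edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.9), (2, 3, 1.0), (0, 3, 2.5)]
print(greedy_spanner(4, edges, t=3.0))  # drops the two redundant edges
```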

The success of a P2P file-sharing network highly depends on the scalability and versatility of its search mechanism. Two particularly desirable search features are scope (the ability to find infrequent items) and support for partial-match queries (queries that contain typos or include only a subset of keywords). While centralized-index architectures (such as Napster) can support both these features, existing decentralized architectures seem to support at most one: prevailing unstructured protocols (such as Gnutella and FastTrack) deploy a "blind" search mechanism where the set...

10.1109/infcom.2003.1208962 article EN 2003-01-01

Reachability and distance queries in graphs are fundamental to numerous applications, ranging from geographic navigation systems to Internet routing. Some of these applications involve huge graphs and yet require fast query answering. We propose a new data structure for representing all distances in a graph. The data structure is distributed in the sense that it may be viewed as assigning labels to the vertices, such that a query involving vertices u and v may be answered using only the labels of u and v. Our labels are based on 2-hop covers of the shortest paths, or of all paths, in a graph. For shortest paths, such a cover is a collection S of shortest paths...

10.5555/545381.545503 article EN Symposium on Discrete Algorithms 2002-01-06

Decentralized and unstructured peer-to-peer networks such as Gnutella are attractive for certain applications because they require no centralized directories and no precise control over network topology or data placement. However, the flooding-based query algorithm used in Gnutella does not scale; each individual query generates a large amount of traffic, and large systems quickly become overwhelmed by the query-induced load. This paper explores various alternatives to Gnutella's query algorithm and replication strategy. We propose a query algorithm based on...

10.1145/511334.511369 article EN 2002-06-01

The rapid growth of the World Wide Web has caused serious performance degradation on the Internet. This paper offers an end-to-end approach to improving Web performance by collectively examining its components --- clients, proxies, servers, and the network. Our goal is to reduce user-perceived latency and the number of TCP connections, improve cache coherency and cache replacement, and enable prefetching of resources that are likely to be accessed in the near future. In our scheme, server response messages include piggybacked information customized...

10.1145/285237.285286 article EN 1998-10-01

We formalize the problem of maintaining time-decaying aggregates and statistics of a data stream: the relative contribution of each data item to the aggregate is scaled down by a factor that depends on, and is non-decreasing with, elapsed time. Time-decaying aggregates are used in applications where the significance of data items decreases over time. We develop storage-efficient algorithms and establish upper and lower bounds. Surprisingly, even though decayed aggregates have become a widely-used tool, our work seems to be the first to both explore them formally and provide algorithms...

10.1145/773153.773175 article EN 2003-06-09
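
Exponential decay is the one decay function that folds into constant storage, because the rescaling factorizes across updates; a minimal sketch of that easy special case (the paper's contribution concerns general, harder decay functions):

```python
import math

class ExpDecaySum:
    """Exponentially time-decayed sum: an item of value v observed at time t
    contributes v * exp(-lam * (now - t)) to a query at time `now`.
    Exponential decay needs only one register because the scaling
    factorizes: decaying to time t2 = decaying to t1, then by (t2 - t1)."""
    def __init__(self, lam):
        self.lam = lam
        self.total = 0.0
        self.last = 0.0

    def add(self, value, now):
        self.total *= math.exp(-self.lam * (now - self.last))
        self.total += value
        self.last = now

    def query(self, now):
        return self.total * math.exp(-self.lam * (now - self.last))

s = ExpDecaySum(lam=0.1)
s.add(10.0, now=0.0)
s.add(5.0, now=5.0)
print(f"{s.query(now=10.0):.3f}")  # 10*e^-1 + 5*e^-0.5, about 6.712
```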

The Peer-to-Peer (P2P) architectures that are most prevalent in today's Internet are decentralized and unstructured. Search is blind in that it is independent of the query and is thus no more effective than probing randomly chosen peers. One technique to improve the effectiveness of blind search is to proactively replicate data. We evaluate and compare different replication strategies and reveal an interesting structure: Two very common but very different strategies -- uniform and proportional -- yield the same average performance on successful queries, and are in fact worse than any strategy...

10.1145/964725.633043 article EN ACM SIGCOMM Computer Communication Review 2002-08-19

Shortest paths computations constitute one of the most fundamental network problems. Nonetheless, known parallel shortest-paths algorithms are generally inefficient: they perform significantly more work (the product of time and processors) than their sequential counterparts. This gap, known in the literature as the "transitive closure bottleneck," poses a long-standing open problem. Our main result is an $O(mn^{\epsilon_0}+s(m+n^{1+\epsilon}))$ work, polylog-time randomized algorithm that computes paths within $(1+O(1/\mathrm{polylog}\ n))$ of shortest...

10.1145/331605.331610 article EN Journal of the ACM 2000-01-01

Association rule mining has heretofore relied on the condition of high support to do its work efficiently. In particular, the well-known a-priori algorithm is only effective when the only rules of interest are relationships that occur very frequently. However, there are a number of applications, such as data mining, identification of similar Web documents, clustering and collaborative filtering, where the rules of interest have comparatively few instances in the data. In these cases, we must look for highly correlated items, or possibly even...

10.1109/icde.2000.839448 article EN 2002-11-07

10.1016/j.jalgor.2005.01.006 article EN Journal of Algorithms 2005-03-08

Intra-domain traffic engineering can significantly enhance the performance of large IP backbone networks. Two important components of traffic engineering are understanding the traffic demands and configuring the routing protocols. These two components are inter-linked, as it is widely believed that an accurate view of traffic is important for optimizing the configuration of the routing protocols and, through that, the utilization of the network. This basic premise, however, seems never to have been quantified. How important is accurate knowledge of the traffic demands for obtaining good utilization of the network? Since demand values are dynamic and illusive, is it possible to obtain...

10.1109/tnet.2006.886296 article EN IEEE/ACM Transactions on Networking 2006-12-01

Summaries of massive data sets support approximate query processing over the original data. A basic aggregate over a set of records is the weight of subpopulations specified as a predicate over the records' attributes. Bottom-k sketches are a powerful summarization format of weighted items that includes priority sampling [22] and the classic weighted sampling without replacement. They can be computed efficiently for many representations of the data, including distributed databases and data streams, and support coordinated and all-distances sketches. We derive novel unbiased...

10.14778/1453856.1453884 article EN Proceedings of the VLDB Endowment 2008-08-01
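
As a point of reference for the estimators discussed, here is the classic priority-sampling subset-sum estimator (the [22] of the abstract): sample the k items of largest priority w_i/u_i and credit each sampled item max(w_i, tau), where tau is the (k+1)-st largest priority. The data, names, and predicate below are illustrative; the paper derives tighter estimators than this one.

```python
import random

def priority_sample(weighted_items, k, rng=random):
    """Priority sampling: item i gets priority q_i = w_i / u_i with u_i
    uniform in (0,1]; keep the k largest priorities and remember tau,
    the (k+1)-st largest priority (0 if everything fit)."""
    prios = sorted(((w / (1.0 - rng.random()), x, w) for x, w in weighted_items),
                   reverse=True)
    tau = prios[k][0] if len(prios) > k else 0.0
    return [(x, w) for _, x, w in prios[:k]], tau

def estimate_subset_weight(sample, tau, predicate):
    """Unbiased subset-sum estimate: each sampled item in the
    subpopulation counts max(w, tau)."""
    return sum(max(w, tau) for x, w in sample if predicate(x))

items = [(i, 1.0 + (i % 7)) for i in range(1000)]   # toy weighted records
sample, tau = priority_sample(items, k=100)
est = estimate_subset_weight(sample, tau, lambda i: i % 2 == 0)
true = sum(w for i, w in items if i % 2 == 0)
print(f"estimate {est:.0f} vs true {true:.0f}")
```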

Graph datasets with billions of edges, such as social and web graphs, are prevalent, and scalability is critical. All-distances sketches (ADS) [Cohen 1997] are a powerful tool for scalable approximation of statistics. The sketch of a node is a small-size sample of the distance relation of the node which emphasizes closer nodes. Sketches of all nodes can be computed using a near-linear computation, and estimators are applied to the sketches to estimate node properties. We provide, for the first time, a unified exposition of ADS algorithms and applications. We present the Historic...

10.1109/tkde.2015.2411606 article EN IEEE Transactions on Knowledge and Data Engineering 2015-03-11
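
A compact sketch of the bottom-k ADS construction for unweighted graphs: process sources in increasing random rank order and run a pruned BFS from each, so that node v keeps entry (rank, dist) exactly when fewer than k already-kept entries have distance at most dist (variable names and the toy graph are mine; a weighted-graph version would use pruned Dijkstra):

```python
import random
from collections import deque

def all_distances_sketches(graph, k, rng=random):
    """Bottom-k ADS via pruned BFS. Since sources are visited in increasing
    rank order, every entry already at v has smaller rank, so the count
    of entries with distance <= d tells whether the new rank is among
    the k smallest within distance d; if not, the BFS is pruned at v."""
    ranks = {v: rng.random() for v in graph}
    ads = {v: [] for v in graph}          # per-node list of (rank, dist)
    for s in sorted(graph, key=ranks.get):
        frontier, seen = deque([(s, 0)]), {s}
        while frontier:
            v, d = frontier.popleft()
            if sum(1 for _, dv in ads[v] if dv <= d) >= k:
                continue                   # not among k smallest; prune here
            ads[v].append((ranks[s], d))
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    frontier.append((w, d + 1))
    return ads

g = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # a toy path graph
for v, sketch in all_distances_sketches(g, k=2).items():
    print(v, sorted(sketch, key=lambda e: e[1]))
```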

Closeness centrality, first considered by Bavelas (1948), is an importance measure of a node in a network which is based on the distances from the node to all other nodes. The classic definition, proposed by Bavelas (1950), Beauchamp (1965), and Sabidussi (1966), is (the inverse of) the average distance to all other nodes.

10.1145/2660460.2660465 preprint EN 2014-10-01
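
A minimal implementation of the classic definition for an unweighted, connected graph, the inverse of the average BFS distance to all other nodes (the toy graph is illustrative; the paper is about computing this measure at scale, which a per-node BFS does not achieve):

```python
from collections import deque

def classic_closeness(graph, v):
    """Classic closeness centrality of v: the inverse of the average
    BFS distance from v to all other reachable nodes."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    others = [d for node, d in dist.items() if node != v]
    return len(others) / sum(others) if others else 0.0

g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print({v: round(classic_closeness(g, v), 3) for v in g})
```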