- Parallel Computing and Optimization Techniques
- Distributed Systems and Fault Tolerance
- Algorithms and Data Compression
- Distributed and Parallel Computing Systems
- Complexity and Algorithms in Graphs
- Advanced Data Storage Technologies
- Graph Theory and Algorithms
- Optimization and Search Problems
- Computational Geometry and Mesh Generation
- Interconnection Networks and Systems
- Data Management and Algorithms
- Logic, Programming, and Type Systems
- Advanced Graph Theory Research
- Genomics and Phylogenetic Studies
- Cloud Computing and Resource Management
- Advanced Database Systems and Queries
- Computability, Logic, AI Algorithms
- Real-Time Systems Scheduling
- Formal Methods in Verification
- Embedded Systems Design Techniques
- Scheduling and Optimization Algorithms
- Caching and Content Delivery
- Security and Verification in Computing
- Genetic Diversity and Population Structure
- Advanced Graph Neural Networks
Carnegie Mellon University
2016-2025
Google (United States)
2025
University of Crete
2020
University of California, Riverside
2020
York University
2020
Kitware (United States)
2020
Massachusetts Institute of Technology
1986-2019
Technion – Israel Institute of Technology
2019
Laboratoire d'Informatique de Paris-Nord
1996-2010
Toyota Technological Institute at Chicago
2006
Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis of social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially for non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method,...
There has been significant recent interest in parallel frameworks for processing graphs due to their applicability in studying social networks, the Web graph, networks in biology, and unstructured meshes in scientific simulation. Due to the desire to process large graphs, these systems have emphasized the ability to run on distributed-memory machines. Today, however, a single multicore server can support more than a terabyte of memory, which can fit graphs with tens or even hundreds of billions of edges. Furthermore, for graph algorithms,...
A study of the effects of adding two scan primitives as unit-time operations to PRAM (parallel random access machine) models is presented. It is shown that the primitives improve the asymptotic running time of many algorithms by an O(log n) factor, greatly simplify the description of many algorithms, and are significantly easier to implement than memory references. It is argued that the algorithm designer should feel free to use these operations as if they were as cheap as a memory reference. The author describes five algorithms that clearly illustrate how the primitives can be used in algorithm design: a radix-sort...
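The radix-sort mentioned in the abstract is a good illustration of how scans are used. Below is a minimal sequential Python sketch (not the paper's code): each pass is built from exclusive plus-scans, and the per-element loops are exactly the parts that would run as unit-time parallel steps on a scan-augmented PRAM.

```python
def exclusive_scan(xs):
    """Exclusive plus-scan: out[i] = sum of xs[:i]."""
    total, out = 0, []
    for x in xs:
        out.append(total)
        total += x
    return out

def split_by_bit(keys, bit):
    """Stable split: keys whose given bit is 0 first, then those with bit 1.
    Destination indices are computed entirely from two scans over the flags."""
    flags = [(k >> bit) & 1 for k in keys]
    down = exclusive_scan([1 - f for f in flags])   # positions for 0-bit keys
    n_zeros = down[-1] + (1 - flags[-1]) if keys else 0
    up = exclusive_scan(flags)                      # offsets for 1-bit keys
    out = [None] * len(keys)
    for i, k in enumerate(keys):
        out[down[i] if flags[i] == 0 else n_zeros + up[i]] = k
    return out

def radix_sort(keys, bits=8):
    """One stable split pass per bit, least significant first."""
    for b in range(bits):
        keys = split_by_bit(keys, b)
    return keys

print(radix_sort([5, 3, 7, 1, 4, 1, 4, 5]))  # [1, 1, 3, 4, 4, 5, 5, 7]
```

Because each pass is two scans plus an independent scatter, the whole sort takes O(bits) scan steps on a machine with unit-time scans.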
Programming parallel algorithms. Guy E. Blelloch (Carnegie Mellon Univ., Pittsburgh, PA). Communications of the ACM, Volume 39, Issue 3, March 1996, pp. 85–97. https://doi.org/10.1145/227234.227246
Experienced algorithm designers rely heavily on a set of building blocks and on the tools needed to put the blocks together into an algorithm. The understanding of these basic blocks is therefore critical to the understanding of algorithms. Many of the blocks needed for parallel algorithms extend from sequential algorithms, such as dynamic-programming and divide-and-conquer, but others are new. This paper introduces one of the simplest and most useful building blocks for parallel algorithms: the all-prefix-sums operation. The paper defines the operation, shows how to implement it on a P-RAM, and illustrates many...
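The classic implementation of all-prefix-sums is the two-phase tree method (an up-sweep of partial sums followed by a down-sweep that distributes prefixes). A sequential Python sketch, not the paper's code: each inner loop performs independent operations, which is where a parallel implementation gets its O(log n) rounds.

```python
def blelloch_scan(a, op=lambda x, y: x + y, identity=0):
    """Exclusive all-prefix-sums via up-sweep/down-sweep over an implicit tree.
    Assumes a power-of-two length for simplicity."""
    n = len(a)
    assert n and (n & (n - 1)) == 0, "power-of-two length for simplicity"
    t = list(a)
    # Up-sweep: each round combines pairs at stride d, building partial sums.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):   # independent -> parallelizable
            t[i] = op(t[i - d], t[i])
        d *= 2
    # Down-sweep: clear the root, then push prefixes back down the tree.
    t[n - 1] = identity
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):   # independent -> parallelizable
            left = t[i - d]
            t[i - d] = t[i]
            t[i] = op(left, t[i])
        d //= 2
    return t

print(blelloch_scan([1, 2, 3, 4, 5, 6, 7, 8]))  # [0, 1, 3, 6, 10, 15, 21, 28]
```

The same skeleton works for any associative operator with an identity (max, min, logical or), which is what makes the operation such a versatile building block.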
A comparison of sorting algorithms for the Connection Machine CM-2. Guy E. Blelloch (Carnegie Mellon University, Pittsburgh, PA), Charles Leiserson (MIT, Cambridge, MA), Bruce M. Maggs (NEC Research Institute, Princeton, NJ), C. Greg Plaxton (University of Texas, Austin, TX), Stephen J. Smith (Thinking Machines Corp.), Marco Zagha. SPAA '91: Proceedings of the Third Annual ACM Symposium on Parallel Algorithms and Architectures, June 1991...
This paper studies the data locality of the work-stealing scheduling algorithm on hardware-controlled shared-memory machines. We present lower and upper bounds on the number of cache misses incurred using work stealing, and introduce a locality-guided work-stealing algorithm along with experimental validation.
We describe a parallel, real-time garbage collector and present experimental results that demonstrate good scalability and good real-time bounds. The collector is designed for shared-memory multiprocessors and is based on an earlier algorithm [2], which provided fixed bounds on the time any thread must pause for collection. However, since our earlier algorithm was designed for simple analysis, it had some impractical features. This paper presents the extensions necessary for a practical implementation: reducing excessive interleaving, handling stacks and global variables,...
This announcement describes the problem based benchmark suite (PBBS). PBBS is a set of benchmarks designed for comparing parallel algorithmic approaches, parallel programming language styles, and machine architectures across a broad set of problems. Each benchmark is defined concretely in terms of a problem specification and input distributions. No requirements are made in terms of algorithmic approach, programming language, or machine architecture. The goal is not only to compare runtimes, but also to be able to compare code and other aspects of an implementation (e.g., portability, robustness,...
In chip multiprocessors (CMPs), limiting the number of off-chip cache misses is crucial for good performance. Many multithreaded programs provide opportunities for constructive cache sharing, in which concurrently scheduled threads share a largely overlapping working set. In this paper, we compare the performance of two state-of-the-art schedulers proposed for fine-grained multithreaded programs: Parallel Depth First (PDF), which is specifically designed for constructive cache sharing, and Work Stealing (WS), which is a more traditional design. Our experimental results indicate...
We study compression techniques for parallel in-memory graph algorithms, and show that we can achieve reduced space usage while obtaining competitive or improved performance compared to running the algorithms on uncompressed graphs. We integrate the compression techniques into Ligra, a recent shared-memory graph processing system. This system, which we call Ligra+, is able to represent graphs using about half of the space used by Ligra on average. Furthermore, Ligra+ is slightly faster than Ligra on average on a 40-core machine with hyper-threading. Our experimental...
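A minimal Python sketch of the style of compression involved, under the assumption that adjacency lists are difference-encoded and the deltas stored as variable-length byte codes (the concrete encodings in Ligra+ differ in detail); sorted neighbor ids have small gaps, so most deltas fit in one byte.

```python
def varint_encode(n, out):
    """Append n >= 0 as a variable-length byte code (7 bits per byte, high bit = continue)."""
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return

def encode_neighbors(neighbors):
    """Sort, delta-encode, then varint-encode an adjacency list."""
    out = bytearray()
    prev = 0
    for v in sorted(neighbors):
        varint_encode(v - prev, out)
        prev = v
    return bytes(out)

def decode_neighbors(data):
    """Invert encode_neighbors: varint-decode each delta and re-accumulate."""
    res, cur, shift, prev = [], 0, 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        shift += 7
        if not (b & 0x80):
            prev += cur
            res.append(prev)
            cur, shift = 0, 0
    return res

nbrs = [1000000, 1000003, 1000007, 1000050]
enc = encode_neighbors(nbrs)
print(len(enc), decode_neighbors(enc) == nbrs)  # 6 True
```

Four 32-bit neighbor ids take 16 bytes uncompressed; here the three one-byte deltas bring the list down to 6 bytes, illustrating roughly the factor-of-two space savings the abstract reports.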
The virtues of deterministic parallelism have been argued for decades and many forms have been described and analyzed. Here we are concerned with one of the strongest forms, requiring that for any input there is a unique dependence graph representing a trace of the computation annotated with every operation and value. This has been referred to as internal determinism, and it implies a sequential semantics---i.e., considering any sequential traversal of the dependence graph is sufficient for analyzing the correctness of the code. In addition to returning deterministic results, internal determinism has many advantages including ease...
In this paper we explore a simple and general approach for developing parallel algorithms that lead to good cache complexity on parallel machines with private or shared caches. The approach is to design nested-parallel algorithms that have low depth (span, critical path length) and for which the natural sequential evaluation order has low cache complexity in the cache-oblivious model. We describe several such algorithms with optimal work, polylogarithmic depth, and cache complexities that match the best sequential algorithms, including the first such algorithms for sorting and for sparse-matrix vector multiply on matrices with good vertex...
The greedy sequential algorithm for maximal independent set (MIS) loops over the vertices in an arbitrary order, adding a vertex to the resulting set if and only if no previous neighboring vertex has been added. In this loop, as in many sequential loops, each iterate will only depend on a subset of the previous iterates (i.e. knowing that any one of a vertex's earlier neighbors is in the MIS, or that it has no earlier neighbors, is sufficient to decide its fate one way or the other). This leads to a dependence structure among the iterates. If this structure is shallow then running the iterates in parallel while respecting the dependencies can...
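The greedy loop described above is short enough to state directly; a sequential Python sketch:

```python
def greedy_mis(n, edges, order=None):
    """Greedy sequential MIS: add each vertex iff no earlier neighbor
    (in the visiting order) has already been added."""
    nbrs = [set() for _ in range(n)]
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    order = order if order is not None else range(n)
    in_mis = [False] * n
    for v in order:                                   # the sequential loop
        if not any(in_mis[u] for u in nbrs[v]):       # depends only on earlier nbrs
            in_mis[v] = True
    return [v for v in range(n) if in_mis[v]]

# 4-cycle 0-1-2-3-0: visiting in order 0,1,2,3 yields the MIS {0, 2}.
print(greedy_mis(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # [0, 2]
```

Note that each iterate reads `in_mis` only for neighbors earlier in the order, which is exactly the dependence structure the abstract says can be exploited for parallelism when it is shallow.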
Existing graph-processing frameworks let users develop efficient implementations for many graph problems, but none of them support efficiently bucketing vertices, which is needed for bucketing-based graph algorithms such as Δ-stepping and approximate set-cover. Motivated by the lack of simple, scalable, and efficient bucketing-based algorithms, we develop the Julienne framework, which extends a recent shared-memory graph processing framework called Ligra with an interface for maintaining a collection of buckets under vertex insertions and bucket deletions.
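A sequential Python sketch of the kind of bucket structure such an interface maintains (the class and method names here are illustrative, not Julienne's actual API):

```python
class Buckets:
    """Vertices live in integer-identified buckets; algorithms repeatedly
    extract the lowest non-empty bucket and move vertices between buckets."""
    def __init__(self):
        self.bkt_of = {}   # vertex -> current bucket id
        self.bkts = {}     # bucket id -> set of vertices

    def update(self, v, b):
        """Insert vertex v into bucket b, removing it from its old bucket."""
        old = self.bkt_of.get(v)
        if old is not None:
            self.bkts[old].discard(v)
        self.bkt_of[v] = b
        self.bkts.setdefault(b, set()).add(v)

    def next_bucket(self):
        """Extract and delete the non-empty bucket with the smallest id."""
        live = [b for b, s in self.bkts.items() if s]
        if not live:
            return None, set()
        b = min(live)
        vs = self.bkts.pop(b)
        for v in vs:
            del self.bkt_of[v]
        return b, vs
```

In Δ-stepping, for instance, a vertex with tentative distance d would live in bucket ⌊d/Δ⌋; edge relaxations call `update`, and the outer loop repeatedly processes the result of `next_bucket` until none remain.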
There has been a growing interest in the graph-streaming setting, where a continuous stream of graph updates is mixed with graph queries. In principle, purely-functional trees are an ideal fit for this setting as they enable safe parallelism, lightweight snapshots, and strict serializability for queries. However, directly using them for graph processing leads to significant space overhead and poor cache locality.
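Why purely-functional trees give lightweight snapshots can be seen in a few lines; a path-copying sketch (illustrative only, not the paper's data structure): inserting into an immutable search tree copies only the root-to-leaf path, so any old root remains a valid, immutable snapshot that readers can traverse concurrently with updates.

```python
class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(root, key):
    """Return a new root; all untouched subtrees are shared with the old tree."""
    if root is None:
        return Node(key)
    if key < root.key:
        return Node(root.key, insert(root.left, key), root.right)
    if key > root.key:
        return Node(root.key, root.left, insert(root.right, key))
    return root  # key already present: old tree reused as-is

def keys(root):
    """In-order traversal."""
    return [] if root is None else keys(root.left) + [root.key] + keys(root.right)

v0 = None
for k in [5, 2, 8, 6]:
    v0 = insert(v0, k)
snapshot = v0              # O(1) snapshot: just keep the old root pointer
v1 = insert(v0, 7)         # copies only the path 5 -> 8 -> 6
print(keys(snapshot), keys(v1))  # [2, 5, 6, 8] [2, 5, 6, 7, 8]
```

The space overhead the abstract mentions comes from exactly this pointer-heavy node-per-element layout, which also hurts cache locality relative to flat adjacency arrays.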
In this paper we prove time and space bounds for the implementation of the programming language NESL on various parallel machine models. NESL is a sugared typed λ-calculus with a set of array primitives and an explicit parallel map over arrays. Our results extend previous work on provable implementation bounds for functional languages by considering space and by including arrays. For modeling the cost we augment a standard call-by-value operational semantics to return two cost measures: a DAG representing the sequential dependences in the computation, and a measure of the space taken by a sequential implementation. We show...
We present techniques for incremental computing by introducing adaptive functional programming. As an adaptive program executes, the underlying system represents the data and control dependences in the execution in the form of a dynamic dependence graph. When the input to the program changes, a change propagation algorithm updates the output by propagating changes through the graph and re-executing code where necessary. Adaptive programs adapt their output to any change in the input, small or large. We show that the adaptivity techniques are practical by giving an efficient implementation as an ML...