- Parallel Computing and Optimization Techniques
- Distributed and Parallel Computing Systems
- Advanced Data Storage Technologies
- Interconnection Networks and Systems
- Distributed Systems and Fault Tolerance
- Cloud Computing and Resource Management
- Embedded Systems Design Techniques
- Algorithms and Data Compression
- Complexity and Algorithms in Graphs
- Optimization and Search Problems
- Radiation Effects in Electronics
- Software System Performance and Reliability
- Low-Power High-Performance VLSI Design
- Scientific Computing and Data Management
- Cellular Automata and Applications
- Graph Theory and Algorithms
- Logic, Programming, and Type Systems
- Advanced Neural Network Applications
- Caching and Content Delivery
- DNA and Biological Computing
- Formal Methods in Verification
- Quantum Computing Algorithms and Architecture
- Real-Time Systems Scheduling
- Advanced Graph Theory Research
- Bayesian Modeling and Causal Inference
University of Illinois Urbana-Champaign
2016-2025
Urbana University
2007-2025
University of Illinois System
2023
Institut national de recherche en informatique et en automatique
2011-2021
Sesame Workshop
2021
Lawrence Livermore National Laboratory
2018-2020
Argonne National Laboratory
2011-2018
The University of Texas at Austin
2018
Lawrence Berkeley National Laboratory
2018
International University of the Caribbean
2011-2018
Over the last 20 years, the open-source community has provided more and more software on which the world’s high-performance computing systems depend for performance and productivity. The community has invested millions of dollars and years of effort to build key components. However, although the investments in these separate software elements have been tremendously valuable, a great deal of productivity has also been lost because of the lack of planning, coordination, and integration of the technologies necessary to make them work together smoothly and efficiently, both within...
In this paper we consider an optimization problem that arises in the execution of parallel programs on shared-memory multiple-instruction-stream, multiple-data-stream (MIMD) computers. A program on such machines consists of many sequential segments, each executed by a single processor. These segments interact as they access shared variables. Access to memory is asynchronous, and memory accesses are not necessarily executed in the order in which they were issued. An execution is correct if it is sequentially consistent: it should seem as if all...
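Sequential consistency can be made concrete with a small litmus test: enumerating every interleaving that respects each segment's program order shows which outcomes a sequentially consistent machine may produce. The sketch below is an illustration of the consistency criterion, not the paper's formalism; it checks the classic two-segment example in which each segment writes one shared variable and then reads the other.

```python
from itertools import permutations

# Litmus test over shared variables x and y (both initially 0):
#   P0: x = 1; r0 = y        P1: y = 1; r1 = x
# Under sequential consistency the outcome r0 == r1 == 0 is impossible,
# because one segment's write must precede the other segment's read.

def outcomes():
    p0 = [("write", "x", 1), ("read", "y", "r0")]
    p1 = [("write", "y", 1), ("read", "x", "r1")]
    results = set()
    # Enumerate all interleavings that preserve each segment's program order.
    for order in set(permutations([0, 0, 1, 1])):
        mem = {"x": 0, "y": 0}
        regs = {}
        idx = [0, 0]
        for p in order:
            op = (p0, p1)[p][idx[p]]
            idx[p] += 1
            if op[0] == "write":
                mem[op[1]] = op[2]
            else:
                regs[op[2]] = mem[op[1]]
        results.add((regs["r0"], regs["r1"]))
    return results
```

Every interleaving yields (0, 1), (1, 0), or (1, 1); the outcome (0, 0) can only be produced by hardware that reorders the write and the read, which is precisely the kind of execution the sequential-consistency criterion excludes.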
We present here a report produced by a workshop on ‘Addressing failures in exascale computing’ held in Park City, Utah, 4–11 August 2012. The charter of this workshop was to establish a common taxonomy about resilience across all the levels of a computing system, discuss existing knowledge of resilience across the various hardware and software layers of an exascale system, and build on those results, examining potential solutions from both a hardware and a software perspective and focusing on a combined approach. The workshop brought together participants with expertise in applications, system software, and hardware;...
Over the past few years resilience has become a major issue for high-performance computing (HPC) systems, in particular in the perspective of large petascale systems and future exascale systems. These systems will typically gather from half a million to several million central processing unit (CPU) cores running up to a billion threads. From current knowledge and observations of existing large systems, it is anticipated that such systems will experience various kinds of faults many times per day. It is also anticipated that the current approach to resilience, which relies on automatic or...
Resilience is a major roadblock for HPC executions on future exascale systems. These systems will typically gather millions of CPU cores running up to a billion threads. Projections from current large systems and technology evolution predict that errors will happen many times per day. These errors will propagate and generate various kinds of malfunctions, from simple process crashes to result corruptions. The past five years have seen extraordinary technical progress in domains related to exascale resilience. Several technical options, initially considered...
In December 1999, IBM announced the start of a five-year effort to build a massively parallel computer, to be applied to the study of biomolecular phenomena such as protein folding. The project has two main goals: to advance our understanding of the mechanisms behind protein folding via large-scale simulation, and to explore novel ideas in massively parallel machine architecture and software. This project should enable simulations that are orders of magnitude larger than current technology permits. Major areas of investigation include: how to most effectively...
The prefix computation problem is to compute all n initial products a_1 * ... * a_i, i = 1, ..., n, of a set of n elements, where * is an associative operation. An O(((log n)/log(2n/p)) · ⌈n/p⌉) time deterministic parallel...
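The prefix problem can be illustrated with a doubling ("Hillis–Steele" style) scan, simulated sequentially below; each of the ⌈log₂ n⌉ rounds is fully parallel across positions. This is an illustrative sketch of parallel prefix in general, not the deterministic algorithm of the paper.

```python
from operator import mul

def prefix_products(a, op=mul):
    """Sequential reference: all n initial products a[0] * ... * a[i]."""
    out, acc = [], None
    for x in a:
        acc = x if acc is None else op(acc, x)
        out.append(acc)
    return out

def parallel_prefix(a, op=mul):
    """Doubling scan: O(log n) rounds; within a round every position i >= d
    computes out[i] = op(out[i-d], out[i]) independently, so each round could
    run on n processors in parallel (total work O(n log n))."""
    out = list(a)
    n = len(out)
    d = 1
    while d < n:
        prev = list(out)          # snapshot = the "read phase" of the round
        for i in range(d, n):
            out[i] = op(prev[i - d], prev[i])
        d *= 2
    return out
```

For any associative `op`, `parallel_prefix` agrees with the sequential reference, e.g. `parallel_prefix([1, 2, 3, 4, 5])` gives the running factorials `[1, 2, 6, 24, 120]`.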
In this paper we introduce the Hierarchical Memory Model (HMM) of computation. It is intended to model computers with multiple levels in the memory hierarchy. Access to memory location x is assumed to take time ⌈log x⌉. Tight lower and upper bounds are given for the complexity of searching, sorting, matrix multiplication, and FFT in this model. Efficient algorithms utilize locality of reference by bringing data into fast memory and using them several times before returning them to slower memory. It is shown that the circuit simulation problem has inherently poor...
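The ⌈log x⌉ cost function makes the value of locality easy to quantify. The toy cost model below is an illustration only (addresses are 1-based, and a floor of one time unit per access is a simplifying assumption of this sketch, not part of the HMM definition): it compares repeatedly touching a far location against copying it once into fast memory and reusing it there.

```python
import math

def hmm_cost(accesses):
    """Total time of an address trace under an HMM-style cost model:
    accessing location x costs ceil(log2 x), clamped to at least 1
    (simplifying assumption so near accesses are not free)."""
    return sum(max(1, math.ceil(math.log2(x))) for x in accesses)

# Locality pays: touching a far cell 8 times vs. fetching it once
# into a near cell and doing the 8 uses there.
far_repeated = hmm_cost([1024] * 8)                   # 8 * ceil(log2 1024)
copied_near  = hmm_cost([1024]) + hmm_cost([2] * 8)   # one far fetch + near reuse
```

Here `far_repeated` is 80 time units while `copied_near` is only 18, which is the effect the paper's efficient algorithms exploit systematically.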
The computational capabilities of a system of n indistinguishable (anonymous) processors arranged on a ring are analyzed in the synchronous and asynchronous models of distributed computation. A precise characterization of the functions that can be computed in this setting is given. It is shown that any of these functions can be computed with O(n²) messages in the asynchronous model. This is also proved to be a lower bound for such elementary functions as AND, SUM, and Orientation. In the synchronous model any computable function can be computed with O(n log n) messages; oriented rings and rings that start synchronized stay within the same bounds. The main contribution...
We propose extensions to the Message Passing Interface (MPI) that generalize the MPI communicator concept to allow multiple communication endpoints per process, dynamic creation of endpoints, and the transfer of endpoints between processes. The generalized communicator construct can be used to express a wide range of interesting communication structures, including collective operations involving multiple threads per process, communications between dynamically created threads, and object-oriented applications in which communications are directed to specific objects. Furthermore, this enriched...
The steadily increasing number of nodes in high-performance computing systems, together with technology and power constraints, leads to sparse network topologies. Efficient mapping of application communication patterns to the network topology gains importance as systems grow to petascale and beyond. Such mapping is supported in parallel programming frameworks such as MPI, but it is often not well implemented. We show that the mapping problem is NP-complete and analyze and compare different practical mapping heuristics. We demonstrate an efficient and fast new heuristic which is based on graph...
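A simple greedy strategy gives the flavor of such topology mapping (this is an illustrative sketch, not the paper's graph-based heuristic): repeatedly place the heaviest-communicating pair of processes on the closest pair of free nodes, then score the mapping by volume-weighted hop count.

```python
import itertools

def greedy_map(comm, dist):
    """Greedy mapping heuristic (illustrative).
    comm[i][j]: message volume between processes i and j (symmetric)
    dist[u][v]: hop distance between nodes u and v
    Returns a dict: process -> node."""
    n = len(comm)
    free, placed = set(range(n)), {}
    # Heaviest communication edges first.
    edges = sorted(((comm[i][j], i, j) for i in range(n)
                    for j in range(i + 1, n) if comm[i][j] > 0), reverse=True)
    for _, i, j in edges:
        if i in placed and j in placed:
            continue
        if i not in placed and j not in placed:
            # Place both endpoints on the closest pair of free nodes.
            u, v = min(itertools.permutations(free, 2),
                       key=lambda p: dist[p[0]][p[1]])
            placed[i], placed[j] = u, v
            free -= {u, v}
        else:
            # One endpoint fixed: put the other on the nearest free node.
            a, b = (i, j) if i in placed else (j, i)
            v = min(free, key=lambda u: dist[placed[a]][u])
            placed[b] = v
            free.remove(v)
    for p in range(n):                 # processes with no traffic at all
        if p not in placed:
            placed[p] = free.pop()
    return placed

def mapped_cost(comm, dist, placed):
    """Total volume-weighted hop count of a mapping."""
    n = len(comm)
    return sum(comm[i][j] * dist[placed[i]][placed[j]]
               for i in range(n) for j in range(i + 1, n))
```

On a 4-node line topology with two heavily communicating process pairs, the heuristic places each pair on adjacent nodes, keeping the heavy traffic at one hop.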
The basic concept underlying probability theory and statistics is a function assigning numerical values (probabilities) to events. An “event” in this context is any conceivable state of affairs, including the so-called “empty event”—an a priori impossible state. Informally, events are described in everyday language (e.g. “by playing this strategy I shall win $1000 before going broke”). But in the current mathematical framework (first proposed by Kolmogoroff [Ko 1]) they are identified with subsets of some all-inclusive...
In this paper we introduce a model of a Hierarchical Memory with Block Transfer (BT for short). It is like a random access machine, except that access to location x takes time f(x), and a block of consecutive locations can be copied from memory to memory, taking one unit of time per element after the initial access time. We first study the case f(x) = x^α, 0 < α < 1. A tight bound of Θ(n log log n) is shown for many simple problems: reading each input, dot product, shuffle exchange, and merging two sorted lists. The same holds for transposing a √n × √n matrix; use...
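The advantage of block transfer in this model is easy to see numerically. The sketch below (using f(x) = x^α with α = 1/2, an arbitrary choice within the studied range) compares moving far elements one at a time with a single block copy whose elements cost one unit each after the initial latency.

```python
def f(x, alpha=0.5):
    """Access latency of location x in the BT model: f(x) = x ** alpha."""
    return x ** alpha

def access(x, alpha=0.5):
    """Cost of one random access to location x."""
    return f(x, alpha)

def block_copy(x, length, alpha=0.5):
    """Cost of copying `length` consecutive locations ending at address x:
    initial latency f(x), then one time unit per element."""
    return f(x, alpha) + length

# Fetch 100 far elements one at a time vs. one block transfer:
one_by_one = sum(access(10_000 + i) for i in range(100))  # ~100 * f(10_000)
blocked = block_copy(10_099, 100)                         # f(10_099) + 100
```

With these parameters the element-wise transfer costs about 10,000 time units while the block copy costs about 200, which is why the model rewards algorithms that organize data movement into large blocks.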
As reported by many recent studies, the mean time between failures of future post-petascale supercomputers is likely to decrease compared to the current situation. The most popular fault tolerance approach for MPI applications on HPC platforms relies on coordinated checkpointing, which raises two major issues: a) global restart wastes energy, since all processes are forced to roll back even in the case of a single failure; b) checkpoint coordination may slow down application execution because of congestion on I/O...
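The cost trade-off behind coordinated checkpointing is classically quantified with Young's first-order approximation for the optimal interval between checkpoints (a standard rule of thumb, not a result of this paper): τ_opt ≈ √(2·C·M), where C is the time to write one checkpoint and M the mean time between failures.

```python
import math

def young_interval(checkpoint_cost, mtbf):
    """Young's first-order approximation of the optimal interval between
    coordinated checkpoints:
        tau_opt ~= sqrt(2 * C * MTBF)
    checkpoint_cost: time C to write one checkpoint (seconds)
    mtbf: system mean time between failures (seconds)"""
    return math.sqrt(2 * checkpoint_cost * mtbf)

# Example: 10-minute checkpoints on a machine that fails about once a day
# gives an optimal interval of roughly 2.8 hours between checkpoints.
tau = young_interval(600, 86_400)
```

As the MTBF shrinks on larger machines, τ_opt shrinks with its square root, so checkpoints must be taken more and more often — exactly the pressure that motivates alternatives to global coordinated restart.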
This paper introduces a new approach to building distributed-memory graph analytics systems that exploits heterogeneity in processor types (CPU and GPU), partitioning policies, and programming models. The key to this approach is Gluon, a communication-optimizing substrate.
In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art models, however, are either too specific to applications or architectures, or are not as powerful and flexible. In this paper, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models and runtime systems. Argobots offers...
We present an auto-tuning system for optimizing the I/O performance of HDF5 applications and demonstrate its value across platforms, applications, and at scale. The system uses a genetic algorithm to search the large space of tunable parameters and to identify effective settings at all layers of the parallel I/O stack. The parameter settings are applied transparently by the auto-tuning system via dynamically intercepted HDF5 calls.
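A genetic search over discrete I/O parameters can be sketched in a few lines. This is an illustrative toy, not the paper's auto-tuning system; the parameter names and the synthetic fitness function below are made up for the example.

```python
import random

def ga_tune(space, fitness, pop_size=20, generations=30, seed=0):
    """Minimal genetic-algorithm search over a discrete parameter space.
    space: dict name -> list of candidate values
    fitness: function settings-dict -> score (higher is better)"""
    rng = random.Random(seed)
    keys = list(space)
    def rand_ind():
        return {k: rng.choice(space[k]) for k in keys}
    pop = [rand_ind() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]                # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = {k: rng.choice((a[k], b[k])) for k in keys}  # crossover
            if rng.random() < 0.2:                               # mutation
                k = rng.choice(keys)
                child[k] = rng.choice(space[k])
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# Toy "I/O stack" with a made-up sweet spot at stripe_count=8, buffer_mb=16.
space = {"stripe_count": [1, 2, 4, 8], "buffer_mb": [1, 4, 16, 64]}
best = ga_tune(space,
               lambda s: -abs(s["stripe_count"] - 8) - abs(s["buffer_mb"] - 16))
```

In the real system the fitness function would be a measured I/O benchmark run, which is why a derivative-free, population-based search like a GA is a natural fit for this tuning problem.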