Marc Snir

ORCID: 0000-0002-3504-2468
Publications
Citations
Views
---
Saved
---
About
Contact & Profiles
Research Areas
  • Parallel Computing and Optimization Techniques
  • Distributed and Parallel Computing Systems
  • Advanced Data Storage Technologies
  • Interconnection Networks and Systems
  • Distributed systems and fault tolerance
  • Cloud Computing and Resource Management
  • Embedded Systems Design Techniques
  • Algorithms and Data Compression
  • Complexity and Algorithms in Graphs
  • Optimization and Search Problems
  • Radiation Effects in Electronics
  • Software System Performance and Reliability
  • Low-power high-performance VLSI design
  • Scientific Computing and Data Management
  • Cellular Automata and Applications
  • Graph Theory and Algorithms
  • Logic, programming, and type systems
  • Advanced Neural Network Applications
  • Caching and Content Delivery
  • DNA and Biological Computing
  • Formal Methods in Verification
  • Quantum Computing Algorithms and Architecture
  • Real-Time Systems Scheduling
  • Advanced Graph Theory Research
  • Bayesian Modeling and Causal Inference

University of Illinois Urbana-Champaign
2016-2025

Urbana University
2007-2025

University of Illinois System
2023

Institut national de recherche en informatique et en automatique
2011-2021

Sesame Workshop
2021

Lawrence Livermore National Laboratory
2018-2020

Argonne National Laboratory
2011-2018

The University of Texas at Austin
2018

Lawrence Berkeley National Laboratory
2018

International University of the Caribbean
2011-2018

Over the last 20 years, open-source community has provided more and software on which world’s high-performance computing systems depend for performance productivity. The invested millions of dollars years effort to build key components. However, although investments in these separate elements have been tremendously valuable, a great deal productivity also lost because lack planning, coordination, integration technologies necessary make them work together smoothly efficiently, both within...

10.1177/1094342010391989 article EN The International Journal of High Performance Computing Applications 2011-01-06

In this paper we consider an optimization problem that arises in the execution of parallel programs on shared-memory multiple-instruction-stream, multiple-data-stream (MIMD) computers. A program such machines consists many sequential segments, each executed by a single processor. These segments interact as they access shared variables. Access to memory is asynchronous, and accesses are not necessarily order were issued. An correct if it sequentially consistent: It should seem all...

10.1145/42190.42277 article EN ACM Transactions on Programming Languages and Systems 1988-04-01

We present here a report produced by workshop on ‘Addressing failures in exascale computing’ held Park City, Utah, 4–11 August 2012. The charter of this was to establish common taxonomy about resilience across all the levels computing system, discuss existing knowledge various hardware and software layers an build those results, examining potential solutions from both perspective focusing combined approach. brought together participants with expertise applications, system software, hardware;...

10.1177/1094342014522573 article EN The International Journal of High Performance Computing Applications 2014-03-21

Over the past few years resilience has became a major issue for high-performance computing (HPC) systems, in particular perspective of large petascale systems and future exascale systems. These will typically gather from half million to several millions central processing unit (CPU) cores running up billion threads. From current knowledge observations existing it is anticipated that experience various kind faults many times per day. It also approach resilience, which relies on automatic or...

10.1177/1094342009347767 article EN The International Journal of High Performance Computing Applications 2009-09-17

Resilience is a major roadblock for HPC executions on future exascale systems. These systems will typically gather millions of CPU cores running up to billion threads. Projections from current large and technology evolution predict errors happen in many times per day. propagate generate various kinds malfunctions, simple process crashes result corruptions.The past five years have seen extraordinary technical progress domains related resilience. Several options, initially considered...

10.14529/jsfi140101 article EN cc-by Supercomputing Frontiers and Innovations 2014-03-01

In December 1999, IBM announced the start of a five-year effort to build massively parallel computer, be applied study biomolecular phenomena such as protein folding. The project has two main goals: advance our understanding mechanisms behind folding via large-scale simulation, and explore novel ideas in machine architecture software. This should enable simulations that are orders magnitude larger than current technology permits. Major areas investigation include: how most effectively...

10.1147/sj.402.0310 article EN IBM Systems Journal 2001-01-01

10.1016/0304-3975(90)90188-n article EN Theoretical Computer Science 1990-03-01

The prefix computation problem is to compute all <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">n</i> initial products xmlns:xlink="http://www.w3.org/1999/xlink">a</i> <sub xmlns:xlink="http://www.w3.org/1999/xlink">1</sub> * . , xmlns:xlink="http://www.w3.org/1999/xlink">i</i> =1, ., of a set elements, where an associative operation. An O(((log ) log(2 / xmlns:xlink="http://www.w3.org/1999/xlink">p</i> ))XI( )) time deterministic parallel...

10.1109/tc.1985.6312202 article EN IEEE Transactions on Computers 1985-10-01

In this paper we introduce the Hierarchical Memory Model (HMM) of computation. It is intended to model computers with multiple levels in memory hierarchy. Access location x assumed take time ⌈ log ⌉. Tight lower and upper bounds are given for complexity searching, sorting, matrix multiplication FFT. Efficient algorithms utilize locality reference by bringing data into fast using them several times before returning slower memory. shown that circuit simulation problem has inherently poor...

10.1145/28395.28428 article EN 1987-01-01

The computational capabilities of a system n indistinguishable (anonymous) processors arranged on ring in the synchronous and asynchronous models distributed computation are analyzed. A precise characterization functions that can be computed this setting is given. It shown any these O ( 2 ) messages model. This also proved to lower bound for such elementary as AND, SUM, Orientation. In model computable function log messages. oriented start synchronized within same bounds. main contribution...

10.1145/48014.48247 article EN Journal of the ACM 1988-10-01

10.1016/0304-3975(90)90192-k article EN publisher-specific-oa Theoretical Computer Science 1990-03-01

We propose extensions to the Message Passing Interface (MPI) that generalize MPI communicator concept allow multiple communication endpoints per process, dynamic creation of endpoints, and transfer between processes. The generalized construct can be used express a wide range interesting structures, including collective operations involving threads communications dynamically created threads, object-oriented applications in which are directed specific objects. Furthermore, this enriched...

10.1109/mpidc.1996.534093 article EN 2002-12-23

The steadily increasing number of nodes in high-performance computing systems and the technology power constraints lead to sparse network topologies. Efficient mapping application communication patterns topology gains importance as grow petascale beyond. Such is supported parallel programming frameworks such MPI, but often not well implemented. We show that problem NP-complete analyze compare different practical heuristics. demonstrate an efficient fast new heuristic which based on graph...

10.1145/1995896.1995909 article EN 2011-05-31

The basic concept underlying probability theory and statistics is a function assigning numerical values (probabilities) to events. An “event” in this context any conceivable state of affairs including the so-called “empty event”—an priori impossible state. Informally, events are described everyday language (e.g. “by playing strategy I shall win $1000 before going broke”). But current mathematical framework (first proposed by Kolmogoroff [Ko 1]) they identified with subsets some all-inclusive...

10.2307/2273587 article EN Journal of Symbolic Logic 1982-09-01

In this paper we introduce a model of Hierarchical Memory with Block Transfer (BT for short). It is like random access machine, except that to location x takes time f(x), and block consecutive locations can be copied from memory memory, taking one unit per element after the initial time. We first study f(x) = xα 0 ≪ α 1. A tight bound θ(n log n) shown many simple problems: reading each input, dot product, shuffle exchange, merging two sorted lists. The same holds transposing √n × matrix; use...

10.1109/sfcs.1987.31 article EN 1987-10-01

As reported by many recent studies, the mean time between failures of future post-petascale supercomputers is likely to reduce, compared current situation. The most popular fault tolerance approach for MPI applications on HPC Platforms relies coordinated check pointing which raises two major issues: a) global restart wastes energy since all processes are forced rollback even in case a single failure, b) checkpoint coordination may slow down application execution because congestions I/O...

10.1109/ipdps.2011.95 preprint EN 2011-05-01

This paper introduces a new approach to building distributed-memory graph analytics systems that exploits heterogeneity in processor types (CPU and GPU), partitioning policies, programming models. The key this is Gluon, communication-optimizing substrate.

10.1145/3192366.3192404 article EN 2018-06-11

In the past few decades, a number of user-level threading and tasking models have been proposed in literature to address shortcomings OS-level threads, primarily with respect cost flexibility. Current state-of-the-art models, however, either are too specific applications or architectures not as powerful flexible. this paper, we present Argobots, lightweight, low-level framework that is designed portable performant substrate for high-level programming runtime systems. Argobots offers...

10.1109/tpds.2017.2766062 article EN publisher-specific-oa IEEE Transactions on Parallel and Distributed Systems 2017-10-24

We present an auto-tuning system for optimizing I/O performance of HDF5 applications and demonstrate its value across platforms, applications, at scale. The uses a genetic algorithm to search large space tunable parameters identify effective settings all layers the parallel stack. parameter are applied transparently by via dynamically intercepted calls.

10.1145/2503210.2503278 article EN 2013-10-30
Coming Soon ...