- Parallel Computing and Optimization Techniques
- Distributed and Parallel Computing Systems
- Advanced Data Storage Technologies
- Cloud Computing and Resource Management
- Distributed Systems and Fault Tolerance
- Interconnection Networks and Systems
- Software System Performance and Reliability
- Migration, Aging, and Tourism Studies
- Virtual Reality Applications and Impacts
- Surgical Simulation and Training
- Indigenous Knowledge Systems and Agriculture
- Augmented Reality Applications
- Software Engineering Research
- Speech Recognition and Synthesis
- Invertebrate Taxonomy and Ecology
Sandia National Laboratories
2020-2024
Sandia National Laboratories California
2022-2024
Barcelona Supercomputing Center
2013-2019
Universitat Politècnica de Catalunya
2014-2019
As the push towards exascale hardware has increased the diversity of system architectures, performance portability has become a critical aspect for scientific software. We describe the Kokkos Performance Portable Programming Model that allows developers to write single source applications for diverse high-performance computing architectures. Kokkos provides key abstractions for both the compute and memory hierarchy of modern hardware. Novel abstractions have been added to Kokkos version 3, such as hierarchical parallelism, containers, task graphs,...
The Open MPI for Exascale (OMPI-X) project was one of two projects in the Exascale Computing Project (ECP) focused on advancing the MPI ecosystem. The OMPI-X team worked with other MPI Forum members to champion several important features for inclusion in the MPI 4.0, 4.1, and upcoming 5.0 standard versions, to support the needs of exascale applications and systems. The team also worked with the larger Open MPI community to bring implementations of these new enhancements into Open MPI, a leading open-source implementation of the MPI interface. This paper describes the motivation for this work in the context of exascale computing needs, the nature of the resulting...
Reductions matter and they are here to stay. Wide adoption of parallel processing hardware in a broad range of computer applications has encouraged recent research efforts on their efficient parallelization. Furthermore, trends towards high productivity languages in mainstream computing increase the demand for programming support. In this paper we present a new approach to reductions on distributed memory systems that provides both scalability and programmability. Using OmpSs, a task-based programming model, the developer...
Array-type reductions represent a frequently occurring algorithmic pattern in many scientific applications. A special case occurs if array elements are accessed in an irregular, often random manner, making their concurrent and scalable execution difficult. In this work we present a new approach that consists of language and runtime support and targets popular parallel programming models such as OpenMP. Its runtime support implements Privatization with In-lined, Block-Ordered Reductions (PIBOR), which trades processor cycles...
Reductions constitute a frequent algorithmic pattern in high-performance and scientific computing. Sophisticated techniques are needed to ensure their correct and scalable concurrent execution on modern processors. Reductions on large arrays represent the most demanding case, where traditional approaches are not always applicable due to low performance and scalability.
Kokkos provides in-memory advanced data structures, concurrency, and algorithms to support performance portable C++ parallel programming across CPUs and GPUs. The Message Passing Interface (MPI) provides the most widely used message passing model for inter-node communication. Many programmers use both Kokkos and MPI together. In this paper, Kokkos is integrated within an MPI implementation for ease of use in applications that use both Kokkos and MPI, without sacrificing performance. For instance, this allows Kokkos first-class objects to be used directly in extended C++-based MPI APIs.
Achieving scalable performance on supercomputers requires careful coordination of communication and computation. Often, MPI applications rely on buffering, packing, and sorting techniques to accommodate a two-sided API, minimize overhead, and achieve performance goals. As interconnects between accelerators become more performant and scalable, programming models such as SHMEM may have the opportunity to enable bandwidth maximization along with ease of programming. In this work, we take a closer look at device-initiated PGAS...
Multithreaded MPI applications are gaining popularity in scientific and high-performance computing. While the combination of programming models is suited to support current parallel hardware, it moves threading models and their interaction with MPI into focus. With the advent of new threading libraries, the flexibility to select threading implementations of choice is becoming an important usability feature. Open MPI has traditionally avoided componentizing its threading model, relying on code inlining and static initialization to minimize potential impacts on runtime...
Parallel patterns, views, and spaces are promising abstractions to capture the programmer's intent as well as contextual information that can be used by an underlying runtime to efficiently map software to parallel hardware. These abstractions are valuable in cases where an algorithm must accommodate requirements of code and performance portability across hardware architectures and vendor programming models. Kokkos is a programming model for host and accelerator architectures that relies on these abstractions and targets these requirements. It consists of a pure C++ interface,...
maps them to an underlying abstract machine model. The abstract machine model offers a generic view of parallel hardware. While Kokkos is gaining popularity in large-scale HPC applications at some DOE laboratories, we believe that the implemented concepts are of interest to a broader audience including academia, as they may contribute to a generic, vendor, and architecture-independent education in parallel programming. In this work, we give insight into the design considerations of the programming model and list important abstractions. Further, we document...