- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Distributed and Parallel Computing Systems
- Distributed Systems and Fault Tolerance
- Cloud Computing and Resource Management
- Embedded Systems Design Techniques
- Software System Performance and Reliability
- Advanced Memory and Neural Computing
- Ferroelectric and Negative Capacitance Devices
- Scientific Computing and Data Management
- Advanced Mathematical Identities
- Modular Robots and Swarm Intelligence
- Radiation Effects in Electronics
- Caching and Content Delivery
- Algorithms and Data Compression
- Interconnection Networks and Systems
- Image and Signal Denoising Methods
- Analytic Number Theory Research
- Data Visualization and Analytics
- Advanced Data Compression Techniques
- Peer-to-Peer Network Technologies
- Real-Time Systems Scheduling
- Coding Theory and Cryptography
Commissariat à l'Énergie Atomique et aux Énergies Alternatives (2015-2024)
Université Paris-Saclay (2014-2023)
CEA DAM Île-de-France (2014-2023)
Maison de la Simulation (2020-2022)
CEA Paris-Saclay (2020-2022)
Lawrence Livermore National Laboratory (2020)
Institut Polytechnique de Bordeaux (2020)
Intel (United States) (2020)
Institut Lavoisier de Versailles (2012)
University of California, Irvine (2012)
In the race for Exascale, the advent of many-core processors will bring a shift in parallel computing architectures toward systems with much higher concurrency, but with relatively less memory per thread. This raises concerns about the adaptability of HPC software written for the current generation of machines to this brave new world. In this paper, we study domain splitting over an increasing number of areas as an example problem where a negative performance impact on computation could arise. We identify the specific parameters that drive this scalability problem, and...
This paper offers a timely study of, and proposed clarifications, revisions, and enhancements to, the Message Passing Interface's (MPI's) Semantic Terms and Conventions. To enhance MPI, a clearer understanding of the meaning of key terminology has proven essential, and, surprisingly, important concepts remain underspecified, ambiguous, and in some cases inconsistent and/or conflicting despite 26 years of standardization. This work addresses these concerns comprehensively and usefully informs MPI developers, implementors, and those...
The advent of many-core architectures poses new challenges to the MPI programming model, which was designed for distributed-memory message passing. It is now clear that MPI will have to evolve in order to exploit shared-memory parallelism, either by collaborating with other models (MPI+X) or by introducing new approaches. This paper considers extensions to C and C++ that make it possible to run MPI Processes inside threads. More generally, a thread-local storage (TLS) library is developed to simplify the collocation of arbitrary...
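A minimal sketch of the underlying mechanism, not the paper's TLS library: when each "MPI process" becomes a thread of a single OS process, formerly global state must become thread-local to stay private to its process. The names below (`mpi_rank_local`, `mpi_process_body`) are illustrative only.

```cpp
#include <cstdio>
#include <thread>
#include <vector>

// A global that would be shared (and thus corrupted) across thread-based
// MPI processes if it stayed a plain global variable.
thread_local int mpi_rank_local = -1;

void mpi_process_body(int rank) {
    mpi_rank_local = rank;  // each thread sees its own copy
    std::printf("process %d sees rank %d\n", rank, mpi_rank_local);
}

int main() {
    std::vector<std::thread> procs;
    for (int r = 0; r < 4; ++r)
        procs.emplace_back(mpi_process_body, r);
    for (auto &t : procs) t.join();
}
```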
Today's trend to use accelerators in heterogeneous systems forces a paradigm shift in programming models. The use of low-level APIs for accelerator programming is tedious and not intuitive for casual programmers. To tackle this problem, recent approaches have focused on high-level directive-based models, with a standardization effort made by OpenACC and the accelerator directives of the latest OpenMP 4.0 release. Pragmas for data management automatically handle the data exchange between host and device. To keep the runtime simple and efficient, severe restrictions...
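For illustration, a hedged sketch of the directive-based style described above, using OpenMP 4.x target offload (standard directives, not code from the paper); the map() clauses let the runtime handle host/device data exchange automatically.

```cpp
// Compile with an offload-capable compiler, e.g. clang++ -fopenmp.
#include <cstdio>

int main() {
    const int n = 1 << 20;
    float *a = new float[n], *b = new float[n];
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Data movement to and from the device is expressed declaratively.
    #pragma omp target map(to: b[0:n]) map(tofrom: a[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        a[i] += b[i];

    std::printf("a[0] = %f\n", a[0]);  // expected: 3.0
    delete[] a; delete[] b;
}
```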
Fault tolerance has always been an important topic when it comes to running massively parallel programs at scale. Statistically, hardware and software failures are expected to occur more often on systems gathering millions of computing units. Moreover, the larger jobs are, the more hours would be wasted by a crash. In this paper, we describe work done in our MPI runtime to enable a transparent checkpointing mechanism. Unlike the MPI 4.0 User-Level Failure Mitigation (ULFM) interface, this work targets solely...
MPI-3 provides functions for non-blocking collectives. To help programmers introduce these collectives into existing MPI programs, we improve the PARCOACH tool, which checks the correctness of collective call sequences. These enhancements focus on the correct sequencing of all flavors of collective calls, and on the presence of completion calls for non-blocking communications. The evaluation shows an overhead under 10% of the original compilation time.
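A hedged illustration of the property being checked: every MPI-3 non-blocking collective must be paired with a completion call on every rank, and collectives must be issued in the same order everywhere.

```cpp
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    MPI_Request req;
    MPI_Ibarrier(MPI_COMM_WORLD, &req);  // non-blocking collective

    /* ... computation overlapping the barrier ... */

    // Dropping this completion call, or reaching it on only some ranks,
    // is the kind of defect such a tool reports at compile time.
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    MPI_Finalize();
}
```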
Stencil-based computation on structured grids is a kernel at the heart of a large number of scientific applications. The variety of stencil kernels used in practice makes this pattern difficult to assemble into a high-performance computing library. With the multiplication of cores on a single chip, answering architectural alignment requirements became an even more important key to performance. Along with vectorized accesses, data layout optimization must also consider concurrent parallel accesses. In this paper, we develop...
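A minimal sketch of the pattern under discussion, not the paper's library: a 1D 3-point stencil, with storage padded to a multiple of an assumed SIMD width so rows start on aligned boundaries. `VLEN` and the coefficients are illustrative assumptions.

```cpp
#include <vector>

constexpr int VLEN = 8;  // assumed SIMD width in floats (e.g. AVX)

inline int padded(int n) { return (n + VLEN - 1) / VLEN * VLEN; }

void stencil3(const std::vector<float> &in, std::vector<float> &out, int n) {
    // Each output depends on a fixed neighborhood of the input:
    // the classic stencil shape.
    for (int i = 1; i < n - 1; ++i)
        out[i] = 0.25f * in[i - 1] + 0.5f * in[i] + 0.25f * in[i + 1];
}

int main() {
    int n = 1000, np = padded(n);  // padding keeps storage vector-aligned
    std::vector<float> in(np, 1.0f), out(np, 0.0f);
    stencil3(in, out, n);
}
```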
This paper describes a short and simple way of improving the performance of vector operations (e.g. X = aY + bZ + ...) applied to large vectors. In a previous paper [1], we described how to take advantage of the high-performance copy operation provided by the ATLAS library [2] in the context of the C++ Expression Template (ET) mechanism. Here we present a multi-threaded implementation of this approach. The proposed ET implementation, which involves a parallel blocking technique, leads to a significant performance increase compared with existing implementations (up to x2.7) on a dual-socket...
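A minimal, hedged sketch of the Expression Template idea only (the paper additionally blocks and threads the evaluation loop): the expression aY + bZ is captured as a type and evaluated in a single fused loop at assignment time, avoiding temporaries. All names here are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Node representing a*Y + b*Z lazily, element by element.
struct Axpby {
    double a, b;
    const std::vector<double> &y, &z;
    double operator[](std::size_t i) const { return a * y[i] + b * z[i]; }
};

struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    // Single evaluation loop: X = aY + bZ without intermediate vectors.
    Vec &operator=(const Axpby &e) {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
        return *this;
    }
};

int main() {
    Vec x(1000), y(1000, 1.0), z(1000, 2.0);
    x = Axpby{2.0, 3.0, y.data, z.data};  // x[i] = 2*1 + 3*2 = 8
}
```

A full ET library would build such nodes via overloaded operator+ and operator*; the fused assignment loop above is the piece that a parallel blocking technique would then split across threads.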
Partitioned point-to-point communication and persistent collective communication were both recently standardized in MPI-4.0. Each offers performance and scalability advantages over MPI-3.1-based alternatives when planned transfers are feasible in an MPI application. Their merger into a generalized, persistent collective communication with partitions is the logical next step, and is significant for portability. Non-trivial decisions about the syntax and semantics of such operations need to be addressed, including the scope of knowledge of partitioning choices by the members of a communicator's...
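For context, a hedged sketch of MPI-4.0 partitioned point-to-point communication, one of the two ingredients being merged: the send buffer is split into partitions that can be marked ready one by one (e.g. by different threads), letting transfers start before the whole buffer is produced. Run with at least 2 ranks.

```cpp
#include <mpi.h>
#include <vector>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int parts = 4;
    const MPI_Count per_part = 1024;  // elements per partition
    std::vector<double> buf(parts * per_part);
    MPI_Request req = MPI_REQUEST_NULL;

    if (rank == 0) {
        MPI_Psend_init(buf.data(), parts, per_part, MPI_DOUBLE,
                       1, 0, MPI_COMM_WORLD, MPI_INFO_NULL, &req);
        MPI_Start(&req);
        for (int p = 0; p < parts; ++p) {
            /* ... fill partition p of buf ... */
            MPI_Pready(p, req);  // partition p may be transferred now
        }
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Precv_init(buf.data(), parts, per_part, MPI_DOUBLE,
                       0, 0, MPI_COMM_WORLD, MPI_INFO_NULL, &req);
        MPI_Start(&req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }
    if (req != MPI_REQUEST_NULL) MPI_Request_free(&req);
    MPI_Finalize();
}
```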
The Message Passing Interface (MPI) is a parallel programming model used to exchange data between working units on different nodes of a supercomputer. While MPI blocking operations return when the communication is complete, non-blocking and persistent operations return before completion, enabling the developer to hide communication latency. However, the usage of these latter operations comes with additional rules the user has to abide by. This is error prone, which makes verification tools valuable for program writers. PARCOACH is a framework that detects collective errors using...
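A hedged example of the class of bug such verification tools look for: a collective guarded by a rank-dependent branch, so not all ranks of the communicator reach the matching call.

```cpp
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank % 2 == 0) {
        // BUG: only even ranks participate in this collective; odd ranks
        // never call it, so the program deadlocks.
        MPI_Barrier(MPI_COMM_WORLD);
    }

    MPI_Finalize();
}
```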
The complexity of the memory system of High Performance Computing nodes increases in order to face applications' growing memory usage and the widening gap between computation and memory access speeds. As these technologies are just being introduced in HPC supercomputers, no one knows whether it is better to manage them with hardware or software solutions. Thus both are studied in parallel. For software solutions, the problem consists of choosing which data to store on which memory at any time.
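A hedged illustration of the software side of that choice, assuming the memkind library is available (this is not the paper's approach, just one common way to express the placement decision in code; link with -lmemkind): hot data goes to high-bandwidth memory if present, cold data to ordinary DRAM.

```cpp
#include <memkind.h>
#include <cstdio>

int main() {
    const size_t n = 1 << 20;

    // Frequently accessed array: prefer HBM, fall back to DRAM if absent.
    double *hot = static_cast<double *>(
        memkind_malloc(MEMKIND_HBW_PREFERRED, n * sizeof(double)));

    // Rarely accessed array: ordinary DRAM is fine.
    double *cold = static_cast<double *>(
        memkind_malloc(MEMKIND_DEFAULT, n * sizeof(double)));

    if (!hot || !cold) { std::puts("allocation failed"); return 1; }
    /* ... computation ... */
    memkind_free(MEMKIND_HBW_PREFERRED, hot);
    memkind_free(MEMKIND_DEFAULT, cold);
}
```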
To amortize the cost of MPI collective operations, nonblocking collectives have been proposed so as to allow communications to be overlapped with computation. Unfortunately, they are more CPU-hungry than point-to-point communications, and running them in a communication thread on a dedicated CPU core makes the computation slow. On the other hand, running them on application cores leads to no overlap. In this article, we propose placement algorithms for progress threads that do not degrade computation performance while still achieving communication/computation overlap. We first show even...
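A hedged sketch of the overlap pattern being optimized: the nonblocking collective is issued, independent work runs while the MPI progress engine (possibly a progress thread) advances the collective, and the result is awaited only when needed. How much overlap is actually achieved depends on where progress threads are placed.

```cpp
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    double local = 1.0, sum = 0.0, acc = 0.0;
    MPI_Request req;

    MPI_Iallreduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    // Independent computation overlapped with the collective.
    for (int i = 0; i < 1000000; ++i) acc += 1e-6 * i;

    MPI_Wait(&req, MPI_STATUS_IGNORE);  // result needed only now

    MPI_Finalize();
    return acc < 0;  // keep 'acc' live so the loop is not optimized away
}
```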
Many-core processors are imposing new constraints on parallel applications. In particular, the MPI+X model, or hybridization, is becoming a compulsory avenue to extract performance by mitigating both memory and communication overheads. In this context, tools also have to evolve in order to represent the more complex states combining multiple runtimes and programming models. In this paper, we propose to start from a well-known metric, Speedup, showing that it can be bounded by the acceleration of any program section. From this observation,...
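For reference, the bound alluded to here is the classical Amdahl relation (a standard formulation, not reproduced from the paper): if a section occupying a fraction p of the runtime is accelerated by a factor s, the whole-program speedup is limited accordingly.

```latex
% Amdahl's law: speedup from accelerating a fraction p of the runtime by s.
% Even as s -> infinity, S is capped by the untouched fraction 1 - p.
S(p, s) = \frac{1}{(1 - p) + \dfrac{p}{s}} \;\le\; \frac{1}{1 - p}
```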
Persistent collective communications have recently been voted into the MPI standard, opening the door to many optimizations that reduce the cost of collectives, in particular for recurring operations. Indeed, the persistent semantics contains an initialization phase called only once for a specific collective. It can be used to concentrate the building costs necessary to the collective, and so avoid paying them each time the operation is performed.
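A hedged sketch of the persistent-collective pattern (MPI-4.0 interface): the expensive setup happens once in the *_init call, then the same collective is restarted cheaply every iteration with MPI_Start.

```cpp
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    double local = 1.0, sum = 0.0;
    MPI_Request req;

    // One-time initialization: algorithm selection, buffer setup, etc.
    MPI_Allreduce_init(&local, &sum, 1, MPI_DOUBLE, MPI_SUM,
                       MPI_COMM_WORLD, MPI_INFO_NULL, &req);

    for (int iter = 0; iter < 100; ++iter) {
        /* ... update 'local' ... */
        MPI_Start(&req);  // reuse the prepared collective
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }

    MPI_Request_free(&req);  // release the persistent request
    MPI_Finalize();
}
```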