Marc Pérache

ORCID: 0000-0003-1615-2749
Research Areas
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Distributed and Parallel Computing Systems
  • Cloud Computing and Resource Management
  • Distributed systems and fault tolerance
  • Software System Performance and Reliability
  • Interconnection Networks and Systems
  • Embedded Systems Design Techniques
  • Ferroelectric and Negative Capacitance Devices
  • Peer-to-Peer Network Technologies
  • Matrix Theory and Algorithms
  • Scientific Computing and Data Management
  • Gas Dynamics and Kinetic Theory
  • Opportunistic and Delay-Tolerant Networks
  • Simulation Techniques and Applications
  • Nuclear reactor physics and engineering
  • Advanced Neural Network Applications
  • Advanced MEMS and NEMS Technologies
  • Data Visualization and Analytics
  • Advanced Optical Network Technologies
  • Advanced Data Compression Techniques
  • Numerical Methods and Algorithms
  • Brain Tumor Detection and Classification

CEA DAM Île-de-France
2015-2025

Maison de la Simulation
2021-2025

Université Paris-Saclay
2021-2025

Commissariat à l'Énergie Atomique et aux Énergies Alternatives
2014-2023

CEA Paris-Saclay
2021

Université de Reims Champagne-Ardenne
2019

Université de Versailles Saint-Quentin-en-Yvelines
2013-2017

Looking for high-performance hydrocode simulations on heterogeneous architectures, we detail a portable implementation of a second-order accurate 2-D Cartesian explicit CFD solver using Julia's just-in-time (JIT) compilation. In this work, a custom abstraction layer is used to target two Julia packages: Polyester.jl for efficient shared-memory multithreading on CPUs, and KernelAbstractions.jl for the appropriate backends on GPUs. Using the very same optimizations and data structures as those used with Julia, comparisons to...

10.1177/10943420251341179 article EN The International Journal of High Performance Computing Applications 2025-05-20
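
The paper's abstraction layer is written in Julia; as a language-neutral illustration of the underlying idea (one kernel entry point dispatched to a backend chosen at runtime), here is a minimal C sketch. The backend names and the 5-point stencil are illustrative, not taken from the paper.

```c
/* Minimal sketch: one solver kernel, two interchangeable backends.
 * The Julia implementation targets Polyester.jl (CPU threads) and
 * KernelAbstractions.jl (GPU); the names below are illustrative only. */
#include <stdio.h>
#include <string.h>

typedef void (*stencil_fn)(const double *in, double *out, int nx, int ny);

static void cpu_threads_kernel(const double *in, double *out, int nx, int ny) {
    /* A shared-memory multithreaded backend (e.g. OpenMP) would go here. */
    for (int j = 1; j < ny - 1; ++j)
        for (int i = 1; i < nx - 1; ++i)
            out[j * nx + i] = 0.25 * (in[j * nx + i - 1] + in[j * nx + i + 1] +
                                      in[(j - 1) * nx + i] + in[(j + 1) * nx + i]);
}

static void gpu_kernel(const double *in, double *out, int nx, int ny) {
    /* A GPU backend (CUDA/HIP launch) would go here; fall back to CPU. */
    cpu_threads_kernel(in, out, nx, ny);
}

static stencil_fn select_backend(const char *name) {
    return strcmp(name, "gpu") == 0 ? gpu_kernel : cpu_threads_kernel;
}

int main(void) {
    enum { NX = 64, NY = 64 };
    static double a[NX * NY], b[NX * NY];
    stencil_fn run = select_backend("cpu"); /* one code path, two backends */
    run(a, b, NX, NY);
    printf("done\n");
    return 0;
}
```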

In the race for Exascale, the advent of many-core processors will bring a shift in parallel computing architectures towards systems with much higher concurrency, but with relatively smaller memory per thread. This raises concerns about the adaptability of HPC software, from the current generation to this brave new world. In this paper, we study domain splitting on an increasing number of areas as an example problem where a negative performance impact on computation could arise. We identify the specific parameters that drive the scalability of the problem, and...

10.1145/2802658.2802669 article EN 2015-09-01
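
One such parameter is the growing share of ghost (halo) cells as a fixed mesh is split into more subdomains. The back-of-the-envelope C sketch below (illustrative, not from the paper; the mesh size and single-layer halo are assumptions) makes that ratio explicit.

```c
/* Splitting a fixed N x N 2-D mesh into d x d subdomains, each with one
 * ghost-cell layer: the ghost/interior ratio grows with d, one way a
 * negative performance impact on computation can arise. */
#include <stdio.h>

int main(void) {
    const long N = 4096;                      /* global mesh size, illustrative */
    for (long d = 1; d <= 64; d *= 2) {
        long sub = N / d;                     /* subdomain edge length          */
        long ghosts_per_dom = 4 * sub + 4;    /* (sub+2)^2 - sub^2              */
        double ratio = (double)(ghosts_per_dom * d * d) / (double)(N * N);
        printf("%2ld x %2ld domains: ghost/interior = %.4f\n", d, d, ratio);
    }
    return 0;
}
```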

The advent of many-core architectures poses new challenges to the MPI programming model, which has been designed for distributed-memory message passing. It is now clear that MPI will have to evolve in order to exploit shared-memory parallelism, either by collaborating with other programming models (MPI+X) or by introducing new approaches. This paper considers extensions to C and C++ that make it possible to run MPI processes into threads. More generally, a thread-local storage (TLS) library is developed to simplify the collocation of arbitrary...

10.1145/2966884.2966910 article EN 2016-09-25
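
The core difficulty is that global variables, unique in a real process, must become per-thread once MPI processes share one address space. A minimal C sketch of that privatization follows, using the basic C11 `_Thread_local` mechanism (the paper's TLS library generalizes well beyond this; the variable and thread count here are illustrative).

```c
/* When MPI processes run as threads in one address space, each formerly
 * global variable needs one copy per thread. _Thread_local gives the
 * basic mechanism that a privatization pass or TLS library builds on. */
#include <stdio.h>
#include <pthread.h>

static _Thread_local int rank_like; /* one copy per "MPI process" thread */

static void *fake_mpi_process(void *arg) {
    rank_like = (int)(long)arg;     /* no interference between threads   */
    printf("thread acting as rank %d\n", rank_like);
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (long r = 0; r < 4; ++r)
        pthread_create(&t[r], NULL, fake_mpi_process, (void *)r);
    for (int r = 0; r < 4; ++r)
        pthread_join(t[r], NULL);
    return 0;
}
```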

Today's trend to use accelerators in heterogeneous systems forces a paradigm shift in programming models. The use of low-level APIs for accelerator programming is tedious and not intuitive for casual programmers. To tackle this problem, recent approaches have focused on high-level directive-based models, with the standardization effort made by OpenACC and the accelerator directives of the latest OpenMP 4.0 release. Pragmas for data management automatically handle the exchange between host and device. To keep the runtime simple and efficient, severe restrictions...

10.1002/cpe.3352 article EN Concurrency and Computation Practice and Experience 2014-08-13
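
For concreteness, here is a self-contained sketch of the directive-based style the paper discusses, using standard OpenMP 4.x target pragmas (the kernel itself is illustrative): the map clauses take over the host/device data exchange that a low-level API would require by hand.

```c
/* Directive-based offload: map(to:) copies inputs to the device,
 * map(tofrom:) copies y back; without a device it runs on the host. */
#include <stdio.h>

int main(void) {
    const int n = 1 << 20;
    static float x[1 << 20], y[1 << 20];
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] += 2.0f * x[i];

    printf("y[0] = %f\n", y[0]);   /* expect 4.0 */
    return 0;
}
```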

Due to computer architecture evolution, more and more HPC applications have to include thread-based parallelism and take care of memory consumption. Such evolutions require more attention to the full memory management chain, which is particularly stressed in a multi-threaded context. Several allocators provide better scalability on the user-space side. But, with the steadily increasing number of cores, the impact of the operating system cannot be neglected anymore. We measured that the performance of the OS memory sub-system accounts for up to one third of the total execution time of a real...

10.1145/2492408.2492414 preprint EN 2013-06-16
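
A simple way to see the OS share of the memory management chain is to compare the first touch of freshly allocated pages (which goes through the kernel page-fault path) with a second pass over already-mapped memory. The C sketch below is an illustrative experiment in that spirit, not the paper's benchmark; the buffer size is an assumption.

```c
/* First touch faults every page in through the OS; the second pass is
 * pure user-space memory traffic. The gap is the kernel's share. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static double now(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    size_t sz = (size_t)1 << 30;           /* 1 GiB, illustrative */
    char *buf = malloc(sz);
    if (!buf) return 1;

    double t0 = now();
    memset(buf, 1, sz);                    /* first touch: page faults */
    double t1 = now();
    memset(buf, 2, sz);                    /* pages already mapped     */
    double t2 = now();

    printf("first touch: %.3f s, second pass: %.3f s\n", t1 - t0, t2 - t1);
    free(buf);
    return 0;
}
```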

As the power of supercomputers is exponentially increasing, programmers are facing more and more complex codes designed to comply with today's challenging architectural constraints. In such a context, the use of tools within the development cycle is becoming crucial in order to optimise applications at scale. However, it is not possible to obtain all the measurements one can think of, because of the cost to produce, store and analyse large amounts of instrumentation data. Moreover, the file-system is a critical resource, subject to performance issues even...

10.1109/icpp.2013.117 preprint EN 2013-10-01
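
One elementary way to bound the volume of instrumentation data, in the spirit of the trade-off described above (this is an illustrative device, not the paper's mechanism), is to sample events rather than record them all:

```c
/* Record only one event out of SAMPLE_PERIOD, so the produced data
 * grows sub-linearly with the number of events instead of linearly. */
#include <stdio.h>

#define SAMPLE_PERIOD 1000   /* illustrative sampling rate */

#define RECORD_EVENT(name) do {                                        \
        static unsigned long count_;                                   \
        if (count_++ % SAMPLE_PERIOD == 0)                              \
            fprintf(stderr, "event %s #%lu\n", (name), count_ - 1);    \
    } while (0)

int main(void) {
    for (int i = 0; i < 5000; ++i)
        RECORD_EVENT("loop_iteration");   /* only 5 records are emitted */
    return 0;
}
```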

Fault-tolerance has always been an important topic when it comes to running massively parallel programs at scale. Statistically, hardware and software failures are expected to occur more often on systems gathering millions of computing units. Moreover, the larger jobs are, the more hours would be wasted by a crash. In this paper, we describe the work done in our MPI runtime to enable a transparent checkpointing mechanism. Unlike the MPI 4.0 User-Level Failure Mitigation (ULFM) interface, it targets solely...

10.1145/3236367.3236383 article EN 2018-09-19
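
For contrast with the transparent, runtime-level mechanism the paper describes, the C sketch below shows the hand-written application-level checkpointing it spares the programmer (the state variable, interval, and file naming are all illustrative assumptions):

```c
/* Hand-rolled application-level checkpoint/restart: each rank writes
 * its state every 10 steps, with a barrier to keep checkpoints
 * mutually consistent. A transparent runtime mechanism automates this. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double state = 0.0;                      /* solver state to protect */
    for (int step = 0; step < 100; ++step) {
        state += 1.0;                        /* stand-in for computation */
        if (step % 10 == 0) {
            char path[64];
            snprintf(path, sizeof path, "ckpt_rank%d.bin", rank);
            FILE *f = fopen(path, "wb");
            if (f) { fwrite(&state, sizeof state, 1, f); fclose(f); }
            MPI_Barrier(MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}
```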

The advent of multicore and manycore processors in clusters advocates for combining MPI with a shared-memory model like OpenMP in high-performance parallel applications. But exploiting hardware resources with such models can be sub-optimal. Thus, one approach is to use the hybrid context to perform communications. In this paper, we address this issue with the concept of hybrid collective communications, which consists in using threads to parallelize collectives. We validate our approach on several MPI libraries (IntelMPI, MPC), improving the overall...

10.1145/2642769.2642791 preprint EN 2014-08-29
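
A minimal C sketch of the general technique (the chunking scheme, thread count, and buffer size are assumptions, not the paper's algorithm): each OpenMP thread runs a collective on its own slice of the buffer, using a private duplicated communicator so that concurrent collectives remain legal MPI.

```c
/* Thread-parallelized allreduce: requires MPI_THREAD_MULTIPLE; each
 * thread reduces one slice on its own communicator (MPI_Comm_dup is
 * collective, so the duplications are done serially beforehand). */
#include <mpi.h>
#include <omp.h>

#define NTHREADS 4
#define COUNT    (1 << 20)

int main(int argc, char **argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        MPI_Abort(MPI_COMM_WORLD, 1);

    MPI_Comm comms[NTHREADS];
    for (int t = 0; t < NTHREADS; ++t)
        MPI_Comm_dup(MPI_COMM_WORLD, &comms[t]);

    static double in[COUNT], out[COUNT];
    for (int i = 0; i < COUNT; ++i) in[i] = 1.0;

    #pragma omp parallel num_threads(NTHREADS)
    {
        int t = omp_get_thread_num();
        int chunk = COUNT / NTHREADS;      /* each thread owns one slice */
        MPI_Allreduce(in + t * chunk, out + t * chunk, chunk,
                      MPI_DOUBLE, MPI_SUM, comms[t]);
    }

    for (int t = 0; t < NTHREADS; ++t) MPI_Comm_free(&comms[t]);
    MPI_Finalize();
    return 0;
}
```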

10.5555/1791889.1791898 article EN IEEE International Conference on High Performance Computing, Data, and Analytics 2008-12-17

With the advent of the multicore era, the number of cores per computational node is increasing faster than the amount of memory. This diminishing memory-to-core ratio sometimes even prevents pure MPI applications from benefiting from all the cores available on each node. A possible solution is to add a shared-memory programming model like OpenMP inside the application to share variables between threads that would otherwise be duplicated for each MPI task. Going hybrid can thus improve the overall memory consumption, but it may be a tedious task for large applications. To allow...

10.1109/ipdps.2012.42 article EN 2012-05-01
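
The memory saving is easy to picture: a large read-only table that a pure MPI run would replicate in every rank on a node exists only once per process when the cores are driven by OpenMP threads instead. A minimal C sketch (table size is an illustrative assumption):

```c
/* One copy of `table` serves every OpenMP thread; with one MPI rank per
 * core, each rank would hold its own full copy of the same data. */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t n = (size_t)1 << 26;              /* ~512 MB of doubles */
    double *table = malloc(n * sizeof *table);
    if (!table) return 1;
    for (size_t i = 0; i < n; ++i) table[i] = (double)i;

    double sum = 0.0;
    #pragma omp parallel for reduction(+ : sum)
    for (size_t i = 0; i < n; ++i)
        sum += table[i];                     /* shared, not duplicated */

    printf("sum = %g (threads: %d, table copies: 1)\n",
           sum, omp_get_max_threads());
    free(table);
    return 0;
}
```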

In order to enable Exascale computing, next-generation interconnection networks must scale to hundreds of thousands of nodes, and provide features that also allow HPC, HPDA, and AI applications to reach Exascale, while benefiting from new hardware and software trends. RED-SEA will pave the way for the next generation of European interconnects, including BXI, as follows: (i) specify the architecture using hardware-software co-design and a set of representative applications from the converging HPC, HPDA, and AI domains; (ii) test, evaluate, and/or implement architectural features at multiple...

10.1109/dsd57027.2022.00100 article EN 2022 25th Euromicro Conference on Digital System Design (DSD) 2022-08-01

To amortize the cost of MPI collective operations, nonblocking collectives have been proposed so as to allow communications to be overlapped with computation. Unfortunately, collectives are more CPU-hungry than point-to-point communications, and running them in a communication thread on a dedicated CPU core makes them slow. On the other hand, running them on the application cores leads to no overlap. In this article, we propose placement algorithms for progress threads that do not degrade performance when trying to get communication/computation overlap. We first show that even...

10.1177/1094342019860184 article EN The International Journal of High Performance Computing Applications 2019-07-02
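
For reference, here is the canonical overlap pattern the article studies, as a minimal C sketch (buffer sizes and the dummy computation are illustrative): start a nonblocking collective, compute on independent data, then wait. Whether overlap actually happens depends on where the progress threads run, which is exactly the placement question above.

```c
/* Nonblocking collective overlapped with independent computation. */
#include <mpi.h>
#include <stdio.h>

#define N (1 << 20)

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    static double in[N], out[N], work[N];
    for (int i = 0; i < N; ++i) { in[i] = 1.0; work[i] = 2.0; }

    MPI_Request req;
    MPI_Iallreduce(in, out, N, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD, &req);

    for (int i = 0; i < N; ++i)        /* computation to overlap */
        work[i] = work[i] * work[i] + 1.0;

    MPI_Wait(&req, MPI_STATUS_IGNORE); /* collective completes here */

    printf("out[0] = %f, work[0] = %f\n", out[0], work[0]);
    MPI_Finalize();
    return 0;
}
```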