Marc Pérache

ORCID: 0000-0003-1615-2749
Research Areas
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Distributed and Parallel Computing Systems
  • Cloud Computing and Resource Management
  • Distributed systems and fault tolerance
  • Software System Performance and Reliability
  • Interconnection Networks and Systems
  • Embedded Systems Design Techniques
  • Ferroelectric and Negative Capacitance Devices
  • Peer-to-Peer Network Technologies
  • Matrix Theory and Algorithms
  • Scientific Computing and Data Management
  • Gas Dynamics and Kinetic Theory
  • Opportunistic and Delay-Tolerant Networks
  • Simulation Techniques and Applications
  • Nuclear reactor physics and engineering
  • Advanced Neural Network Applications
  • Advanced MEMS and NEMS Technologies
  • Data Visualization and Analytics
  • Advanced Optical Network Technologies
  • Advanced Data Compression Techniques
  • Numerical Methods and Algorithms
  • Brain Tumor Detection and Classification

CEA DAM Île-de-France
2015-2025

Maison de la Simulation
2021-2025

Université Paris-Saclay
2021-2025

Commissariat à l'Énergie Atomique et aux Énergies Alternatives
2014-2023

CEA Paris-Saclay
2021

Université de Reims Champagne-Ardenne
2019

Université de Versailles Saint-Quentin-en-Yvelines
2013-2017

Looking for high-performance hydrocode simulations on heterogeneous architectures, we detail a portable implementation of a second-order accurate 2-D Cartesian explicit CFD solver using Julia's just-in-time (JIT) compilation. In this work, a custom abstraction layer is used to target two Julia packages: Polyester.jl for efficient shared-memory multithreading on CPUs, and KernelAbstractions.jl for the appropriate backends on GPUs. Using the very same optimizations and data structures as those used with Julia, comparisons to...

10.1177/10943420251341179 article EN The International Journal of High Performance Computing Applications 2025-05-20
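
The paper's abstraction layer is written in Julia; as a language-neutral illustration of the underlying idea (one kernel entry point dispatched to a backend chosen at runtime), here is a minimal C sketch. The backend names and the 5-point stencil are illustrative, not taken from the paper.

```c
/* Minimal sketch: one solver kernel, two interchangeable backends.
 * The Julia implementation targets Polyester.jl (CPU threads) and
 * KernelAbstractions.jl (GPU); the names below are illustrative only. */
#include <stdio.h>
#include <string.h>

typedef void (*stencil_fn)(const double *in, double *out, int nx, int ny);

static void cpu_threads_kernel(const double *in, double *out, int nx, int ny) {
    /* A shared-memory multithreaded backend (e.g. OpenMP) would go here. */
    for (int j = 1; j < ny - 1; ++j)
        for (int i = 1; i < nx - 1; ++i)
            out[j * nx + i] = 0.25 * (in[j * nx + i - 1] + in[j * nx + i + 1] +
                                      in[(j - 1) * nx + i] + in[(j + 1) * nx + i]);
}

static void gpu_kernel(const double *in, double *out, int nx, int ny) {
    /* A GPU backend (CUDA/HIP launch) would go here; fall back to CPU. */
    cpu_threads_kernel(in, out, nx, ny);
}

static stencil_fn select_backend(const char *name) {
    return strcmp(name, "gpu") == 0 ? gpu_kernel : cpu_threads_kernel;
}

int main(void) {
    enum { NX = 64, NY = 64 };
    static double a[NX * NY], b[NX * NY];
    stencil_fn run = select_backend("cpu"); /* one code path, two backends */
    run(a, b, NX, NY);
    printf("done\n");
    return 0;
}
```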

In the race for Exascale, the advent of many-core processors will bring a shift in parallel computing architectures towards systems with much higher concurrency, but with relatively smaller memory per thread. This raises concerns about the adaptability of HPC software, from the current generation to this brave new world. In this paper, we study domain splitting on an increasing number of areas as an example problem where a negative performance impact on computation could arise. We identify the specific parameters that drive the scalability of the problem, and...

10.1145/2802658.2802669 article EN 2015-09-01
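
One such parameter is the growing share of ghost (halo) cells as a fixed mesh is split into more subdomains. The back-of-the-envelope C sketch below (illustrative, not from the paper; the mesh size and single-layer halo are assumptions) makes that ratio explicit.

```c
/* Splitting a fixed N x N 2-D mesh into d x d subdomains, each with one
 * ghost-cell layer: the ghost/interior ratio grows with d, one way a
 * negative performance impact on computation can arise. */
#include <stdio.h>

int main(void) {
    const long N = 4096;                      /* global mesh size, illustrative */
    for (long d = 1; d <= 64; d *= 2) {
        long sub = N / d;                     /* subdomain edge length          */
        long ghosts_per_dom = 4 * sub + 4;    /* (sub+2)^2 - sub^2              */
        double ratio = (double)(ghosts_per_dom * d * d) / (double)(N * N);
        printf("%2ld x %2ld domains: ghost/interior = %.4f\n", d, d, ratio);
    }
    return 0;
}
```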

The advent of many-core architectures poses new challenges to the MPI programming model, which has been designed for distributed-memory message passing. It is now clear that MPI will have to evolve in order to exploit shared-memory parallelism, either by collaborating with other programming models (MPI+X) or by introducing new approaches. This paper considers extensions to C and C++ that make it possible to run MPI processes into threads. More generally, a thread-local storage (TLS) library is developed to simplify the collocation of arbitrary...

10.1145/2966884.2966910 article EN 2016-09-25
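
The core difficulty is that global variables, unique in a real process, must become per-thread once MPI processes share one address space. A minimal C sketch of that privatization follows, using the basic C11 `_Thread_local` mechanism (the paper's TLS library generalizes well beyond this; the variable and thread count here are illustrative).

```c
/* When MPI processes run as threads in one address space, each formerly
 * global variable needs one copy per thread. _Thread_local gives the
 * basic mechanism that a privatization pass or TLS library builds on. */
#include <stdio.h>
#include <pthread.h>

static _Thread_local int rank_like; /* one copy per "MPI process" thread */

static void *fake_mpi_process(void *arg) {
    rank_like = (int)(long)arg;     /* no interference between threads   */
    printf("thread acting as rank %d\n", rank_like);
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (long r = 0; r < 4; ++r)
        pthread_create(&t[r], NULL, fake_mpi_process, (void *)r);
    for (int r = 0; r < 4; ++r)
        pthread_join(t[r], NULL);
    return 0;
}
```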

Today's trend to use accelerators in heterogeneous systems forces a paradigm shift in programming models. The use of low-level APIs for accelerator programming is tedious and not intuitive for casual programmers. To tackle this problem, recent approaches have focused on high-level directive-based models, with the standardization effort made by OpenACC and the accelerator directives of the latest OpenMP 4.0 release. Pragmas for data management automatically handle the exchange between host and device. To keep the runtime simple and efficient, severe restrictions...

10.1002/cpe.3352 article EN Concurrency and Computation Practice and Experience 2014-08-13
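
For concreteness, here is a self-contained sketch of the directive-based style the paper discusses, using standard OpenMP 4.x target pragmas (the kernel itself is illustrative): the map clauses take over the host/device data exchange that a low-level API would require by hand.

```c
/* Directive-based offload: map(to:) copies inputs to the device,
 * map(tofrom:) copies y back; without a device it runs on the host. */
#include <stdio.h>

int main(void) {
    const int n = 1 << 20;
    static float x[1 << 20], y[1 << 20];
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] += 2.0f * x[i];

    printf("y[0] = %f\n", y[0]);   /* expect 4.0 */
    return 0;
}
```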

Due to computer architecture evolution, more and more HPC applications have to include thread-based parallelism and take care of memory consumption. Such evolutions require more attention to the full memory management chain, which is particularly stressed in a multi-threaded context. Several allocators provide better scalability on the user-space side. But, with the steadily increasing number of cores, the impact of the operating system cannot be neglected anymore. We measured that the performance of the OS memory sub-system accounts for up to one third of the total execution time of a real...

10.1145/2492408.2492414 preprint EN 2013-06-16
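
A simple way to see the OS share of the memory management chain is to compare the first touch of freshly allocated pages (which goes through the kernel page-fault path) with a second pass over already-mapped memory. The C sketch below is an illustrative experiment in that spirit, not the paper's benchmark; the buffer size is an assumption.

```c
/* First touch faults every page in through the OS; the second pass is
 * pure user-space memory traffic. The gap is the kernel's share. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static double now(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    size_t sz = (size_t)1 << 30;           /* 1 GiB, illustrative */
    char *buf = malloc(sz);
    if (!buf) return 1;

    double t0 = now();
    memset(buf, 1, sz);                    /* first touch: page faults */
    double t1 = now();
    memset(buf, 2, sz);                    /* pages already mapped     */
    double t2 = now();

    printf("first touch: %.3f s, second pass: %.3f s\n", t1 - t0, t2 - t1);
    free(buf);
    return 0;
}
```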

As the power of supercomputers is exponentially increasing, programmers are facing more and more complex codes designed to comply with today's challenging architectural constraints. In such a context, the use of tools within the development cycle is becoming crucial in order to optimise applications at scale. However, it is not possible to obtain all the measurements one can think of, because of the cost to produce, store and analyse large amounts of instrumentation data. Moreover, the file-system is a critical resource, subject to performance issues even...

10.1109/icpp.2013.117 preprint EN 2013-10-01
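
One elementary way to bound the volume of instrumentation data, in the spirit of the trade-off described above (this is an illustrative device, not the paper's mechanism), is to sample events rather than record them all:

```c
/* Record only one event out of SAMPLE_PERIOD, so the produced data
 * grows sub-linearly with the number of events instead of linearly. */
#include <stdio.h>

#define SAMPLE_PERIOD 1000   /* illustrative sampling rate */

#define RECORD_EVENT(name) do {                                        \
        static unsigned long count_;                                   \
        if (count_++ % SAMPLE_PERIOD == 0)                              \
            fprintf(stderr, "event %s #%lu\n", (name), count_ - 1);    \
    } while (0)

int main(void) {
    for (int i = 0; i < 5000; ++i)
        RECORD_EVENT("loop_iteration");   /* only 5 records are emitted */
    return 0;
}
```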

Fault-tolerance has always been an important topic when it comes to running massively parallel programs at scale. Statistically, hardware and software failures are expected to occur more often on systems gathering millions of computing units. Moreover, the larger jobs are, the more hours would be wasted by a crash. In this paper, we describe the work done in our MPI runtime to enable a transparent checkpointing mechanism. Unlike the MPI 4.0 User-Level Failure Mitigation (ULFM) interface, it targets solely...

10.1145/3236367.3236383 article EN 2018-09-19
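
For contrast with the transparent, runtime-level mechanism the paper describes, the C sketch below shows the hand-written application-level checkpointing it spares the programmer (the state variable, interval, and file naming are all illustrative assumptions):

```c
/* Hand-rolled application-level checkpoint/restart: each rank writes
 * its state every 10 steps, with a barrier to keep checkpoints
 * mutually consistent. A transparent runtime mechanism automates this. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double state = 0.0;                      /* solver state to protect */
    for (int step = 0; step < 100; ++step) {
        state += 1.0;                        /* stand-in for computation */
        if (step % 10 == 0) {
            char path[64];
            snprintf(path, sizeof path, "ckpt_rank%d.bin", rank);
            FILE *f = fopen(path, "wb");
            if (f) { fwrite(&state, sizeof state, 1, f); fclose(f); }
            MPI_Barrier(MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}
```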

The advent of multicore and manycore processors in clusters advocates for combining MPI with a shared-memory model like OpenMP in high-performance parallel applications. But exploiting hardware resources with such models can be sub-optimal. Thus, one approach is to use the hybrid context to perform communications. In this paper, we address this issue with the concept of hybrid collective communications, which consists in using threads to parallelize collectives. We validate our approach on several MPI libraries (IntelMPI, MPC), improving the overall...

10.1145/2642769.2642791 preprint EN 2014-08-29
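
A minimal C sketch of the general technique (the chunking scheme, thread count, and buffer size are assumptions, not the paper's algorithm): each OpenMP thread runs a collective on its own slice of the buffer, using a private duplicated communicator so that concurrent collectives remain legal MPI.

```c
/* Thread-parallelized allreduce: requires MPI_THREAD_MULTIPLE; each
 * thread reduces one slice on its own communicator (MPI_Comm_dup is
 * collective, so the duplications are done serially beforehand). */
#include <mpi.h>
#include <omp.h>

#define NTHREADS 4
#define COUNT    (1 << 20)

int main(int argc, char **argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        MPI_Abort(MPI_COMM_WORLD, 1);

    MPI_Comm comms[NTHREADS];
    for (int t = 0; t < NTHREADS; ++t)
        MPI_Comm_dup(MPI_COMM_WORLD, &comms[t]);

    static double in[COUNT], out[COUNT];
    for (int i = 0; i < COUNT; ++i) in[i] = 1.0;

    #pragma omp parallel num_threads(NTHREADS)
    {
        int t = omp_get_thread_num();
        int chunk = COUNT / NTHREADS;      /* each thread owns one slice */
        MPI_Allreduce(in + t * chunk, out + t * chunk, chunk,
                      MPI_DOUBLE, MPI_SUM, comms[t]);
    }

    for (int t = 0; t < NTHREADS; ++t) MPI_Comm_free(&comms[t]);
    MPI_Finalize();
    return 0;
}
```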

10.5555/1791889.1791898 article EN IEEE International Conference on High Performance Computing, Data, and Analytics 2008-12-17

With the advent of the multicore era, the number of cores per computational node is increasing faster than the amount of memory. This diminishing memory-to-core ratio sometimes even prevents pure MPI applications from benefiting from all the cores available on each node. A possible solution is to add a shared-memory programming model like OpenMP inside the application to share variables between threads that would otherwise be duplicated for each MPI task. Going hybrid can thus improve the overall memory consumption, but it may be a tedious task for large applications. To allow...

10.1109/ipdps.2012.42 article EN 2012-05-01
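
The memory saving is easy to picture: a large read-only table that a pure MPI run would replicate in every rank on a node exists only once per process when the cores are driven by OpenMP threads instead. A minimal C sketch (table size is an illustrative assumption):

```c
/* One copy of `table` serves every OpenMP thread; with one MPI rank per
 * core, each rank would hold its own full copy of the same data. */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t n = (size_t)1 << 26;              /* ~512 MB of doubles */
    double *table = malloc(n * sizeof *table);
    if (!table) return 1;
    for (size_t i = 0; i < n; ++i) table[i] = (double)i;

    double sum = 0.0;
    #pragma omp parallel for reduction(+ : sum)
    for (size_t i = 0; i < n; ++i)
        sum += table[i];                     /* shared, not duplicated */

    printf("sum = %g (threads: %d, table copies: 1)\n",
           sum, omp_get_max_threads());
    free(table);
    return 0;
}
```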

In order to enable Exascale computing, next-generation interconnection networks must scale to hundreds of thousands of nodes, and provide features that also allow HPC, HPDA, and AI applications to reach Exascale, while benefiting from new hardware and software trends. RED-SEA will pave the way for the next generation of European interconnects, including BXI, as follows: (i) specify the architecture using hardware-software co-design and a set of representative applications from the converging HPC, HPDA, and AI domains; (ii) test, evaluate, and/or implement architectural features at multiple...

10.1109/dsd57027.2022.00100 article EN 2022 25th Euromicro Conference on Digital System Design (DSD) 2022-08-01

To amortize the cost of MPI collective operations, nonblocking collectives have been proposed so as to allow communications to be overlapped with computation. Unfortunately, collectives are more CPU-hungry than point-to-point communications, and running them in a communication thread on a dedicated CPU core makes them slow. On the other hand, running them on the application cores leads to no overlap. In this article, we propose placement algorithms for progress threads that do not degrade performance when trying to get communication/computation overlap. We first show that even...

10.1177/1094342019860184 article EN The International Journal of High Performance Computing Applications 2019-07-02
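
For reference, here is the canonical overlap pattern the article studies, as a minimal C sketch (buffer sizes and the dummy computation are illustrative): start a nonblocking collective, compute on independent data, then wait. Whether overlap actually happens depends on where the progress threads run, which is exactly the placement question above.

```c
/* Nonblocking collective overlapped with independent computation. */
#include <mpi.h>
#include <stdio.h>

#define N (1 << 20)

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    static double in[N], out[N], work[N];
    for (int i = 0; i < N; ++i) { in[i] = 1.0; work[i] = 2.0; }

    MPI_Request req;
    MPI_Iallreduce(in, out, N, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD, &req);

    for (int i = 0; i < N; ++i)        /* computation to overlap */
        work[i] = work[i] * work[i] + 1.0;

    MPI_Wait(&req, MPI_STATUS_IGNORE); /* collective completes here */

    printf("out[0] = %f, work[0] = %f\n", out[0], work[0]);
    MPI_Finalize();
    return 0;
}
```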