Julien Jaeger

ORCID: 0000-0003-0084-1574
Research Areas
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Distributed and Parallel Computing Systems
  • Distributed Systems and Fault Tolerance
  • Cloud Computing and Resource Management
  • Embedded Systems Design Techniques
  • Software System Performance and Reliability
  • Advanced Memory and Neural Computing
  • Ferroelectric and Negative Capacitance Devices
  • Scientific Computing and Data Management
  • Advanced Mathematical Identities
  • Modular Robots and Swarm Intelligence
  • Radiation Effects in Electronics
  • Caching and Content Delivery
  • Algorithms and Data Compression
  • Interconnection Networks and Systems
  • Image and Signal Denoising Methods
  • Analytic Number Theory Research
  • Data Visualization and Analytics
  • Advanced Data Compression Techniques
  • Peer-to-Peer Network Technologies
  • Real-Time Systems Scheduling
  • Coding Theory and Cryptography

Commissariat à l'Énergie Atomique et aux Énergies Alternatives
2015-2024

Université Paris-Saclay
2014-2023

CEA DAM Île-de-France
2014-2023

Maison de la Simulation
2020-2022

CEA Paris-Saclay
2020-2022

Lawrence Livermore National Laboratory
2020

Institut Polytechnique de Bordeaux
2020

Intel (United States)
2020

Institut Lavoisier de Versailles
2012

University of California, Irvine
2012

In the race for Exascale, the advent of many-core processors will bring a shift in parallel computing architectures toward systems with much higher concurrency, but with relatively smaller memory per thread. This raises concerns about the adaptability of current-generation HPC software to this brave new world. In this paper, we study domain splitting over an increasing number of areas as an example problem where a negative performance impact on computation could arise. We identify the specific parameters that drive this scalability problem, and...

10.1145/2802658.2802669 article EN 2015-09-01

This paper offers a timely study and proposes clarifications, revisions, and enhancements to the Message Passing Interface's (MPI's) Semantic Terms and Conventions. To enhance MPI, a clearer understanding of the meaning of its key terminology has proven essential, and, surprisingly, important concepts remain underspecified, ambiguous in some cases, and inconsistent and/or conflicting despite 26 years of standardization. This work addresses these concerns comprehensively and usefully informs MPI developers, implementors, and those...

10.1145/3343211.3343213 preprint EN 2019-08-23

The advent of many-core architectures poses new challenges to the MPI programming model, which was designed for distributed-memory message passing. It is now clear that MPI will have to evolve in order to exploit shared-memory parallelism, either by collaborating with other models (MPI+X) or by introducing new approaches. This paper considers extensions to C and C++ that make it possible for MPI processes to run inside threads. More generally, a thread-local storage (TLS) library is developed to simplify the collocation of arbitrary...

10.1145/2966884.2966910 article EN 2016-09-25

Today's trend to use accelerators in heterogeneous systems forces a paradigm shift in programming models. The use of low-level APIs for accelerator programming is tedious and not intuitive for casual programmers. To tackle this problem, recent approaches have focused on high-level directive-based models, with a standardization effort made by OpenACC and the accelerator directives of the latest OpenMP 4.0 release. These pragmas for data management automatically handle data exchange between host and device. To keep the runtime simple and efficient, severe restrictions...

10.1002/cpe.3352 article EN Concurrency and Computation Practice and Experience 2014-08-13

Fault tolerance has always been an important topic when it comes to running massively parallel programs at scale. Statistically, hardware and software failures are expected to occur more often on systems gathering millions of computing units. Moreover, the larger jobs are, the more hours would be wasted by a crash. In this paper, we describe the work done in our MPI runtime to enable a transparent checkpointing mechanism. Unlike the MPI 4.0 User-Level Failure Mitigation (ULFM) interface, it targets solely...

10.1145/3236367.3236383 article EN 2018-09-19

MPI-3 provides functions for non-blocking collectives. To help programmers introduce these collectives into existing MPI programs, we improve the PARCOACH tool for checking the correctness of collective call sequences. These enhancements focus on correct sequences of all flavors of collective calls, and on the presence of completion calls for the communications. The evaluation shows an overhead under 10% of the original compilation time.

10.1145/2802658.2802674 article EN 2015-09-01

Stencil-based computation on structured grids is a kernel at the heart of a large number of scientific applications. The variety of stencil kernels used in practice makes this pattern difficult to assemble into a high-performance computing library. With the multiplication of cores on a single chip, answering architectural alignment requirements has become an even more important key to performance. Along with vector accesses, data layout optimization must also consider concurrent parallel accesses. In this paper, we develop...

10.1109/hipc.2012.6507504 preprint EN 2012-12-01

This paper describes a short and simple way of improving the performance of vector operations (e.g. X = aY + bZ + ...) applied to large vectors. In previous work [1] we described how to take advantage of the high-performance copy operation provided by the ATLAS library [2] in the context of the C++ Expression Template (ET) mechanism. Here we present a multi-threaded implementation of this approach. The proposed ET implementation, which involves a parallel blocking technique, leads to a significant performance increase compared to existing implementations (up to x2.7) on a dual socket...

10.1145/1595655.1595663 preprint EN 2009-07-07

Partitioned point-to-point communication and persistent collective communication were both recently standardized in MPI-4.0. Each offers performance and scalability advantages over MPI-3.1-based communication when planned transfers are feasible in an MPI application. Their merger into generalized, persistent collective communication with partitions is the logical next step, significant for performance portability. Non-trivial decisions about the syntax and semantics of such operations need to be addressed, including the scope of knowledge of partitioning choices by the members of a communicator's...

10.1109/exampi54564.2021.00007 article EN 2021-11-01

The Message Passing Interface (MPI) is a parallel programming model used to exchange data between working units in different nodes of a supercomputer. While MPI blocking operations return when the communication is complete, non-blocking and persistent operations return before, enabling the developer to hide communication latency. However, the usage of these latter operations comes with additional rules the user has to abide by. This is error prone, which makes verification tools valuable for program writers. PARCOACH is a framework that detects collective errors using...

10.1109/correctness51934.2020.00009 article EN 2020-11-01

The complexity of High Performance Computing nodes' memory systems is increasing in order to address growing application memory usage and the widening gap between computation and memory access speeds. As these technologies are just being introduced in HPC supercomputers, no one knows whether it is better to manage them with hardware or software solutions. Thus both are studied in parallel. For software solutions, the problem consists in choosing which data to store on which memory at any time.

10.1145/3240302.3240313 preprint EN Proceedings of the International Symposium on Memory Systems 2018-10-01

To amortize the cost of MPI collective operations, nonblocking collectives have been proposed so as to allow communications to be overlapped with computation. Unfortunately, they are more CPU-hungry than point-to-point communications, and running them in a communication thread on a dedicated CPU core makes them slow. On the other hand, running them on application cores leads to no overlap. In this article, we propose placement algorithms for progress threads that do not degrade performance while still achieving communication/computation overlap. We first show even...

10.1177/1094342019860184 article EN The International Journal of High Performance Computing Applications 2019-07-02

Many-core processors are imposing new constraints on parallel applications. In particular, the MPI+X model, or hybridization, is becoming a compulsory avenue to extract performance by mitigating both memory and communication overhead. In this context, tools also have to evolve in order to represent the more complex states combining multiple runtimes and programming models. In this paper, we propose to start from a well-known metric, the Speedup, showing that it can be bounded by the acceleration of any program section. From this observation,...

10.1109/icppw.2017.45 article EN 2017-08-01

Persistent collective communications have recently been voted into the MPI standard, opening the door to many optimizations to reduce the cost of collectives, in particular for recurring operations. Indeed, the persistent semantics include an initialization phase, called only once for a specific collective. It can be used to collect the building costs necessary for the collective, to avoid paying them each time the operation is performed.

10.1145/3416315.3416321 article EN 2020-09-21