Nicolas Derumigny

ORCID: 0000-0002-0224-4098
Research Areas
  • Parallel Computing and Optimization Techniques
  • Superconducting Materials and Applications
  • Cloud Computing and Resource Management
  • Embedded Systems Design Techniques
  • Advanced Data Storage Technologies
  • Distributed systems and fault tolerance
  • Radiation Effects in Electronics
  • Distributed and Parallel Computing Systems
  • Service-Oriented Architecture and Web Services
  • Mobile Agent-Based Network Management
  • Interconnection Networks and Systems
  • Ferroelectric and Negative Capacitance Devices
  • Software System Performance and Reliability

Département d'Informatique
2024

Télécom Paris
2024

Institut Polytechnique de Paris
2024

Telecom SudParis
2024

Colorado State University
2021-2022

Université Grenoble Alpes
2022

Institut polytechnique de Grenoble
2022

Laboratoire d'Informatique de Grenoble
2022

Centre Inria de l'Université Grenoble Alpes
2022

Centre National de la Recherche Scientifique
2022

The open-source and community-supported gem5 simulator is one of the most popular tools for computer architecture research. This simulation infrastructure allows researchers to model modern hardware at the cycle level, and it has enough fidelity to boot unmodified Linux-based operating systems and run full applications for multiple architectures including x86, Arm, and RISC-V. The simulator has been under active development over the last nine years since its original release. In this time, there have been 7500 commits to the codebase from 250 unique...

10.48550/arxiv.2007.03152 preprint EN cc-by arXiv (Cornell University) 2020-01-01

In a super-scalar architecture, the scheduler dynamically assigns micro-operations ($\mu$OPs) to execution ports. The port mapping of an architecture describes how an instruction decomposes into $\mu$OPs and lists for each $\mu$OP the set of ports it can be mapped to. It is used by compilers and performance debugging tools to characterize the throughput of a sequence of instructions repeatedly executed as the core component of a loop. This paper introduces a dual equivalent representation: the resource mapping of an architecture is an abstract model where, to be executed,...

10.1109/cgo53902.2022.9741289 preprint EN 2022-03-29

Memory bandwidth is known to be a performance bottleneck for FPGA accelerators, especially when they deal with large multi-dimensional data-sets. A body of work focuses on reducing off-chip transfers, but few authors try to improve the efficiency of the transfers. This paper addresses the latter issue by proposing (i) a compiler-based approach to the accelerator's data layout to maximize contiguous access to memory, and (ii) data packing and runtime compression techniques that take advantage of this layout to further improve memory performance. We...

10.48550/arxiv.2401.12071 preprint EN other-oa arXiv (Cornell University) 2024-01-01

Modern Out-of-Order (OoO) CPUs are complex systems with many components interleaved in non-trivial ways. Pinpointing performance bottlenecks and understanding the underlying causes of program performance issues are critical tasks to fully exploit the performance offered by hardware resources. Current performance debugging approaches rely either on measuring resource utilization, in order to estimate which parts of a CPU induce performance limitations, or on code-based analysis deriving bottleneck information from capacity/throughput models. These approaches are limited...

10.48550/arxiv.2412.13207 preprint EN cc-by arXiv (Cornell University) 2024-12-03

In a super-scalar architecture, the scheduler dynamically assigns micro-operations ($\mu$OPs) to execution ports. The port mapping of an architecture describes how an instruction decomposes into $\mu$OPs and lists for each $\mu$OP the set of ports it can be mapped to. It is used by compilers and performance debugging tools to characterize the throughput of a sequence of instructions repeatedly executed as the core component of a loop. This paper introduces a dual equivalent representation: the resource mapping of an architecture is an abstract model where, to be executed,...

10.48550/arxiv.2012.11473 preprint EN other-oa arXiv (Cornell University) 2020-01-01