Alban Dutilleul

ORCID: 0009-0004-7978-0608
Research Areas
  • Parallel Computing and Optimization Techniques
  • Distributed systems and fault tolerance
  • Advanced Data Storage Technologies
  • Cloud Computing and Resource Management
  • Distributed and Parallel Computing Systems
  • Software System Performance and Reliability
  • Radiation Effects in Electronics
  • Embedded Systems Design Techniques

École Normale Supérieure de Rennes
2024-2025

High-performance micro-kernels must fully exploit today's diverse and specialized hardware to deliver peak performance to DNNs. While higher-level optimizations for DNNs are offered by numerous compilers (e.g., MLIR, TVM, OpenXLA), performance-critical micro-kernels are left to specialized code generators or handwritten assembly. Even though widely-adopted compilers (e.g., LLVM, GCC) offer tuned backends, their CPU-focused input abstraction, unstructured IR, and general-purpose best-effort design inhibit tailored code generation for innovative hardware. We...

10.1145/3696443.3708952 preprint EN 2025-02-22

A variety of code analyzers, such as IACA, uiCA, llvm-mca or Ithemal, strive to statically predict the throughput of a computation kernel. Each analyzer is based on its own simplified CPU model reasoning at the scale of a basic block. Facing this diversity, evaluating their strengths and weaknesses is important to guide both their usage and their enhancement. We present CesASMe, a fully-tooled solution to evaluate code analyzers on C-level benchmarks, composed of a benchmark derivation procedure that feeds an evaluation harness. We conclude...

10.1145/3715125 article EN ACM Transactions on Architecture and Code Optimization 2025-02-11

A variety of code analyzers, such as IACA, uiCA, llvm-mca or Ithemal, strive to statically predict the throughput of a computation kernel. Each analyzer is based on its own simplified CPU model reasoning at the scale of a basic block. Facing this diversity, evaluating their strengths and weaknesses is important to guide both their usage and their enhancement. We present CesASMe, a fully-tooled solution to evaluate code analyzers on C-level benchmarks, composed of a benchmark derivation procedure that feeds an evaluation harness. We conclude...

10.48550/arxiv.2402.14567 preprint EN cc-by arXiv (Cornell University) 2024-02-22

Modern Out-of-Order (OoO) CPUs are complex systems with many components interleaved in non-trivial ways. Pinpointing performance bottlenecks and understanding the underlying causes of program performance issues are critical tasks to fully exploit the performance offered by hardware resources. Current performance debugging approaches rely either on measuring resource utilization, in order to estimate which parts of a CPU induce performance limitations, or on code-based analysis deriving bottleneck information from capacity/throughput models. These approaches are limited...

10.48550/arxiv.2412.13207 preprint EN cc-by arXiv (Cornell University) 2024-12-03