- Parallel Computing and Optimization Techniques
- Embedded Systems Design Techniques
- Logic, programming, and type systems
- Advanced Data Storage Technologies
- Distributed and Parallel Computing Systems
- Formal Methods in Verification
- Cloud Computing and Resource Management
- Distributed systems and fault tolerance
- Advanced Memory and Neural Computing
- Software Testing and Debugging Techniques
- Software System Performance and Reliability
- Ferroelectric and Negative Capacitance Devices
- VLSI and Analog Circuit Testing
- Interconnection Networks and Systems
- Model-Driven Software Engineering Techniques
- Quantum Computing Algorithms and Architecture
- Numerical Methods and Algorithms
- Cellular Automata and Applications
- Software Engineering Research
- Advanced Electron Microscopy Techniques and Applications
- Scientific Computing and Data Management
- Security and Verification in Computing
- Integrated Circuits and Semiconductor Failure Analysis
- Algorithms and Data Compression
- Tensor decomposition and applications
University of Cambridge (2024-2025)
University of Edinburgh (2020-2024)
Edinburgh College (2024)
Lawrence Livermore National Laboratory (2023)
ETH Zurich (2015-2020)
Board of the Swiss Federal Institutes of Technology (2018)
Institut national de recherche en informatique et en automatique (2012-2015)
École Normale Supérieure - PSL (2012-2015)
École Normale Supérieure (2013-2014)
Laboratoire de l'Informatique du Parallélisme (2013-2014)
The polyhedral model for loop parallelization has proved to be an effective tool for advanced optimization and automatic parallelization of programs in higher-level languages. Yet, to integrate such optimizations seamlessly into production compilers, they must be performed on the compiler's internal, low-level intermediate representation (IR). With Polly, we present an infrastructure for polyhedral optimizations on such an IR. We describe the detection of program parts amenable to a polyhedral optimization (so-called static control parts), their translation into a Z-polyhedral representation, and optimizations on this...
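Since static control parts are central here, a minimal sketch may help; the following C loop nest (names and kernel mine, not Polly's) is the kind of region such polyhedral infrastructures detect: loop bounds, conditionals, and array subscripts are all affine functions of the surrounding loop counters and runtime parameters.

```c
/* A static control part (SCoP): all bounds and subscripts are affine
 * expressions in the loop counters i, j and the parameter N. */
void triangular_matvec(int N, double A[N][N], double x[N], double y[N]) {
  for (int i = 0; i < N; i++)        /* affine bounds: 0 <= i < N */
    for (int j = 0; j <= i; j++)     /* triangular, still affine  */
      y[i] += A[i][j] * x[j];        /* affine subscripts (i, j)  */
}
```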
Programming accelerators such as GPUs with low-level APIs and languages such as OpenCL and CUDA is difficult, error-prone, and not performance-portable. Automatic parallelization and domain-specific languages (DSLs) have been proposed to hide this complexity and regain performance portability. We present PENCIL, a rigorously-defined subset of GNU C99, enriched with additional language constructs, that enables compilers to exploit parallelism and produce highly optimized code when targeting accelerators. PENCIL aims to serve both as a portable...
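To give a flavor of the approach, here is a hedged sketch using PENCIL-style constructs (the spellings are illustrative and the no-op fallback macro is mine, not from the specification): the annotations state facts about data and iterations that a compiler targeting accelerators could not otherwise derive.

```c
#ifndef __pencil_assume
#define __pencil_assume(e) ((void)0)   /* no-op stand-in outside PENCIL */
#endif

/* Illustrative PENCIL-style kernel: C99 plus extra guarantees. */
void scale(int n, double a[restrict static n], double s) {
  __pencil_assume(n > 0);              /* caller-provided fact */
#pragma pencil independent             /* iterations declared independent */
  for (int i = 0; i < n; i++)
    a[i] *= s;
}
```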
Tiling is a key technique to enhance data reuse. For computations structured as one sequential outer "time" loop enclosing a set of parallel inner loops, tiling only the parallel loops may not enable enough reuse in the cache. Tiling along the time dimension as well enhances locality, but may require other transformations, like loop skewing, that inhibit inter-tile parallelism.
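A minimal sketch of both variants on a 1-D Jacobi-style stencil (the tile sizes TT, TJ and all names are illustrative):

```c
/* Untiled: sequential time loop around a parallel space loop. Tiling
 * only the i loop reuses each row within a single time step only. */
void jacobi(int T, int N, double A[2][N]) {
  for (int t = 0; t < T; t++)
    for (int i = 1; i < N - 1; i++)
      A[(t+1)%2][i] = 0.25 * (A[t%2][i-1] + 2.0*A[t%2][i] + A[t%2][i+1]);
}

/* Time tiling after skewing j = i + t: the skew makes all dependences
 * point lexicographically forward, so rectangular (t, j) tiles become
 * legal, but tiles in the same time band now depend on their left
 * neighbor -- exactly the loss of inter-tile parallelism noted above. */
void jacobi_skewed_tiled(int T, int N, double A[2][N], int TT, int TJ) {
  for (int tt = 0; tt < T; tt += TT)
    for (int jj = 1 + tt; jj < (N - 2) + (tt + TT); jj += TJ)
      for (int t = tt; t < tt + TT && t < T; t++) {
        int lo = jj > 1 + t ? jj : 1 + t;
        int hi = jj + TJ - 1 < N - 2 + t ? jj + TJ - 1 : N - 2 + t;
        for (int j = lo; j <= hi; j++) {
          int i = j - t;               /* undo the skew */
          A[(t+1)%2][i] = 0.25 * (A[t%2][i-1] + 2.0*A[t%2][i] + A[t%2][i+1]);
        }
      }
}
```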
Time-tiling is necessary for the efficient execution of iterative stencil computations. Classical hyper-rectangular tiles cannot be used due to the combination of backward and forward dependences along the space dimensions. Existing techniques trade temporal data reuse for inefficiencies in other areas, such as load imbalance, redundant computations, or increased control flow overhead, making them challenging to use on GPUs.
Most compilers have a single core intermediate representation (IR) (e.g., LLVM), sometimes complemented with vaguely defined IR-like data structures. This IR is commonly low-level and close to machine instructions. As a result, optimizations relying on domain-specific information are either not possible or require complex analyses to recover the missing information. In contrast, multi-level rewriting instantiates a hierarchy of dialects (IRs), lowers programs level-by-level, and performs code...
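As a toy illustration of level-by-level lowering (all names hypothetical, sketched in C rather than in any real dialect syntax):

```c
/* Level 1, domain ops: the whole operation is one IR node, so rewrites
 * like "matmul(A, transpose(transpose(B))) -> matmul(A, B)" are simple
 * pattern matches.  (matmul here is a hypothetical high-level op.) */
matmul(C, A, B, M, N, K);

/* Level 2, loops: the same op after one lowering step; tiling and
 * interchange are natural rewrites at this level. */
for (int i = 0; i < M; i++)
  for (int j = 0; j < N; j++)
    for (int k = 0; k < K; k++)
      C[i*N + j] += A[i*K + k] * B[k*N + j];

/* Level 3, instructions: after the final lowering only loads, stores,
 * and arithmetic remain -- the domain information is gone, which is why
 * recovering it from a single low-level IR is so costly. */
```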
Abstract mathematical representations such as integer polyhedra have been shown to be useful for precisely analyzing computational kernels and for expressing complex loop transformations. Such transformations rely on abstract syntax tree (AST) generators to convert the mathematical representation back into an imperative program. Generic AST generators avoid the need to resort to transformation-specific code generators, which may be very costly or technically difficult to develop as transformations become more complex. Existing generators have proven their effectiveness, but they hit...
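For intuition, consider generating an AST for the triangular iteration domain {(i, j) : 0 ≤ i < N ∧ i ≤ j < N} (an illustrative example, not drawn from the paper):

```c
/* Iteration domain: { (i,j) : 0 <= i < N and i <= j < N }.
 * Projecting out j yields the outer bounds 0 <= i < N; the per-i bounds
 * i <= j < N follow.  A generic AST generator therefore emits: */
for (int i = 0; i < N; i++)
  for (int j = i; j < N; j++)
    S(i, j);   /* the statement instance executed at (i, j) */
```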
Code transformations, such as loop tiling and loop fusion, are of key importance for the efficient implementation of stencil computations. However, their direct application to a large code base is costly and severely impacts program maintainability. While recently introduced domain-specific languages facilitate the application of such transformations, they typically still require manual tuning or auto-tuning techniques to select the transformations that yield optimal performance. In this paper, we introduce MODESTO, a model-driven optimization...
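The kind of tradeoff such a model must capture shows up already in a two-stage 1-D stencil (illustrative sketch): fusing the stages eliminates the memory round trip for the temporary at the cost of redundant computation.

```c
/* Unfused: tmp is fully produced, then consumed -- two memory sweeps. */
void two_stage(int N, const double *in, double *tmp, double *out) {
  for (int i = 1; i < N - 1; i++) tmp[i] = 0.5 * (in[i-1] + in[i+1]);
  for (int i = 2; i < N - 2; i++) out[i] = 0.5 * (tmp[i-1] + tmp[i+1]);
}

/* Fused: each out[i] recomputes the two tmp values it needs, trading
 * extra flops for locality -- the choice a cost model must arbitrate. */
void two_stage_fused(int N, const double *in, double *out) {
  for (int i = 2; i < N - 2; i++)
    out[i] = 0.5 * (0.5 * (in[i-2] + in[i]) + 0.5 * (in[i] + in[i+2]));
}
```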
Programming today's increasingly complex heterogeneous hardware is difficult, as it commonly requires the use of data-parallel languages, pragma annotations, specialized libraries, or DSL compilers. Adding explicit accelerator support to a larger code base is not only costly, but also introduces additional complexity that hinders long-term maintenance. We propose a new compiler that brings us closer to the dream of automatic accelerator mapping. Starting from a sequential IR, we automatically generate a hybrid executable...
Iterative stencil computations are important in scientific computing and, increasingly, in the embedded and mobile domain. Recent publications have shown that tiling schemes that ensure concurrent start provide efficient ways to execute these kernels. Diamond tiling and hybrid-hexagonal tiling are two schemes that enable concurrent start. Both have different advantages: diamond tiling has been integrated into a general-purpose optimization framework and uses a cost function to choose among tiling hyperplanes, whereas hybrid-hexagonal tiling offers greater flexibility with tile sizes, which has been exploited for effective...
The construction of effective loop nest optimizers and parallelizers remains challenging despite decades of work in the area. Due to the increasing diversity of loop-intensive applications and the complex memory/computation hierarchies of modern processors, optimization heuristics are pulled towards conflicting goals, highlighting the lack of a systematic approach to optimizing for locality and parallelism. Acknowledging these demands on optimization, we propose an algorithmic template capable of modeling multi-level parallelism...
The efficiency of tensor contraction is of great importance. Compilers cannot optimize it well enough to come close to the performance of expert-tuned implementations. All existing approaches that provide competitive performance require optimized external code. We introduce a compiler optimization that reaches the performance of optimized BLAS libraries without the need for an external implementation or automatic tuning. Our approach provides competitive performance across hardware architectures and can be generalized to deliver the same benefits for algebraic path problems. By making fast...
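A sketch of the underlying observation (illustrative, not the paper's algorithm): many contractions become a plain matrix product once indices are grouped, after which BLAS-style blocking and packing apply directly.

```c
/* Contraction C[i][j][k] += A[i][j][l] * B[l][k]: fusing (i, j) into a
 * single row index m exposes a GEMM, C'[m][k] += A'[m][l] * B[l][k]. */
void contract(int I, int J, int K, int L,
              const double *Am, const double *B, double *Cm) {
  for (int m = 0; m < I * J; m++)      /* m enumerates (i, j) pairs */
    for (int k = 0; k < K; k++)
      for (int l = 0; l < L; l++)
        Cm[m*K + k] += Am[m*L + l] * B[l*K + k];
}
```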
While the cost of computation is an easy-to-understand local property, data movement on cached architectures depends on global state, does not compose, and is hard to predict. As a result, programmers often fail to consider data movement. Existing cache models and simulators provide the missing information but are computationally expensive. We present a lightweight model for fully associative caches with a least-recently-used (LRU) replacement policy that gives fast and accurate results. We count misses without explicit...
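For contrast, here is the brute-force baseline such a model replaces (a minimal sketch, assuming unit-size blocks and a plain address trace): simulating the LRU stack explicitly costs time per access, which is precisely what an analytical count of misses avoids.

```c
#include <stddef.h>

/* Misses of a fully associative LRU cache of `capacity` blocks over an
 * address trace, by explicit simulation (O(n * capacity) time). */
size_t lru_misses(const size_t *trace, size_t n, size_t capacity,
                  size_t *stack /* scratch array of length `capacity` */) {
  size_t used = 0, misses = 0;
  for (size_t t = 0; t < n; t++) {
    size_t i = 0;
    while (i < used && stack[i] != trace[t]) i++;   /* LRU-stack depth */
    if (i == used) {                                /* miss */
      misses++;
      if (used < capacity) used++;                  /* cold: grow */
      else i = capacity - 1;                        /* evict LRU entry */
    }
    for (size_t j = i; j > 0; j--) stack[j] = stack[j-1];
    stack[0] = trace[t];                            /* move to MRU */
  }
  return misses;
}
```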
Modern Hardware Description Languages (HDLs) such as SystemVerilog or VHDL are, due to their sheer complexity, insufficient to transport designs through modern circuit design flows. Instead, each automation tool lowers HDLs into its own Intermediate Representation (IR). These tools are monolithic and mostly proprietary, disagree in their implementation of the HDLs, and while many redundant IRs exist, no single IR today can be used throughout the entire flow. To solve this problem, we propose LLHD, a multi-level IR. LLHD is designed...
Inlining is a core transformation in optimizing compilers. It replaces a function call (call site) with the body of the called function (callee). It helps reduce call overhead and binary size and, more importantly, enables other optimizations. The problem of inlining has been extensively studied, but it is far from being solved; predicting which inlining decisions are beneficial is nontrivial due to interactions with the rest of the compiler pipeline. Previous work has mainly focused on designing heuristics for better inlining decisions and has not investigated optimal inlining,...
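A minimal illustration of why inlining is an enabler rather than just a call-overhead saver (example mine):

```c
static int clamp(int x, int lo, int hi) {
  return x < lo ? lo : (x > hi ? hi : x);
}

int brighten(int pixel) {
  return clamp(pixel + 16, 0, 255);
}

/* After inlining, the call site becomes
 *   return pixel + 16 < 0 ? 0 : (pixel + 16 > 255 ? 255 : pixel + 16);
 * and if the caller's context proves pixel >= 0, later passes can fold
 * the first branch away entirely -- impossible across an opaque call. */
```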
High-performance micro-kernels must fully exploit today's diverse and specialized hardware to deliver peak performance for DNNs. While higher-level optimizations for DNNs are offered by numerous compilers (e.g., MLIR, TVM, OpenXLA), performance-critical micro-kernels are left to specialized code generators or handwritten assembly. Even though widely adopted compilers (e.g., LLVM, GCC) offer tuned backends, their CPU-focused input abstractions, unstructured IR, and general-purpose best-effort design inhibit tailored code generation for innovative hardware. We...
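The flavor of kernel in question, sketched in plain C (sizes and names illustrative): a register-blocked outer-product update whose accumulators must map exactly onto the target's vector registers and FMA units, which is the level of control general-purpose backends struggle to guarantee.

```c
/* 4x4 register-blocked GEMM micro-kernel: C += A * B for one packed
 * tile.  The 16 accumulators are intended to live entirely in registers. */
void microkernel_4x4(int K, const float *A, const float *B,
                     float *C, int ldc) {
  float acc[4][4] = {{0.0f}};
  for (int k = 0; k < K; k++)
    for (int i = 0; i < 4; i++)
      for (int j = 0; j < 4; j++)
        acc[i][j] += A[k*4 + i] * B[k*4 + j];   /* one FMA each */
  for (int i = 0; i < 4; i++)
    for (int j = 0; j < 4; j++)
      C[i*ldc + j] += acc[i][j];
}
```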
Tensor algebra is a crucial component of data-intensive workloads such as machine learning and scientific computing. As the complexity of data grows, scientists often face a dilemma between the highly specialized routines of dense tensor algebra and the efficient structure-aware algorithms provided by sparse tensor algebra. In this paper, we introduce DASTAC, a framework that propagates a tensor's captured high-level structure down to low-level code generation by incorporating techniques such as automatic layout compression, polyhedral analysis,...
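One instance of that dilemma, as a sketch (illustrative, not DASTAC's technique): a triangular matrix stored densely wastes half its space, while a packed layout with a closed-form index stays both compressed and analyzable.

```c
#include <stddef.h>

/* Upper-triangular n x n matrix packed row by row into n*(n+1)/2 slots.
 * Row i starts at offset i*n - i*(i-1)/2; the closed-form index keeps
 * accesses branch-free and amenable to polyhedral-style analysis. */
static inline size_t tri_idx(size_t n, size_t i, size_t j) {  /* i <= j */
  return i * n - i * (i - 1) / 2 + (j - i);
}
```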
The freedom to reorder computations involving associative operators has been widely recognized and exploited in designing parallel algorithms and, to a more limited extent, in optimizing compilers. In this paper, we develop a novel framework that utilizes the associativity and commutativity of operations in regular loop computations to enhance register reuse. Stencils represent a particular class of important computations where this optimization can be applied to improve performance. We show how stencil computations can be implemented to better exploit register reuse and reduce loads/stores....
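A small sketch of the idea on a 1-D sum-stencil (mine, not the paper's code): reassociating the three-term sum lets consecutive iterations share a partial sum held in a register.

```c
/* Direct form: three loads and two adds per iteration. */
void sum3(int N, const double *a, double *b) {
  for (int i = 1; i < N - 1; i++)
    b[i] = a[i-1] + a[i] + a[i+1];
}

/* Reassociated: the pair sum a[i] + a[i+1] computed in one iteration is
 * exactly a[i-1] + a[i] in the next, so it is carried in a register,
 * saving one load per iteration. */
void sum3_reassoc(int N, const double *a, double *b) {
  double s = a[0] + a[1];
  for (int i = 1; i < N - 1; i++) {
    double t = a[i] + a[i+1];
    b[i] = s + a[i+1];      /* (a[i-1] + a[i]) + a[i+1], reassociated */
    s = t;
  }
}
```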
To optimize code effectively, compilers must deal with memory dependencies. However, the state-of-the-art heuristics available in the literature to track memory dependencies are inherently imprecise and computationally expensive. Consequently, the most advanced transformations that compilers have today are ineffective when applied to real-world programs. The goal of this paper is to solve this conundrum through the dynamic disambiguation of pointers. We provide different ways to determine at runtime whether two memory locations can overlap. We then...
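The core trick fits in a few lines (a minimal sketch, not the paper's mechanism): guard an alias-free version of the loop with a cheap runtime range test, falling back to conservative code otherwise.

```c
#include <stdint.h>

static void add_noalias(double *restrict a, const double *restrict b,
                        long n) {
  for (long i = 0; i < n; i++) a[i] += b[i];   /* freely vectorizable */
}

void add(double *a, const double *b, long n) {
  uintptr_t ai = (uintptr_t)a, bi = (uintptr_t)b, sz = n * sizeof(double);
  if (ai + sz <= bi || bi + sz <= ai)
    add_noalias(a, b, n);                      /* proven disjoint at runtime */
  else
    for (long i = 0; i < n; i++) a[i] += b[i]; /* conservative path */
}
```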
A number of legacy codes make use of linearized array references (i.e., references to one-dimensional arrays) to encode accesses to multi-dimensional arrays. The same is true of a number of optimized libraries and of the well-known LLVM intermediate representation, which linearizes accesses. In many cases, the only information available is an array base pointer and a single-dimensional offset. For problems with parametric extents, this offset is usually a multivariate polynomial. Compiler analyses such as data dependence analysis are impeded because...
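A small example of the problem and of the optimistic fix (illustrative):

```c
/* Linearized: the subscript i*M + j is a multivariate polynomial in the
 * parameter M, which defeats standard affine dependence tests. */
void kernel(long N, long M, double *A) {
  for (long i = 0; i < N; i++)
    for (long j = 0; j < M; j++)
      A[i * M + j] += 1.0;
}

/* Optimistic delinearization: guess that the access means A[i][j] on an
 * N x M array.  The guess is valid exactly when 0 <= j < M -- here
 * guaranteed by the loop bounds -- making the subscripts affine again. */
```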