- Parallel Computing and Optimization Techniques
- Distributed and Parallel Computing Systems
- Tensor decomposition and applications
- Embedded Systems Design Techniques
- Algorithms and Data Compression
- Advanced Data Storage Technologies
- Computational Physics and Python Applications
- Interconnection Networks and Systems
- Logic, programming, and type systems
- Advanced Neural Network Applications
- Computer Graphics and Visualization Techniques
- Distributed systems and fault tolerance
- Software Testing and Debugging Techniques
- Graph Theory and Algorithms
- Model-Driven Software Engineering Techniques
- Modular Robots and Swarm Intelligence
- CCD and CMOS Imaging Sensors
- Network Packet Processing and Optimization
- Simulation Techniques and Applications
- Machine Learning and Data Classification
- Manufacturing Process and Optimization
- Data Visualization and Analytics
- VLSI and Analog Circuit Testing
- Machine Learning and Algorithms
- Cellular Automata and Applications
Stanford University
2020-2025
Massachusetts Institute of Technology
2012-2020
University of Illinois Urbana-Champaign
2010-2011
Tensor algebra is a powerful tool with applications in machine learning, data analytics, engineering and the physical sciences. Tensors are often sparse, and compound operations must frequently be computed in a single kernel for performance and to save memory. Programmers are left to write kernels for every operation of interest, with different mixes of dense and sparse tensors in different formats. The combinations are infinite, which makes it impossible to manually implement and optimize them all. This paper introduces the first compiler technique...
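A minimal sketch of what this looks like from the user's side, written against taco's public C++ API (the dimensions, names, and format choices below are illustrative, and the fused expression stands in for the compound operations the paper targets):

```cpp
// Sketch: expressing a compound sparse operation in taco's C++ API.
#include "taco.h"
using namespace taco;

int main() {
  Format csr({Dense, Sparse});   // compressed sparse row
  Format dv({Dense});            // dense vector

  Tensor<double> A("A", {512, 512}, csr);
  Tensor<double> x("x", {512}, dv);
  Tensor<double> z("z", {512}, dv);
  Tensor<double> y("y", {512}, dv);

  A.insert({0, 1}, 2.0);         // populate a few nonzeros (illustrative)
  x.insert({1}, 3.0);
  z.insert({0}, 1.0);
  A.pack(); x.pack(); z.pack();

  // One fused kernel for the compound operation y = A*x + z, instead of
  // a temporary for A*x followed by a separate vector addition.
  IndexVar i, j;
  y(i) = A(i, j) * x(j) + z(i);

  y.compile();    // generate the kernel for this mix of formats
  y.assemble();   // allocate output structures
  y.compute();    // run it
  return 0;
}
```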
Writing highly performant simulations requires a lot of human effort to optimize for an increasingly diverse set of hardware platforms, such as multi-core CPUs, GPUs, and distributed machines. Since these optimizations cut across both the design of geometric data structures and numerical linear algebra, code reusability and portability is frequently sacrificed for performance. We believe the key to making simulation programmers more productive at developing portable code is to introduce new linguistic abstractions, as in rendering...
Sparse tensors arise in problems in science, engineering, machine learning, and data analytics. Programs that operate on such tensors can exploit sparsity to reduce storage requirements and computational time. Developing and maintaining sparse software by hand, however, is a complex and error-prone task. Therefore, we propose treating sparsity as a property of tensors, not a tedious implementation task, and letting a compiler generate sparse code automatically from a sparsity-agnostic definition of the computation. This paper discusses...
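For concreteness, this is the kind of kernel such a compiler can derive from a sparsity-agnostic definition like y(i) = A(i,j) * x(j) when A is stored in CSR; the code below is a hand-written illustration (array names follow the usual row-pointer/column-index convention), not actual compiler output:

```cpp
// Hand-written illustration of a derived CSR sparse matrix-vector kernel.
void spmv_csr(int m, const int* rowptr, const int* col,
              const double* val, const double* x, double* y) {
  for (int i = 0; i < m; i++) {
    double sum = 0.0;
    // Iterate only over the stored nonzeros of row i.
    for (int p = rowptr[i]; p < rowptr[i + 1]; p++) {
      sum += val[p] * x[col[p]];
    }
    y[i] = sum;
  }
}
```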
With existing programming tools, writing high-performance simulation code is labor intensive and requires sacrificing readability and portability. The alternative is to prototype simulations in a high-level language like Matlab, thereby sacrificing performance. The Matlab programming model naturally describes the behavior of an entire physical system using the language of linear algebra. However, simulations also manipulate individual geometric elements, which are best represented using linked data structures such as meshes. Translating between linear algebra and these linked data structures comes at...
Many problems consist of a structured grid of points that are updated repeatedly based on the values of a fixed set of neighboring points in the same grid. To parallelize these problems we can geometrically divide the grid into chunks that are processed by different processors. One challenge with this approach is that the update of points at the periphery of a chunk requires values from neighboring chunks. These are often located in remote memory belonging to different processes. The naive implementation results in a lot of time spent on communication, leaving less time for useful computation. By using the Ghost Cell...
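A minimal sketch of the pattern in one dimension, assuming MPI and a decomposition where each rank owns n interior points plus one ghost cell on each side (names and layout are illustrative):

```cpp
// 1D ghost cell (halo) exchange. Each rank owns grid[1..n]; grid[0] and
// grid[n+1] are ghost cells mirroring the neighbors' boundary points.
// Ranks at the ends of the decomposition pass MPI_PROC_NULL as the
// missing neighbor.
#include <mpi.h>
#include <vector>

void exchange_ghosts(std::vector<double>& grid, int n,
                     int left, int right, MPI_Comm comm) {
  // Send my left boundary point to the left neighbor while receiving
  // the right neighbor's boundary point into my right ghost cell.
  MPI_Sendrecv(&grid[1], 1, MPI_DOUBLE, left, 0,
               &grid[n + 1], 1, MPI_DOUBLE, right, 0,
               comm, MPI_STATUS_IGNORE);
  // And the symmetric exchange in the other direction.
  MPI_Sendrecv(&grid[n], 1, MPI_DOUBLE, right, 1,
               &grid[0], 1, MPI_DOUBLE, left, 1,
               comm, MPI_STATUS_IGNORE);
  // Each rank can now update grid[1..n] without further communication
  // this iteration.
}
```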
This paper shows how to extend sparse tensor algebra compilers to introduce temporary tensors called workspaces to avoid inefficient sparse data structure accesses. We develop an intermediate representation (IR) for tensor operations called concrete index notation that specifies when sub-computations occur and where they are stored. We then describe the workspace transformation in this IR, how to programmatically invoke it, and how the IR is compiled to sparse code. Finally, we show how the transformation can be used to optimize sparse tensor kernels, including sparse matrix multiplication,...
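To illustrate the shape of kernel the workspace idea leads to, here is a hand-written sketch of one row of a sparse matrix multiply that scatters into a dense workspace instead of inserting into a compressed row out of order (the names and the split between this routine and its caller are illustrative):

```cpp
// One row of C = A*B (all CSR) using a dense row workspace w.
#include <vector>

void spgemm_row(int i, const int* Ap, const int* Aj, const double* Ax,
                const int* Bp, const int* Bj, const double* Bx,
                std::vector<double>& w, std::vector<char>& marked,
                std::vector<int>& nzlist) {
  for (int pa = Ap[i]; pa < Ap[i + 1]; pa++) {
    int k = Aj[pa];
    for (int pb = Bp[k]; pb < Bp[k + 1]; pb++) {
      int j = Bj[pb];
      if (!marked[j]) { marked[j] = 1; nzlist.push_back(j); }
      w[j] += Ax[pa] * Bx[pb];   // scatter into the dense workspace
    }
  }
  // The caller gathers (j, w[j]) for each j in nzlist into row i of C,
  // then clears w, marked, and nzlist before the next row.
}
```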
This paper shows how to build a sparse tensor algebra compiler that is agnostic to tensor formats (data layouts). We develop an interface that describes formats in terms of their capabilities and properties, and show how to build a modular code generator where new formats can be added as plugins. We then describe six implementations of the interface that compose to form dense, CSR/CSF, COO, DIA, ELL, and HASH formats and countless variants thereof. With these implementations at hand, our code generator can generate code to compute any expression on any combination of the aforementioned formats. To demonstrate our technique, we have...
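A hedged sketch of the idea behind such a per-dimension interface: each level describes its properties and how to iterate it, and a modular code generator composes levels into whole-tensor formats (a dense level over a compressed level gives CSR, and so on). The names below are illustrative, not the paper's actual interface:

```cpp
#include <utility>

struct LevelFormat {
  virtual ~LevelFormat() = default;
  virtual bool ordered() const = 0;   // are coordinates sorted?
  virtual bool unique() const = 0;    // no duplicate coordinates?
  // Position range at this level beneath parent position p.
  virtual std::pair<int, int> posBounds(int p) const = 0;
  virtual int coordAt(int pos) const = 0;
};

// A compressed level (e.g., the column dimension of CSR), backed by the
// usual pos/crd arrays.
struct CompressedLevel : LevelFormat {
  const int* pos = nullptr;
  const int* crd = nullptr;
  bool ordered() const override { return true; }
  bool unique() const override { return true; }
  std::pair<int, int> posBounds(int p) const override {
    return {pos[p], pos[p + 1]};
  }
  int coordAt(int q) const override { return crd[q]; }
};
```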
We address the problem of optimizing sparse tensor algebra in a compiler and show how to define standard loop transformations---split, collapse, and reorder---on sparse iteration spaces. The key idea is to track the transformation functions that map the original iteration space to derived iteration spaces. These functions are needed by the code generator to emit code that maps coordinates between spaces at runtime, since the data structures remain in the original space. We further demonstrate that we can tile both the universe of coordinates and the subset of nonzero coordinates: the former is analogous to tiling dense spaces, while the latter...
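A worked example of the coordinate-mapping idea for a split with factor 4: the derived space indexes i as (i1, i0) with i = i1*4 + i0, but the compressed data structure still stores coordinates in the original space, so the generated loop recovers the derived coordinates at runtime (hand-written sketch, not generated code):

```cpp
void iterate_split(const int* pos, const int* crd) {
  for (int p = pos[0]; p < pos[1]; p++) {
    int i  = crd[p];    // coordinate in the original space
    int i1 = i / 4;     // derived outer (tile) coordinate: i1 = i / 4
    int i0 = i % 4;     // derived inner coordinate:        i0 = i % 4
    (void)i1; (void)i0; // ... loop body indexed by (i1, i0) ...
  }
}
```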
We introduce DISTAL, a compiler for dense tensor algebra that targets modern distributed and heterogeneous systems. DISTAL lets users independently describe how tensors and computation map onto target machines through separate format and scheduling languages. The combination of choices for data and computation distribution creates a large design space that includes many algorithms from both the past (e.g., Cannon's algorithm) and the present (e.g., COSMA). DISTAL compiles a tensor algebra domain specific language to a task-based runtime system and supports nodes with...
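As a hedged sketch of one point in that data-distribution design space, consider a block-cyclic assignment of matrix elements to a pi x pj processor grid, the kind of distribution Cannon-style algorithms assume; the mapping function below is purely illustrative and is not DISTAL's format or scheduling language:

```cpp
struct Machine { int pi, pj; };  // processor grid dimensions

int ownerOfElement(int row, int col, int tileRows, int tileCols, Machine m) {
  int bi = row / tileRows;                  // tile row index
  int bj = col / tileCols;                  // tile column index
  return (bi % m.pi) * m.pj + (bj % m.pj);  // cyclic over the grid
}
```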
We propose the Sparse Abstract Machine (SAM), an abstract machine model for targeting sparse tensor algebra to reconfigurable and fixed-function spatial dataflow accelerators. SAM defines a streaming dataflow abstraction with sparse primitives that encompass a large space of scheduled tensor algebra expressions. SAM graphs naturally separate tensor formats from algorithms and are expressive enough to incorporate arbitrary iteration orderings and many hardware-specific optimizations. We also present Custard, a compiler from a high-level language to SAM that demonstrates...
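A software analogue of one streaming primitive such dataflow designs rely on: intersecting two sorted sparse coordinate streams, as required when multiplying two sparse operands. This sketch is for illustration only and does not reproduce SAM's actual primitives:

```cpp
#include <vector>

std::vector<int> intersect(const std::vector<int>& a,
                           const std::vector<int>& b) {
  std::vector<int> out;
  size_t pa = 0, pb = 0;
  while (pa < a.size() && pb < b.size()) {
    if (a[pa] == b[pb]) {          // coordinate present in both streams
      out.push_back(a[pa]);
      pa++; pb++;
    } else if (a[pa] < b[pb]) {
      pa++;                        // advance the stream that is behind
    } else {
      pb++;
    }
  }
  return out;
}
```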
We describe an abstract loop-based intermediate representation that can express fused implementations of relational algebra expressions on sets and bags (multisets). The loops are abstracted away from physical data structures, thus making it easier to generate, reason about, and perform optimizations like fusion on. The IR supports the natural relational algebra operators as well as complex operators used in production database systems, including outer joins, non-equi joins, and differences. We then show how to compile this IR to efficient C++ code...
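An illustration of the kind of fused loop such an IR abstracts: a merge equi-join of two relations sorted on the join key, with physical storage reduced to key arrays for clarity. Handling bags (duplicate keys), outer joins, and non-equi joins adds cases to this same loop shape; the sketch is illustrative, not the paper's IR:

```cpp
#include <cstdio>
#include <vector>

void mergeJoin(const std::vector<int>& keyA, const std::vector<int>& keyB) {
  size_t a = 0, b = 0;
  while (a < keyA.size() && b < keyB.size()) {
    if (keyA[a] == keyB[b]) {
      std::printf("match on key %d\n", keyA[a]);  // emit joined tuple
      a++; b++;
    } else if (keyA[a] < keyB[b]) {
      a++;
    } else {
      b++;
    }
  }
}
```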
Tensor algebra is an important computational abstraction that is increasingly used in data analytics, machine learning, engineering, and the physical sciences. However, the number of tensor expressions is unbounded, which makes it hard to develop and optimize libraries. Furthermore, tensors are often sparse (most components are zero), which means the code has to traverse compressed formats. To support programmers we have developed taco, a code generation tool that generates dense, sparse, and mixed kernels from tensor algebra expressions. This paper...
While loop reordering and fusion can make big impacts on the constant-factor performance of dense tensor programs, their effects on sparse tensor programs are asymptotic, often leading to orders of magnitude differences in practice. Sparse tensors also introduce a choice of compressed storage formats that likewise has asymptotic effects. Research into sparse tensor compilers has led to simplified languages that express these tradeoffs, but the user is still expected to provide a schedule that makes the decisions. This is challenging because schedulers must anticipate...
With the slowing of Moore’s law, computer architects have turned to domain-specific hardware specialization to continue improving the performance and efficiency of computing systems. However, specialization typically entails significant modifications to the software stack to properly leverage the updated hardware. The lack of a structured approach for updating the compiler and the accelerator in tandem has impeded many attempts to systematize this procedure. We propose a new approach that enables flexible and evolvable hardware specialization based on coarse-grained reconfigurable arrays...
This paper shows how to compile sparse array programming languages. A sparse array programming language is an array programming language that supports element-wise application, reduction, and broadcasting of arbitrary functions over dense and sparse arrays with any fill value. Such a language has great expressive power and can express sparse and dense linear and tensor algebra, functions over images, exclusion and inclusion filters, and even graph algorithms. Our compiler strategy generalizes prior work in the literature on sparse tensor algebra compilation to support any function applied to sparse arrays, instead of only addition...
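A sketch of the semantics this generality requires, assuming two sparse vectors: stored coordinates are merged, and the function sees an operand's fill value wherever it has no stored entry. The types and names below are illustrative of the behavior only, not the paper's compiler output:

```cpp
#include <climits>
#include <functional>
#include <vector>

struct SparseVec {
  std::vector<int> crd;     // sorted coordinates of stored entries
  std::vector<double> val;  // stored values
  double fill;              // value at every unstored coordinate
};

SparseVec apply(const SparseVec& a, const SparseVec& b,
                const std::function<double(double, double)>& f) {
  SparseVec out{{}, {}, f(a.fill, b.fill)};  // the fill maps through f
  size_t pa = 0, pb = 0;
  while (pa < a.crd.size() || pb < b.crd.size()) {
    int ia = pa < a.crd.size() ? a.crd[pa] : INT_MAX;
    int ib = pb < b.crd.size() ? b.crd[pb] : INT_MAX;
    int i = ia < ib ? ia : ib;
    double va = (ia == i) ? a.val[pa++] : a.fill;
    double vb = (ib == i) ? b.val[pb++] : b.fill;
    out.crd.push_back(i);
    out.val.push_back(f(va, vb));  // compaction of fill results omitted
  }
  return out;
}
```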
Real world arrays often contain underlying structure, such as sparsity, runs of repeated values, or symmetry. Specializing code for this structure yields significant speedups. But automatically generating efficient code for structured data is challenging, especially when arrays with different structures interact. We show how to abstract over array structures so that the compiler can generate code to coiterate over any combination of them. Our technique enables new formats (such as 1DVBL for irregular clustered sparsity), new iteration strategies...
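A minimal sketch of what coiteration over structure buys, assuming a run-length-encoded vector interacting with a dense one: within a run the RLE value is constant, so it is hoisted out of the inner loop instead of re-read per element (format and names are illustrative):

```cpp
#include <vector>

struct Rle {
  std::vector<int> runEnd;    // exclusive end index of each run
  std::vector<double> value;  // constant value within each run
};

double dotRleDense(const Rle& r, const std::vector<double>& d) {
  double sum = 0.0;
  int start = 0;
  for (size_t run = 0; run < r.runEnd.size(); run++) {
    double v = r.value[run];  // hoisted: constant across the run
    for (int i = start; i < r.runEnd[run]; i++) {
      sum += v * d[i];
    }
    start = r.runEnd[run];
  }
  return sum;
}
```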
It is common for object-oriented programs to have both mutable and immutable classes. Immutable classes simplify programming because the programmer does not have to reason about side-effects. Sometimes programmers write immutable classes from scratch; other times they transform mutable classes into immutable ones. To transform a class, they must find all methods that mutate its transitive state and all objects that can enter or escape the state of the class. The analyses are non-trivial and the rewriting is tedious. Fortunately, this can be automated.
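The shape of the rewrite, sketched in C++ for consistency with the other examples here (the paper's setting is object-oriented programs generally): a mutator is turned into a method that constructs and returns a fresh instance, so no object changes after construction:

```cpp
class Point {
  const double x_, y_;
public:
  Point(double x, double y) : x_(x), y_(y) {}
  // Before: void translate(double dx, double dy) { x_ += dx; y_ += dy; }
  // After: return a new object instead of mutating this one.
  Point translate(double dx, double dy) const {
    return Point(x_ + dx, y_ + dy);
  }
  double x() const { return x_; }
  double y() const { return y_; }
};
```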
Although an agile approach is standard for software design, how to properly adapt this method to hardware is still an open question. This work addresses this question while building a system on chip (SoC) with specialized accelerators. Rather than using a traditional waterfall design flow, which starts by studying the application to be accelerated, we begin by constructing a complete flow from an application expressed in a high-level domain-specific language (DSL), in our case Halide, to a generic coarse-grained reconfigurable array...
Data packing before and after communication can make up as much as 90% of the communication time on modern computers. Despite MPI's well-defined datatype interface for non-contiguous data access, many codes use manual pack loops for performance reasons. Programmers write access-pattern specific pack loops (e.g., with loop unrolling) for which compilers emit optimized code. In contrast, MPI implementations in use today interpret datatypes at pack time, resulting in high overheads. In this work we explore the effectiveness of using runtime compilation...
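A sketch of the two approaches being contrasted, for sending one column of a row-major matrix with MPI: a hand-written pack loop versus a derived datatype describing the same layout (the function, buffer size, and tags are illustrative):

```cpp
#include <mpi.h>

void send_column(const double* matrix, int rows, int cols, int col,
                 int dest, MPI_Comm comm) {
  // (a) Manual pack loop: compilers can specialize and unroll this.
  double buf[1024];  // assumes rows <= 1024 for the sketch
  for (int i = 0; i < rows; i++) buf[i] = matrix[i * cols + col];
  MPI_Send(buf, rows, MPI_DOUBLE, dest, 0, comm);

  // (b) MPI derived datatype: the layout is described once, but typical
  // MPI libraries interpret it at pack time, which is the overhead that
  // runtime compilation of datatypes aims to remove.
  MPI_Datatype column;
  MPI_Type_vector(rows, 1, cols, MPI_DOUBLE, &column);
  MPI_Type_commit(&column);
  MPI_Send(matrix + col, 1, column, dest, 1, comm);
  MPI_Type_free(&column);
}
```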
Image processing and machine learning applications benefit tremendously from hardware acceleration. Existing compilers target either FPGAs, which sacrifice power and performance for programmability, or ASICs, which become obsolete as applications change. Programmable domain-specific accelerators, such as coarse-grained reconfigurable arrays (CGRAs), have emerged as a promising middle-ground, but they have traditionally been difficult compiler targets since they use a different memory abstraction. In contrast to CPUs and GPUs, the...