- Parallel Computing and Optimization Techniques
- Embedded Systems Design Techniques
- Interconnection Networks and Systems
- Cloud Computing and Resource Management
- Advanced Data Storage Technologies
- VLSI and Analog Circuit Testing
- Semiconductor materials and devices
- Distributed and Parallel Computing Systems
- Distributed systems and fault tolerance
- Low-power high-performance VLSI design
- Advancements in Semiconductor Devices and Circuit Design
- Advanced Malware Detection Techniques
- Photonic and Optical Devices
- Security and Verification in Computing
- Graph Theory and Algorithms
- Cryptography and Data Security
- VLSI and FPGA Design Techniques
- Software Engineering Research
- Algorithms and Data Compression
- Software System Performance and Reliability
- Formal Methods in Verification
- Advanced Electron Microscopy Techniques and Applications
- Software Testing and Debugging Techniques
- Logic, Reasoning, and Knowledge
- Logic, programming, and type systems
University of California, Santa Barbara
2021-2025
Princeton University
2014-2020
Princeton Public Schools
2017
Serverless computing is a rapidly growing cloud application model, popularized by Amazon's Lambda platform. Serverless cloud services provide fine-grained provisioning of resources, which scale automatically with user demand. Function-as-a-Service (FaaS) applications follow this serverless model, with the developer providing their application as a set of functions which are executed in response to a user- or system-generated event. Functions are designed to be short-lived and execute inside containers or virtual machines, introducing a range of system-level...
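As a rough sketch of the FaaS model this abstract describes, here is a minimal Python handler following Lambda's Python handler convention; the event shape and the driver call at the bottom are invented for illustration, standing in for the platform invoking the function on an event:

```python
import json

def handler(event, context):
    # Business logic only: resource provisioning and scaling are the
    # platform's job, not the developer's.
    name = event.get("name", "world")
    return {"statusCode": 200,
            "body": json.dumps({"greeting": f"hello {name}"})}

# Stand-in for one platform-triggered, short-lived invocation.
print(handler({"name": "faas"}, context=None))
```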
Industry is building larger, more complex, manycore processors on the back of strong institutional knowledge, but academic projects face difficulties in replicating that scale. To alleviate these difficulties and to develop and share knowledge, the community needs open architecture frameworks for simulation, synthesis, and software exploration which support extensibility, scalability, and configurability, alongside an established base of verification tools and supported software. In this paper we present OpenPiton, an open source framework...
The end of Dennard's scaling and the looming power wall have made power and energy primary design goals for modern processors. Further, new applications such as cloud computing and the Internet of Things (IoT) continue to necessitate increased performance and energy efficiency. Manycore processors show potential in addressing some of these issues. However, there is little detailed power and energy data on manycore processors. In this work, we carefully study the power and energy characteristics of Piton, a 25-core open source academic processor, including voltage versus...
Heterogeneous architectures and heterogeneous-ISA designs are growing areas of computer architecture and system software research. Unfortunately, this line of research is significantly hindered by the lack of experimental systems and modifiable hardware frameworks. This work proposes BYOC, a "Bring Your Own Core" framework that is specifically designed to enable heterogeneous architecture research. BYOC is an open-source hardware framework that provides a scalable cache coherence system and includes out-of-the-box support for four different ISAs (RISC-V 32-bit,...
Modern computing systems employ significant heterogeneity and specialization to meet performance targets at manageable power. However, memory latency bottlenecks remain problematic, particularly for sparse neural network and graph analytic applications where indirect memory accesses (IMAs) challenge the memory hierarchy.
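For readers unfamiliar with the pattern, a tiny illustration of an indirect memory access, where the address of one load depends on the result of another (the arrays here are invented for illustration):

```python
import numpy as np

# IMA pattern: the address of each load on `values` depends on data
# loaded from `idx`, which defeats simple stride prefetchers.
values = np.arange(10.0)            # dense payload array
idx = np.array([7, 2, 9, 0, 4])     # data-dependent indices (e.g. an edge list)
gathered = values[idx]              # values[idx[i]]: two dependent loads per element
print(gathered)                     # [7. 2. 9. 0. 4.]
```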
Philosophically, our approaches to acceleration focus on the extreme. We must optimise accelerators to the maximum, leaving software to fix any hardware-software mismatches. Today's abstractions for programming accelerators leak hardware details, requiring changes to data formats and manual memory and coherence management, among other issues. This harms generality and requires deep hardware knowledge to efficiently program accelerators, a state of affairs which we consider hardware-oriented.
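A hedged sketch of the "hardware-oriented" programming style being critiqued; every accelerator-facing name here is a hypothetical placeholder, not a real driver API:

```python
import numpy as np

def cache_flush(buf):
    # Placeholder: real accelerator drivers often expose an explicit flush
    # so device DMA sees the CPU's latest writes.
    pass

def accel_mmul(a, b):
    # Placeholder: pretend this dispatches to the device.
    return a @ b

def offload_mmul(a, b):
    # 1. Manual data-format change: the device wants contiguous row-major data.
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    # 2. Manual coherence management: flush CPU writes before the device reads.
    cache_flush(a)
    cache_flush(b)
    return accel_mmul(a, b)

print(offload_mmul(np.eye(2), np.ones((2, 2))))
```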
The shared cloud-based computing paradigm has experienced enormous growth. Multitenant clouds are conventionally built atop datacenters that utilize commodity hardware connected hierarchically with standard network protocols. Piton is a 25-core manycore processor that takes a different perspective, rethinking the architecture of the datacenter and specializing for Infrastructure as a Service (IaaS) clouds. The tile-based manycore is designed to be not only a single chip, but a large-scale system. Up to 8,192 chips (204,800 cores) can be...
Computation is increasingly moving to the data center. Thus, the energy used by CPUs in the data center is gaining importance. The centralization of computation in the data center has also led to much commonality between the applications running there. For example, there are many instances of similar or identical versions of the Apache web server in a large data center. Many of these applications, such as bulk image resizing or video transcoding, favor increasing throughput over single stream performance. In this work, we propose Execution...
Embedded FPGAs (eFPGAs) are increasingly being used in SoCs, enabling post-silicon hardware specialization. Existing CPU-eFPGA SoCs have three deficiencies. First, their low core count hinders efficient execution of thread-level-parallel workloads. Second, noncoherent or partially coherent integration inhibits dynamic, random memory sharing. Third, the use of full-custom circuits makes proprietary eFPGAs technology-dependent, inflexible in physical layout, and lacking in architectural customizability.
As Moore's Law is coming to an end, heterogeneous SoCs have become ubiquitous, improving performance and energy efficiency with specialized hardware. However, the addition of hardware accelerators makes data supply more challenging. Feeding the accelerators becomes a bottleneck, especially for data-intensive workloads such as graph analytics, sparse linear algebra, and machine learning applications. DECADES addresses this issue with a combination of accelerators, an embedded FPGA (eFPGA), and its unique "intelligent storage" (IS)...
We introduce the new problem of hardware decompilation. Analogous to software decompilation, hardware decompilation is about analyzing a low-level artifact (in this case a netlist, i.e., a graph of wires and logic gates representing a digital circuit) in order to recover higher-level programming abstractions, and using those abstractions to generate code written in a hardware description language (HDL). The overall problem requires a number of pieces. In this paper we focus on one specific piece of the puzzle: a technique we call hardware loop rerolling, which leverages clone...
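A toy illustration of the rerolling idea; the netlist encoding, the uniform-clone check, and the loop syntax below are all invented for illustration and are not the paper's algorithm:

```python
# A "netlist" of eight structurally identical AND gates, i.e. an
# unrolled loop: (op, input_a, input_b, output) per gate.
netlist = [("and", f"a{i}", f"b{i}", f"y{i}") for i in range(8)]

def reroll(gates):
    """Collapse a run of identical gate clones into one parameterised loop."""
    op = gates[0][0]
    if all(g == (op, f"a{i}", f"b{i}", f"y{i}") for i, g in enumerate(gates)):
        return f"for i in 0..{len(gates) - 1}: y[i] = a[i] {op} b[i]"
    return None  # clones not uniform; leave the netlist unrolled

print(reroll(netlist))  # for i in 0..7: y[i] = a[i] and b[i]
```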
For five years, OpenPiton has provided hardware designs, build and verification scripts, and other infrastructure to enable efficient, detailed research into manycore systems-on-chip. It enables open-source development through its open design and its support of a plethora of simulators and CAD tools. OpenPiton was first designed to perform cutting-edge computer architecture research at Princeton University, and opening it up to the public has led to thousands of downloads and numerous academic publications spanning many subfields within computing. In...
Garbage collection greatly improves programmer productivity and ensures memory safety. Manual memory management on the other hand often delivers better performance but is typically unsafe and can lead to system crashes or security vulnerabilities. We propose integrating safe manual memory management with garbage collection in the .NET runtime to get the best of both worlds. In our design, programmers can choose between allocating objects on the garbage collected heap or the manual heap. All existing applications run unmodified, and without any performance degradation, using the garbage collected heap. Our programming...
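A conceptual sketch of that programming model, in Python rather than C#, and using a simple handle table rather than the runtime's actual safety mechanism; the point is only that manual allocation coexists with ordinary GC objects and that a use-after-free fails safely instead of corrupting memory:

```python
class ManualHeap:
    """Toy manual heap: explicit free, with safe failure on stale access."""
    def __init__(self):
        self._objs = {}
        self._next = 0

    def alloc(self, value):
        self._next += 1
        self._objs[self._next] = value
        return self._next              # opaque handle, not a raw pointer

    def read(self, handle):
        try:
            return self._objs[handle]
        except KeyError:
            raise RuntimeError("safe failure: use-after-free detected")

    def free(self, handle):
        self._objs.pop(handle, None)

heap = ManualHeap()
h = heap.alloc([1, 2, 3])   # manual allocation; GC-heap objects need no change
print(heap.read(h))
heap.free(h)
try:
    heap.read(h)
except RuntimeError as e:
    print(e)                # raises instead of crashing or leaking data
```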
Chips with tens of billions of transistors have become today's norm. These designs are straining our electronic design automation tools throughout the design process, requiring ever more computational resources. In many tools, parallelisation has improved both latency and throughput for the designer's benefit. However, the tools largely remain restricted to a single machine. In the case of RTL simulation, we believe that this leaves much potential performance on the table. We introduce Metro-MPI to improve the simulation of modern 10...
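A hedged sketch of the distribution idea using mpi4py, not Metro-MPI's actual code: each MPI rank simulates one partition of the design and exchanges the traffic crossing partition boundaries once per simulated cycle (the integer "state" update is a stand-in for evaluating one partition's RTL):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

state = 0  # stand-in for one partition's RTL state
for cycle in range(100):
    out_flit = (rank, cycle, state)                 # traffic leaving this partition
    # Exchange boundary traffic with neighbours, one message per cycle.
    in_flit = comm.sendrecv(out_flit, dest=right, source=left)
    state = (state + in_flit[2] + 1) % 1_000_003    # stand-in for a cycle of eval

print(f"rank {rank} final state {state}")
```

Run under an MPI launcher, e.g. `mpirun -n 4 python sim.py`, with each rank mapped to its own cores or node.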
This paper presents CIFER, the world's first open-source, fully cache-coherent, heterogeneous many-core CPU-FPGA SoC. The 12nm, 16mm² chip integrates four 64-bit, OS-capable, RISC-V application cores; three TinyCore clusters that each contain six 32-bit compute cores (18 in total); and an EDA-synthesized, standard-cell-based eFPGA. CIFER enables the decomposition of real-world applications and tailored execution (parallelization or specialization) per decomposed task. Our evaluation shows that: 1)...
Energy efficiency has become an increasingly important concern in computer architecture due to the end of Dennard scaling. Heterogeneity has been explored as a way to achieve better energy efficiency, and chips with heterogeneous microarchitectures have become common in the mobile setting. Recent research using heterogeneous-ISA, heterogeneous-microarchitecture, general-purpose cores has shown further gains. However, there is no open-source hardware implementation of a heterogeneous-ISA processor available for research, and effective research on such processors necessitates...
Effective digital hardware design fundamentally requires decomposing a design into a set of interconnected modules, each a distinct unit of computation and state. However, naively connecting modules leads to real-world pathological cases which are surprisingly far from obvious when looking at the interfaces alone and very difficult to debug after synthesis. We show for the first time that it is possible to soundly abstract even the complex combinational dependencies of arbitrary modules through the assignment of IO ports to one of four new sorts...
To better facilitate application performance programming, we propose a software optimization strategy enabled by a novel low-latency Prediction System Service (PSS). Rather than relying on nuanced domain-specific knowledge or slapdash heuristics, a system service for prediction encourages programmers to spend their time uncovering new performance levers rather than worrying about the details of their control. The core idea is to write optimizations that improve performance in specific cases, under certain tunings, and leave the decision of how and when...
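A minimal sketch of that programming model; the service interface and the trivial threshold predictor below are hypothetical stand-ins for the PSS, which would answer from a trained model:

```python
import random

def predict(density):
    # Stand-in for the low-latency prediction service: the programmer asks
    # which variant to run; the tuning logic lives outside application code.
    return "sparse" if density < 0.1 else "dense"

# Two programmer-supplied optimization variants, each best in some regime.
def dot_dense(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))

def dot_sparse(xs, ys):
    return sum(x * y for x, y in zip(xs, ys) if x != 0.0)

xs = [random.choice([0.0, 1.0]) for _ in range(1000)]
ys = [1.0] * 1000
density = sum(1 for x in xs if x != 0.0) / len(xs)
variant = dot_sparse if predict(density) == "sparse" else dot_dense
print(variant(xs, ys))
```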
EDA toolchains are notoriously unpredictable, incomplete, and error-prone; the generally-accepted remedy has been to re-imagine EDA tasks as compilation problems. However, any compiler framework we apply must be prepared to handle a wide range of tasks, including not only "forward" tasks like technology mapping and optimization (the "there" in our title), but also "backward" tasks like decompilation and loop rerolling (the "back again"). In this paper, we advocate for equality saturation -- a term rewriting framework -- as the framework of choice when building hardware toolchains....
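To make the term-rewriting idea concrete, here is a deliberately naive saturation loop over a set of terms: rules are applied non-destructively until no new equal terms appear, then the cheapest representative is extracted. Real equality saturation uses e-graphs to share subterms compactly, which this sketch omits, and the two rules and the cost table are invented for illustration:

```python
# Terms are nested tuples, e.g. ("mul", ("var", "x"), ("const", 2)).

def rewrites(t):
    """Yield terms equal to t under two sample rules, at any position."""
    if isinstance(t, tuple) and t[0] == "mul":
        _, a, b = t
        yield ("mul", b, a)                     # commutativity of mul
        if b == ("const", 2):
            yield ("shl", a, ("const", 1))      # strength reduction: x*2 = x<<1
    if isinstance(t, tuple):                    # recurse into children
        for i in range(1, len(t)):
            for sub in rewrites(t[i]):
                yield t[:i] + (sub,) + t[i + 1:]

def saturate(t, max_iters=10):
    """Grow the set of equal terms until fixpoint (or an iteration cap)."""
    seen, frontier = {t}, {t}
    for _ in range(max_iters):
        new = {r for s in frontier for r in rewrites(s)} - seen
        if not new:
            break
        seen |= new
        frontier = new
    return seen

COST = {"mul": 4, "shl": 1, "const": 0, "var": 0}
def cost(t):
    return COST[t[0]] + sum(cost(c) for c in t[1:] if isinstance(c, tuple))

# Extraction: pick the cheapest term from the saturated equivalence set.
best = min(saturate(("mul", ("var", "x"), ("const", 2))), key=cost)
print(best)  # ("shl", ("var", "x"), ("const", 1))
```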
State-of-the-art domain specific architectures (DSAs) work with sparse data, and need hardware support for index data-structures [31, 43, 57, 61]. Indexes are more space-efficient than sparse data and reduce DRAM bandwidth, if data reuse can be managed. However, indexes exhibit dynamic accesses, chase pointers, and need to walk-and-search. This inflates the working set and thrashes the cache. We observe that the cache organization itself is responsible for this behavior.
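As an illustration of the index-driven access behaviour described, a small CSR sparse matrix-vector multiply (the matrix here is invented): the index arrays `indptr` and `indices` are walked sequentially, but they steer data-dependent loads into `x`, which is what inflates the working set:

```python
import numpy as np

# 3x3 CSR matrix: indptr gives row start offsets, indices gives column ids.
indptr  = np.array([0, 2, 3, 5])         # the index structure being walked
indices = np.array([0, 2, 1, 0, 2])      # column ids per stored nonzero
data    = np.array([1., 2., 3., 4., 5.]) # nonzero values
x       = np.array([1., 1., 1.])

y = np.zeros(3)
for row in range(3):
    for k in range(indptr[row], indptr[row + 1]):
        y[row] += data[k] * x[indices[k]]   # indirect, data-dependent load
print(y)  # [3. 3. 9.]
```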
RTL simulation has become a crucial bottleneck in the design of emerging SoCs for AI. To clear this bottleneck, design teams are leaning ever more heavily on emulation and other alternative tools. We find that designers can instead exploit the natural boundaries in these SoCs in order to parallelise their RTL simulations using HPC techniques. By distributing Verilog simulation across tens of nodes (and thousands of physical cores), we simulate a 10B+ transistor, 1024-core SoC with over 2.7 MIPS of aggregate throughput across the simulated cores....