Jonathan Balkind

ORCID: 0000-0003-1443-1373
Research Areas
  • Parallel Computing and Optimization Techniques
  • Embedded Systems Design Techniques
  • Interconnection Networks and Systems
  • Cloud Computing and Resource Management
  • Advanced Data Storage Technologies
  • VLSI and Analog Circuit Testing
  • Semiconductor materials and devices
  • Distributed and Parallel Computing Systems
  • Distributed systems and fault tolerance
  • Low-power high-performance VLSI design
  • Advancements in Semiconductor Devices and Circuit Design
  • Advanced Malware Detection Techniques
  • Photonic and Optical Devices
  • Security and Verification in Computing
  • Graph Theory and Algorithms
  • Cryptography and Data Security
  • VLSI and FPGA Design Techniques
  • Software Engineering Research
  • Algorithms and Data Compression
  • Software System Performance and Reliability
  • Formal Methods in Verification
  • Advanced Electron Microscopy Techniques and Applications
  • Software Testing and Debugging Techniques
  • Logic, Reasoning, and Knowledge
  • Logic, programming, and type systems

University of California, Santa Barbara
2021-2025

Princeton University
2014-2020

Princeton Public Schools
2017

Serverless computing is a rapidly growing cloud application model, popularized by Amazon's Lambda platform. Serverless cloud services provide fine-grained provisioning of resources, which scale automatically with user demand. Function-as-a-Service (FaaS) applications follow this serverless model, with the developer providing their application as a set of functions which are executed in response to a user- or system-generated event. Functions are designed to be short-lived and execute inside containers or virtual machines, introducing a range of system-level...

10.1145/3352460.3358296 article EN 2019-10-11

Industry is building larger, more complex, manycore processors on the back of strong institutional knowledge, but academic projects face difficulties in replicating that scale. To alleviate these difficulties and to develop and share knowledge, the community needs open architecture frameworks for simulation, synthesis, and software exploration which support extensibility, scalability, and configurability, alongside an established base of verification tools and supported software. In this paper we present OpenPiton, an open source framework...

10.1145/2872362.2872414 article EN 2016-03-25

Industry is building larger, more complex, manycore processors on the back of strong institutional knowledge, but academic projects face difficulties in replicating that scale. To alleviate these difficulties and to develop and share knowledge, the community needs open architecture frameworks for simulation, synthesis, and software exploration which support extensibility, scalability, and configurability, alongside an established base of verification tools and supported software. In this paper we present OpenPiton, an open source framework...

10.1145/2954679.2872414 article EN ACM SIGPLAN Notices 2016-03-25

The end of Dennard's scaling and the looming power wall have made power and energy primary design goals for modern processors. Further, new applications such as cloud computing and the Internet of Things (IoT) continue to necessitate increased performance and energy efficiency. Manycore processors show potential in addressing some of these issues. However, there is little detailed power and energy data on manycore processors. In this work, we carefully study the power and energy characteristics of Piton, a 25-core open source academic processor, including voltage versus...

10.1109/hpca.2018.00070 article EN 2018-02-01

Heterogeneous architectures and heterogeneous-ISA designs are growing areas of computer architecture and system software research. Unfortunately, this line of research is significantly hindered by the lack of experimental systems research and modifiable hardware frameworks. This work proposes BYOC, a "Bring Your Own Core" framework that is specifically designed to enable heterogeneous-ISA and heterogeneous system research. BYOC is an open-source hardware framework that provides a scalable cache coherence system and includes out-of-the-box support for four different ISAs (RISC-V 32-bit,...

10.1145/3373376.3378479 article EN 2020-03-09

Modern computing systems employ significant heterogeneity and specialization to meet performance targets at manageable power. However, memory latency bottlenecks remain problematic, particularly for sparse neural network and graph analytic applications where indirect memory accesses (IMAs) challenge the memory hierarchy.

10.1145/3470496.3527400 article EN 2022-05-31
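To make the term concrete, here is a minimal sketch of an indirect memory access as it appears in a sparse matrix-vector multiply. The CSR layout, the function name `spmv_csr`, and the sample matrix are illustrative assumptions, not taken from the paper.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative helper)."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x[col_idx[k]] is the indirect memory access: the address of the
            # load from x is only known after col_idx[k] has itself been loaded.
            y[row] += values[k] * x[col_idx[k]]
    return y


# Hypothetical 3x3 matrix [[2, 0, 1], [0, 3, 0], [4, 0, 5]] in CSR form.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)
```

Because the index array decides which element of `x` is touched, the access stream into `x` is data-dependent and hard for a conventional prefetcher to anticipate, which is the kind of latency problem the abstract refers to.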

Philosophically, our approaches to acceleration focus on the extreme. We must optimise accelerators to the maximum, leaving software to fix any hardware-software mismatches. Today's abstractions for programming accelerators leak hardware details, requiring changes to data formats and manual memory and coherence management, among other issues. This harms generality and requires deep hardware knowledge to efficiently program accelerators, a state which we consider hardware-oriented.

10.1145/3582016.3582059 article EN 2023-03-20

The shared cloud-based computing paradigm has experienced enormous growth. Multitenant clouds are conventionally built atop datacenters that utilize commodity hardware connected hierarchically with standard network protocols. Piton is a 25-core manycore processor that takes a different perspective, rethinking the architecture of [...] and specializing for Infrastructure as a Service (IaaS) clouds. Piton is tile-based, designed not only as a single chip, but as a large-scale system. Up to 8,192 chips (204,800 cores) can be...

10.1109/mm.2017.36 article EN IEEE Micro 2017-03-01

Industry is building larger, more complex, manycore processors on the back of strong institutional knowledge, but academic projects face difficulties in replicating that scale. To alleviate these difficulties and to develop and share knowledge, the community needs open architecture frameworks for simulation, synthesis, and software exploration which support extensibility, scalability, and configurability, alongside an established base of verification tools and supported software. In this paper we present OpenPiton, an open source framework...

10.1145/2980024.2872414 article EN ACM SIGARCH Computer Architecture News 2016-03-25

Computation is increasingly moving to the data center. Thus, the energy used by CPUs in the data center is gaining importance. The centralization of computation in the data center has also led to much commonality between the applications running there. For example, there are many instances of similar or identical versions of the Apache web server running in a large data center. Many of these applications, such as bulk image resizing or video transcoding, favor increasing throughput over single stream performance. In this work, we propose Execution...

10.1109/micro.2014.43 article EN 2014-12-01

Embedded FPGAs (eFPGAs) are increasingly being used in SoCs, enabling post-silicon hardware specialization. Existing CPU-eFPGA SoCs have three deficiencies. First, their low core count hinders efficient execution of thread-level-parallel workloads. Second, noncoherent or partially coherent integration inhibits dynamic, random memory sharing. Third, the use of full-custom circuits makes proprietary eFPGAs technology-dependent, inflexible in physical layout, and lacking in architectural customizability.

10.1109/cicc57935.2023.10121294 article EN 2023 IEEE Custom Integrated Circuits Conference (CICC) 2023-04-01

As Moore's Law is coming to an end, heterogeneous SoCs have become ubiquitous, improving performance and efficiency with specialized hardware. However, the addition of hardware accelerators makes data supply more challenging. Feeding data to accelerators becomes a bottleneck, especially for data-intensive workloads such as graph analytics, sparse linear algebra, and machine learning applications. DECADES addresses this issue with a combination of accelerators, embedded FPGA (eFPGA), and its unique "intelligent storage" (IS)...

10.1109/cicc57935.2023.10121257 article EN 2023 IEEE Custom Integrated Circuits Conference (CICC) 2023-04-01

We introduce the new problem of hardware decompilation. Analogous to software decompilation, hardware decompilation is about analyzing a low-level artifact (in this case a netlist, i.e., a graph of wires and logical gates representing a digital circuit) in order to recover higher-level programming abstractions, and using those abstractions to generate code written in a hardware description language (HDL). The overall problem of hardware decompilation requires a number of pieces. In this paper we focus on one specific piece of the puzzle: a technique we call hardware loop rerolling. Hardware loop rerolling leverages clone...

10.1145/3591237 article EN Proceedings of the ACM on Programming Languages 2023-06-06

For five years, OpenPiton has provided hardware designs, build and verification scripts, and other infrastructure to enable efficient, detailed research into manycores and systems-on-chip. It enables open-source hardware development through its open design and support of a plethora of simulators and CAD tools. OpenPiton was first designed to perform cutting-edge computer architecture research at Princeton University, and opening it up to the public has led to thousands of downloads and numerous academic publications spanning many subfields within computing. In...

10.1109/mm.2020.2997706 article EN publisher-specific-oa IEEE Micro 2020-05-26

Garbage collection greatly improves programmer productivity and ensures memory safety. Manual memory management, on the other hand, often delivers better performance but is typically unsafe and can lead to system crashes or security vulnerabilities. We propose integrating safe manual memory management with garbage collection in the .NET runtime to get the best of both worlds. In our design, programmers can choose between allocating objects in the garbage collected heap or the manual heap. All existing applications run unmodified, and without any performance degradation, using the garbage collected heap. Our programming...

10.1145/3141879 article EN Proceedings of the ACM on Programming Languages 2017-10-12

Chips with tens of billions of transistors have become today's norm. These designs are straining our electronic design automation tools throughout the design process, requiring ever more computational resources. In many tools, parallelisation has improved both latency and throughput for the designer's benefit. However, tools largely remain restricted to a single machine, and in the case of RTL simulation, we believe that this leaves much potential performance on the table. We introduce Metro-MPI to improve RTL simulation for modern 10...

10.23919/date56975.2023.10137080 article EN 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE) 2023-04-01

This paper presents CIFER, the world's first open-source, fully cache-coherent, heterogeneous many-core, CPU-FPGA SoC. The 12 nm, 16 mm² chip integrates four 64-bit, OS-capable, RISC-V application cores; three TinyCore clusters that each contain six 32-bit, RISC-V compute cores (18 in total); and an EDA-synthesized, standard-cell-based eFPGA. CIFER enables the decomposition of real-world applications and tailored execution (parallelization or specialization) per decomposed task. Our evaluation shows that: 1)...

10.1109/lssc.2023.3303111 article EN IEEE Solid-State Circuits Letters 2023-01-01

Energy efficiency has become an increasingly important concern in computer architecture due to the end of Dennard scaling. Heterogeneity has been explored as a way to achieve better energy efficiency, and heterogeneous microarchitecture chips have become common in the mobile setting. Recent research has explored using heterogeneous-ISA, heterogeneous microarchitecture, general-purpose cores to achieve further energy efficiency gains. However, there is no open-source hardware implementation of a heterogeneous-ISA processor available for research, and effective research on heterogeneous-ISA processors necessitates...

10.1145/3289602.3293958 article EN 2019-02-20

Industry is building larger, more complex, manycore processors on the back of strong institutional knowledge, but academic projects face difficulties in replicating that scale. To alleviate these difficulties and to develop and share knowledge, the community needs open architecture frameworks for simulation, synthesis, and software exploration which support extensibility, scalability, and configurability, alongside an established base of verification tools and supported software. In this paper we present OpenPiton, an open source framework...

10.1145/2954680.2872414 article EN ACM SIGOPS Operating Systems Review 2016-03-25

Effective digital hardware design fundamentally requires decomposing a design into a set of interconnected modules, each a distinct unit of computation and state. However, naively connecting hardware modules leads to real-world pathological cases which are surprisingly far from obvious when looking at the interfaces alone and which are very difficult to debug after synthesis. We show for the first time that it is possible to soundly abstract even complex combinational dependencies of arbitrary hardware modules through the assignment of IO ports to one of four new sorts...

10.1145/3453483.3454037 article EN 2021-06-18

To better facilitate application performance programming, we propose a software optimization strategy enabled by a novel low-latency Prediction System Service (PSS). Rather than relying on nuanced domain-specific knowledge or slapdash heuristics, a system service for prediction encourages programmers to spend their time uncovering new levers rather than worrying about the details of control. The core idea is to write optimizations that improve performance in specific cases, under specific tunings, and leave the decision of how and when...

10.1145/3575693.3575714 article EN 2023-01-27

EDA toolchains are notoriously unpredictable, incomplete, and error-prone; the generally-accepted remedy has been to re-imagine EDA tasks as compilation problems. However, any compiler framework we apply must be prepared to handle the wide range of EDA tasks, including not only compilation tasks like technology mapping and optimization (the "there" in our title), but also decompilation tasks like loop rerolling (the "back again"). In this paper, we advocate for equality saturation (a term rewriting framework) as the framework of choice when building hardware toolchains....

10.48550/arxiv.2404.00786 preprint EN arXiv (Cornell University) 2024-03-31
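As a rough illustration of the idea behind equality saturation, the sketch below applies rewrite rules without discarding the original term, collects every equivalent form until no new ones appear, and only then extracts the cheapest. It is a deliberately simplified stand-in: real engines store the equivalent terms compactly in an e-graph, whereas this version enumerates them explicitly. The rules, the cost function, and all names are illustrative assumptions, not the paper's implementation.

```python
# Terms are atoms (str or int) or binary nodes written as (op, left, right).

def rewrites_at_root(t):
    """Yield terms equal to t obtained by applying one rule at the root."""
    if isinstance(t, tuple):
        op, l, r = t
        if op == "*" and r == 2:
            yield ("<<", l, 1)                  # x * 2 = x << 1
        if op == "/" and isinstance(l, tuple) and l[0] == "*":
            yield ("*", l[1], ("/", l[2], r))   # (x * y) / z = x * (y / z)
        if op == "/" and l == r:
            yield 1                             # x / x = 1 (assumes x != 0)
        if op == "*" and r == 1:
            yield l                             # x * 1 = x


def rewrites(t):
    """Yield every term reachable from t by one rewrite at any position."""
    yield from rewrites_at_root(t)
    if isinstance(t, tuple):
        op, l, r = t
        for l2 in rewrites(l):
            yield (op, l2, r)
        for r2 in rewrites(r):
            yield (op, l, r2)


def saturate(term, max_iters=10):
    """Grow the set of equivalent terms until a fixpoint or the iteration cap."""
    seen = {term}
    frontier = {term}
    for _ in range(max_iters):
        new = {u for t in frontier for u in rewrites(t)} - seen
        if not new:
            break
        seen |= new
        frontier = new
    return seen


def cost(t):
    """Illustrative cost: the number of nodes in the term."""
    return 1 + cost(t[1]) + cost(t[2]) if isinstance(t, tuple) else 1


equivalents = saturate(("/", ("*", "a", 2), 2))
best = min(equivalents, key=cost)
print(best)
```

Note that a greedy rewriter that eagerly turned `a * 2` into `a << 1` would lose the chance to cancel the division; keeping all equivalent forms and deferring the choice to extraction avoids that ordering problem.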

State-of-the-art domain specific architectures (DSAs) work with sparse data, and need hardware support for index data-structures [31, 43, 57, 61]. Indexes are more space-efficient for sparse data, and reduce DRAM bandwidth, if data reuse can be managed. However, indexes exhibit dynamic accesses, chase pointers, and need to walk-and-search. This inflates the working set and thrashes the cache. We observe that the cache organization itself is responsible for this behavior.

10.1145/3620665.3640402 article EN 2024-04-22

RTL simulation has become a crucial bottleneck in the design of emerging SoCs for AI. To clear this bottleneck, teams are leaning ever more heavily on emulation and other alternative tools. We find that the designer can instead exploit the natural boundaries of these SoCs in order to parallelise their RTL simulations using HPC techniques. By distributing Verilog simulations across tens of nodes (and thousands of physical cores), we simulate a 10B+ transistor, 1024 core SoC with over 2.7 MIPS aggregate throughput of simulated cores....

10.1109/vlsitsa60681.2024.10546385 article EN 2024-04-22