NFDI4DS | UHH-SEMS - Publication Details

DS3: A System-Level Domain-Specific System-on-Chip Simulation Framework

OPENALEX - Publications

Samet E. Arda, Anish NK, A. Alper Goksoy, Joshua Mack, Nirmal Kumbhare, and 4 more

Heterogeneous systems-on-chip (SoCs) are highly favorable computing platforms due to their superior performance and energy efficiency potential compared to homogeneous architectures. They can be further tailored to a specific domain of applications by incorporating processing elements (PEs) that accelerate frequently used kernels in these applications. However, this potential is contingent upon optimizing the SoC for the target domain and utilizing its resources effectively at runtime. To this end, system-level design -...

10.1109/tc.2020.2986963 article EN IEEE Transactions on Computers 2020-01-01

CEDR: A Compiler-integrated, Extensible DSSoC Runtime

OPENALEX - Publications

Joshua Mack, Sahil Hassan, Nirmal Kumbhare, Miguel C. Gonzalez, Ali Akoglu

In this work, we present a Compiler-integrated, Extensible Domain Specific System on Chip Runtime (CEDR) ecosystem to facilitate research toward addressing the challenges of architecture, system software, and application development with distinct plug-and-play integration points in a unified compile time and runtime workflow. We demonstrate the utility of CEDR on the Xilinx Zynq MPSoC-ZCU102 for evaluating performance of pre-silicon hardware in the trade space of SoC configuration, scheduling policy and workload complexity...

10.1145/3529257 article EN ACM Transactions on Embedded Computing Systems 2022-04-13

RANC: Reconfigurable Architecture for Neuromorphic Computing

OPENALEX - Publications

Joshua Mack, Ruben Purdy, Kris Rockowitz, Michael Inouye, Edward J. Richter, and 6 more

Neuromorphic architectures have been introduced as platforms for energy efficient spiking neural network execution. The massive parallelism offered by these architectures has also triggered interest from non-machine learning application domains. In order to lift the barriers to entry for hardware designers and application developers we present RANC: a Reconfigurable Architecture for Neuromorphic Computing, an open-source highly flexible ecosystem that enables rapid experimentation with neuromorphic architectures in both software via C++ simulation and hardware via FPGA...

10.1109/tcad.2020.3038151 article EN publisher-specific-oa IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2020-11-16

Performant, Multi-Objective Scheduling of Highly Interleaved Task Graphs on Heterogeneous System on Chip Devices

OPENALEX - Publications

Joshua Mack, Samet E. Arda, Ümit Y. Ogras, Ali Akoglu

Performance-, power-, and energy-aware scheduling techniques play an essential role in optimally utilizing processing elements (PEs) of heterogeneous systems. List schedulers, a class of low-complexity static schedulers, have commonly been used in static execution scenarios. However, list schedulers are not suitable for runtime decision making, particularly when multiple concurrent applications are interleaved dynamically. For such cases, the static task execution times and the expectation of idle PEs assumed by list schedulers lead to inefficient system...

10.1109/tpds.2021.3135876 article EN IEEE Transactions on Parallel and Distributed Systems 2021-12-16

User-Space Emulation Framework for Domain-Specific SoC Design

OPENALEX - Publications

Joshua Mack, Nirmal Kumbhare, Anish NK, Ümit Y. Ogras, Ali Akoglu

In this work, we propose a portable, Linux-based emulation framework to provide an ecosystem for hardware-software co-design of Domain-specific SoCs (DSSoCs) and enable their rapid evaluation during the pre-silicon design phase. This framework holistically targets three key challenges of DSSoC design: accelerator integration, resource management, and application development. We address these challenges via a flexible and lightweight user-space runtime environment that enables easy integration of new accelerators, scheduling...

10.1109/ipdpsw50202.2020.00016 article EN 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2020-05-01

CEDR-API: Productive, Performant Programming of Domain-Specific Embedded Systems

OPENALEX - Publications

Joshua Mack, Serhan Gener, Sahil Hassan, H. Umut Suluhan, Ali Akoglu

As the computing landscape evolves, system designers continue to explore design methodologies that leverage increased levels of heterogeneity to push performance within limited size, weight, power, and cost budgets. One such methodology is to build Domain-Specific System on Chips (DSSoCs) that promise increased productivity through the narrowed scope of their target application domain. In previous works, we have proposed CEDR, an open source, unified compilation and runtime framework for DSSoC architectures that allows...

10.1109/ipdpsw59300.2023.00016 article EN 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2023-05-01

FPGA Based Emulation Environment for Neuromorphic Architectures

OPENALEX - Publications

Spencer Valancius, Edward J. Richter, Ruben Purdy, Kris Rockowitz, Michael Inouye, and 5 more

Neuromorphic architectures such as IBM's TrueNorth and Intel's Loihi have been introduced as platforms for energy efficient spiking neural network execution. However, there is no framework that allows for rapidly experimenting with neuromorphic architectures and studying the trade space on hardware performance and network accuracy. Fundamentally, this creates a barrier to entry for hardware designers looking to explore neuromorphic architectures. In this paper we present an open-source FPGA based emulation environment for neuromorphic computing research. We prototype...

10.1109/ipdpsw50202.2020.00022 article EN 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2020-05-01

DS3: A System-Level Domain-Specific System-on-Chip Simulation Framework

OPENALEX - Publications

Samet E. Arda, Anish NK, A. Alper Goksoy, Nirmal Kumbhare, Joshua Mack, and 4 more

Heterogeneous systems-on-chip (SoCs) are highly favorable computing platforms due to their superior performance and energy efficiency potential compared to homogeneous architectures. They can be further tailored to a specific domain of applications by incorporating processing elements (PEs) that accelerate frequently used kernels in these applications. However, this potential is contingent upon optimizing the SoC for the target domain and utilizing its resources effectively at runtime. To this end, system-level design -...

10.48550/arxiv.2003.09016 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Floating point CORDIC-based architecture for powering computation

OPENALEX - Publications

Joshua Mack, Sam Bellestri, Daniel Llamocca

This work presents an architecture for powering computation in floating point arithmetic that is based on the expanded hyperbolic CORDIC algorithm, where the user can select the 2-D domain of convergence that suits their application. The fully parameterized hardware implementation allows us to explore trade-offs among design parameters (numerical format, number of iterations), resource usage, accuracy, and execution time. We carry out an exhaustive design space exploration to generate Pareto-optimal realizations...

10.1109/reconfig.2015.7393311 article EN 2015-12-01

GPU-RANC: A CUDA Accelerated Simulation Framework for Neuromorphic Architectures

OPENALEX - Publications

Sahil Hassan, Michael Inouye, Miguel C. Gonzalez, Ilkin Aliyev, Joshua Mack, and 2 more

Open-source simulation tools play a crucial role for neuromorphic application engineers and hardware architects to investigate performance bottlenecks and explore design optimizations before committing to silicon. Reconfigurable Architecture for Neuromorphic Computing (RANC) is one such tool that offers the ability to execute pre-trained Spiking Neural Network (SNN) models within a unified ecosystem through both software-based simulation and FPGA-based emulation. RANC has been utilized by the community with its flexible...

10.48550/arxiv.2404.16208 preprint EN arXiv (Cornell University) 2024-04-24

GPU-RANC: A CUDA Accelerated Simulation Framework for Neuromorphic Architectures

OPENALEX - Publications

Sahil Hassan, Michael Inouye, Miguel C. Gonzalez, Ilkin Aliyev, Joshua Mack, and 2 more

Open-source simulation tools play a crucial role for neuromorphic application engineers and hardware architects to investigate performance bottlenecks and explore design optimizations before committing to silicon. Reconfigurable Architecture for Neuromorphic Computing (RANC) is one such tool that offers the ability to execute pre-trained Spiking Neural Network (SNN) models within a unified ecosystem through both software-based simulation and FPGA-based emulation. RANC has been utilized by the community with its flexible...

10.1109/nice61972.2024.10548776 article EN 2024-04-23

A Runtime Manager Integrated Emulation Environment for Heterogeneous SoC Design with RISC-V Cores

OPENALEX - Publications

H. Umut Suluhan, Serhan Gener, Alexander Fusco, Joshua Mack, Ismet Dagli, and 3 more

10.1109/ipdpsw63119.2024.00013 article EN 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2024-05-27

Tutorial: A Novel Runtime Environment for Accelerator-Rich Heterogeneous Architectures

OPENALEX - Publications

Joshua Mack, Anish Krishnakumar, Ümit Y. Ogras, Ali Akoglu

As the landscape of computing advances, system designers are increasingly exploring methodologies that leverage higher levels of heterogeneity to enhance performance within constrained size, weight, power, and cost parameters. CEDR (Compiler-integrated Extensible DSSoC Runtime) stands as an ecosystem facilitating productive and efficient application development and deployment across heterogeneous systems. It fosters the co-design of applications, scheduling heuristics, and accelerators within a unified framework. Our...

10.1145/3687463 article EN ACM Transactions on Embedded Computing Systems 2024-08-08

Coarse Grained Task Parallelization by Dynamic Profiling for Heterogeneous SoC Based Embedded System

OPENALEX - Publications

Liangliang Chang, Serhan Gener, Joshua Mack, H. Umut Suluhan, Ali Akoglu, and 1 more

In this study, we introduce a methodology for automatically transforming user applications written in C/C++ to a parallel representation consisting of coarse-grained tasks based on dynamic profiling. Such a representation is suitable for mapping onto heterogeneous SoCs. We present our approach for instrumenting the application binary during the compilation process with primitives that enable the runtime system to schedule and execute independent computation-intensive tasks concurrently. We use the proposed code transformation to retarget each...

10.1145/3704635 article EN ACM Transactions on Embedded Computing Systems 2024-11-15

A simulation framework for domain-specific system-on-chips

OPENALEX - Publications

Samet E. Arda, Anish NK, A. Alper Goksoy, Joshua Mack, Nirmal Kumbhare, and 4 more

Homogeneous general purpose processors provide flexibility to implement a variety of applications and facilitate programmability. In contrast, heterogeneous system-on-chips (SoCs) that combine general purpose and specialized processors offer great potential to achieve higher efficiency while maintaining programming flexibility. In particular, domain-specific SoCs (DSSoC), a class of heterogeneous architectures, tailor the architecture and processing elements (PE) to a specific domain. Hence, they can provide superior energy-efficiency compared to general purpose processors by exploiting...

10.1145/3349567.3351719 article EN 2019-10-13

Design of High Throughput FPGA-Based Testbed for Accelerating Error Characterization of LDPC Codes

OPENALEX - Publications

Burak Unal, Sahil Hassan, Joshua Mack, Nirmal Kumbhare, Ali Akoglu

We present a modular FPGA-based testbed to accelerate the study of low-density parity-check codes (LDPC). This testbed is composed of controller, codeword generator, noise generator, random number generator, LDPC decoder, and statistical analysis modules. The decoder module is replaceable to enable the development or study of new or existing hard-decision-based decoders. We demonstrate our testbed's ability to reduce the timescale of error correction pattern analysis through case studies involving the Gallager B (GaB) and Probabilistic Gallager B (PGaB) algorithms. We contextualize...

10.1109/reconfig48160.2019.8994785 article EN 2019-12-01

FALCON: An FPGA Emulation Platform for Domain-Specific SoCs (DSSoCs)

OPENALEX - Publications

Anish Krishnakumar, Hanguang Yu, Tutu Ajayi, A. Alper Goksoy, Vishrut Pandey, and 10 more

This article presents FALCON, a full-system domain-specific system-on-chip emulation platform that enables presilicon power and performance estimation of these platforms to provide support for early functional validation and software development.

10.1109/mdat.2023.3291331 article EN IEEE Design and Test 2023-06-30

A Hardware-based HEFT Scheduler Implementation for Dynamic Workloads on Heterogeneous SoCs

OPENALEX - Publications

Alexander Fusco, Sahil Hassan, Joshua Mack, Ali Akoglu

Non-uniform performance and power consumption across the processing elements (PEs) of heterogeneous SoCs increase the computation complexity of the task scheduling problem compared to homogeneous architectures. The latency of a software-based scheduler with increased heterogeneity level, in terms of the number and types of PEs, creates the necessity of deploying the scheduler as an overlay processor in hardware to be able to make scheduling decisions rapidly and enable the deployment of real-life applications on heterogeneous SoCs. In this study we present the design trade-offs involved for...

10.1109/vlsi-soc54400.2022.9939623 preprint EN 2022-10-03

Enabling Software-Defined RF Convergence with a Novel Coarse-Scale Heterogeneous Processor

OPENALEX - Publications

Daniel W. Bliss, Tutu Ajayi, Ali Akoglu, Ilkin Aliyev, Toygun Başaklar, and 38 more

RF system development is traditionally constrained by a restrictive trade-off between power efficiency and programmatic flexibility. We outline a path towards achieving both, thereby enabling a range of new RF concepts that better utilize limited resources. As an example, for many future applications, we consider RF convergence – reusing the same spectrum and waveforms to achieve multiple distributed functions and goals, simultaneously. To enable this next step in RF processing, we develop a novel framework that includes...

10.1109/iscas48785.2022.9937602 article EN 2022 IEEE International Symposium on Circuits and Systems (ISCAS) 2022-05-28

Accelerated Shadow Detection and Removal Method

OPENALEX - Publications

Edward J. Richter, Ryan Raettig, Joshua Mack, Spencer Valancius, Burak Unal, and 1 more

Shadows can have a negative effect on the ability of computer vision techniques for object detection, tracking, and recognition. Therefore, the ability to remove shadows and byproducts of illumination is an important problem to enable effective recognition of actions. As applications move into levels of higher information extraction and required processing speeds, efficient and sophisticated shadow detection and removal becomes even more necessary. In this study we propose a shadow detection and removal method, parallelize the method using a Tesla P100 GPU, and achieve speedup...

10.1109/aiccsa47632.2019.9035242 article EN 2019-11-01

CORDIC-based Architecture for Powering Computation in Fixed-Point Arithmetic

OPENALEX - Publications

Nia Simmonds, Joshua Mack, Sam Bellestri, Daniel Llamocca

We present a fixed point architecture (source VHDL code is provided) for powering computation. The fully customized architecture, based on the expanded hyperbolic CORDIC algorithm, allows for design space exploration to establish trade-offs among design parameters (numerical format, number of iterations), execution time, resource usage and accuracy. We also generate Pareto-optimal realizations in the resource-accuracy space: this approach can produce optimal hardware realizations that simultaneously satisfy resource and accuracy requirements.

10.48550/arxiv.1605.03229 preprint EN other-oa arXiv (Cornell University) 2016-01-01