Joshua Mack

ORCID: 0000-0003-1066-5578
Research Areas
  • Parallel Computing and Optimization Techniques
  • Embedded Systems Design Techniques
  • Distributed and Parallel Computing Systems
  • Interconnection Networks and Systems
  • Ferroelectric and Negative Capacitance Devices
  • Advanced Memory and Neural Computing
  • Real-Time Systems Scheduling
  • Cloud Computing and Resource Management
  • Numerical Methods and Algorithms
  • Low-power high-performance VLSI design
  • VLSI and Analog Circuit Testing
  • Digital Filter Design and Implementation
  • Neural Networks and Reservoir Computing
  • Image Processing Techniques and Applications
  • Optical Network Technologies
  • Advanced Data Storage Technologies
  • Advanced Algorithms and Applications
  • CCD and CMOS Imaging Sensors
  • Model Reduction and Neural Networks
  • Neuroscience and Neural Engineering
  • Wireless Signal Modulation Classification
  • Fractal and DNA sequence analysis
  • Advanced Optical Network Technologies
  • Wireless Communication Security Techniques
  • Italian Fascism and Post-war Society

University of Arizona
2015-2024

Politecnico di Milano
2022

University of Bremen
2022

University of California, Santa Barbara
2022

University of Patras
2022

Bridge University
2022

Özyeğin University
2022

Heterogeneous systems-on-chip (SoCs) are highly favorable computing platforms due to their superior performance and energy efficiency potential compared to homogeneous architectures. They can be further tailored to a specific domain of applications by incorporating processing elements (PEs) that accelerate frequently used kernels in these applications. However, this is contingent upon optimizing the SoC for the target domain and utilizing its resources effectively at runtime. To this end, system-level design -...

10.1109/tc.2020.2986963 article EN IEEE Transactions on Computers 2020-01-01

In this work, we present a Compiler-integrated, Extensible Domain Specific System on Chip Runtime (CEDR) ecosystem to facilitate research toward addressing the challenges of architecture, system software, and application development with distinct plug-and-play integration points in a unified compile time and runtime workflow. We demonstrate the utility of CEDR on the Xilinx Zynq MPSoC-ZCU102 for evaluating the performance of pre-silicon hardware in the trade space of SoC configuration, scheduling policy and workload complexity...

10.1145/3529257 article EN ACM Transactions on Embedded Computing Systems 2022-04-13

Neuromorphic architectures have been introduced as platforms for energy efficient spiking neural network execution. The massive parallelism offered by these architectures has also triggered interest from non-machine learning application domains. In order to lift the barriers to entry for hardware designers and developers, we present RANC: a Reconfigurable Architecture for Neuromorphic Computing, an open-source highly flexible ecosystem that enables rapid experimentation with neuromorphic architectures in both software via C++ simulation and hardware via FPGA...

10.1109/tcad.2020.3038151 article EN publisher-specific-oa IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2020-11-16

Performance-, power-, and energy-aware scheduling techniques play an essential role in optimally utilizing the processing elements (PEs) of heterogeneous systems. List schedulers, a class of low-complexity static schedulers, have commonly been used in static execution scenarios. However, list schedulers are not suitable for runtime decision making, particularly when multiple concurrent applications are interleaved dynamically. For such cases, the static task execution times and the expectation of idle PEs assumed by list schedulers lead to inefficient system...

10.1109/tpds.2021.3135876 article EN IEEE Transactions on Parallel and Distributed Systems 2021-12-16
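
For readers unfamiliar with the term, the static list scheduling that this abstract contrasts against can be sketched roughly as follows. This is a minimal illustration only, not the scheduler proposed in the paper: the function name `list_schedule`, the task and PE names, and all execution times are hypothetical, and the earliest-finish-time rule is one common assumption among several possible priority rules.

```python
def list_schedule(tasks, deps, exec_time, pes):
    """Static list scheduling sketch (all names and numbers are hypothetical).

    tasks: task names, already sorted by priority (a valid topological order).
    deps: dict mapping a task to the list of its predecessor tasks.
    exec_time: dict mapping (task, pe) to an assumed, fixed execution time.
    pes: list of processing element names.
    Returns a dict mapping each task to (pe, start, finish).
    """
    pe_free = {pe: 0.0 for pe in pes}  # time at which each PE becomes idle
    schedule = {}
    for task in tasks:
        # A task is ready once all of its predecessors have finished.
        ready = max((schedule[p][2] for p in deps.get(task, [])), default=0.0)
        best = None
        for pe in pes:
            if (task, pe) not in exec_time:
                continue  # this PE cannot run the task
            start = max(ready, pe_free[pe])
            finish = start + exec_time[(task, pe)]
            if best is None or finish < best[2]:
                best = (pe, start, finish)
        if best is None:
            raise ValueError(f"no PE can execute task {task!r}")
        schedule[task] = best
        pe_free[best[0]] = best[2]
    return schedule


# Hypothetical three-task chain on a CPU plus one FFT accelerator.
example = list_schedule(
    tasks=["load", "fft", "detect"],
    deps={"fft": ["load"], "detect": ["fft"]},
    exec_time={
        ("load", "cpu"): 2.0,
        ("fft", "cpu"): 8.0,
        ("fft", "fft_acc"): 3.0,
        ("detect", "cpu"): 4.0,
    },
    pes=["cpu", "fft_acc"],
)
```

Because the execution times and PE availability are fixed up front, a plan built this way goes stale as soon as another application arrives at runtime, which is the limitation the abstract points to.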

In this work, we propose a portable, Linux-based emulation framework to provide an ecosystem for hardware-software co-design of Domain-specific SoCs (DSSoCs) and enable their rapid evaluation during the pre-silicon design phase. This framework holistically targets three key challenges of DSSoC design: accelerator integration, resource management, and application development. We address these challenges via a flexible and lightweight user-space runtime environment that enables easy integration of new accelerators, scheduling...

10.1109/ipdpsw50202.2020.00016 article EN 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2020-05-01

As the computing landscape evolves, system designers continue to explore design methodologies that leverage increased levels of heterogeneity to push performance within limited size, weight, power, and cost budgets. One such methodology is to build Domain-Specific System on Chips (DSSoCs) that promise productivity through the narrowed scope of their target application domain. In previous works, we have proposed CEDR, an open source, unified compilation and runtime framework for DSSoC architectures that allows...

10.1109/ipdpsw59300.2023.00016 article EN 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2023-05-01

Neuromorphic architectures such as IBM's TrueNorth and Intel's Loihi have been introduced as platforms for energy efficient spiking neural network execution. However, there is no framework that allows rapidly experimenting with neuromorphic architectures and studying the trade space on hardware performance and accuracy. Fundamentally, this creates a barrier to entry for designers looking to explore these architectures. In this paper we present an open-source FPGA based emulation environment for neuromorphic computing research. We prototype...

10.1109/ipdpsw50202.2020.00022 article EN 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2020-05-01

Heterogeneous systems-on-chip (SoCs) are highly favorable computing platforms due to their superior performance and energy efficiency potential compared to homogeneous architectures. They can be further tailored to a specific domain of applications by incorporating processing elements (PEs) that accelerate frequently used kernels in these applications. However, this is contingent upon optimizing the SoC for the target domain and utilizing its resources effectively at runtime. To this end, system-level design -...

10.48550/arxiv.2003.09016 preprint EN other-oa arXiv (Cornell University) 2020-01-01

This work presents an architecture for powering computation in floating point arithmetic that is based on the expanded hyperbolic CORDIC algorithm, where the user can select the 2-D domain of convergence that suits their application. The fully parameterized hardware implementation allows us to explore trade-offs among design parameters (numerical format, number of iterations), resource usage, accuracy, and execution time. We carry out an exhaustive design space exploration to generate Pareto-optimal realizations...

10.1109/reconfig.2015.7393311 article EN 2015-12-01
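
As background on how CORDIC can compute a power, the identity x^y = exp(y ln x) can be evaluated with two hyperbolic CORDIC passes: a vectoring pass for the logarithm and a rotation pass for the exponential. The sketch below is an assumption-laden illustration in ordinary Python floats, not the architecture from the paper: it uses the basic (unexpanded) hyperbolic CORDIC, so it only converges for roughly |y ln x| < 1.118 and 0.11 < x < 9.3, and the function names `shift_sequence`, `cordic_ln`, `cordic_exp` and `cordic_pow` are hypothetical.

```python
import math


def shift_sequence(n):
    """First n shift indices 1, 2, 3, 4, 4, 5, ...; indices 4, 13, 40, ...
    are repeated, which basic hyperbolic CORDIC needs in order to converge."""
    seq, i, repeat_at = [], 1, 4
    while len(seq) < n:
        seq.append(i)
        if i == repeat_at:
            seq.append(i)
            repeat_at = 3 * repeat_at + 1
        i += 1
    return seq[:n]


def cordic_ln(a, n=32):
    """Vectoring mode: drive y to zero so that z accumulates
    atanh((a - 1) / (a + 1)), which equals ln(a) / 2."""
    x, y, z = a + 1.0, a - 1.0, 0.0
    for i in shift_sequence(n):
        t = 2.0 ** -i
        d = -1.0 if y >= 0 else 1.0
        x, y, z = x + d * y * t, y + d * x * t, z - d * math.atanh(t)
    return 2.0 * z


def cordic_exp(theta, n=32):
    """Rotation mode: drive z to zero; x and y converge to cosh(theta) and
    sinh(theta) once the CORDIC gain is compensated, and exp = cosh + sinh."""
    seq = shift_sequence(n)
    gain = math.prod(math.sqrt(1.0 - 4.0 ** -i) for i in seq)
    x, y, z = 1.0 / gain, 0.0, theta
    for i in seq:
        t = 2.0 ** -i
        d = 1.0 if z >= 0 else -1.0
        x, y, z = x + d * y * t, y + d * x * t, z - d * math.atanh(t)
    return x + y


def cordic_pow(base, exponent, n=32):
    """x^y = exp(y * ln x), valid only inside the basic convergence range."""
    return cordic_exp(exponent * cordic_ln(base, n), n)
```

The number of iterations `n` plays the role of one of the design parameters named in the abstract: fewer iterations mean less work per result but a larger residual error. A hardware version would replace the `math.atanh` calls with a small lookup table and the multiplications by `t` with shifts.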

Open-source simulation tools play a crucial role for neuromorphic application engineers and hardware architects to investigate performance bottlenecks and explore design optimizations before committing to silicon. Reconfigurable Architecture for Neuromorphic Computing (RANC) is one such tool that offers the ability to execute pre-trained Spiking Neural Network (SNN) models within a unified ecosystem through both software-based simulation and FPGA-based emulation. RANC has been utilized by the community with its flexible...

10.48550/arxiv.2404.16208 preprint EN arXiv (Cornell University) 2024-04-24

Open-source simulation tools play a crucial role for neuromorphic application engineers and hardware architects to investigate performance bottlenecks and explore design optimizations before committing to silicon. Reconfigurable Architecture for Neuromorphic Computing (RANC) is one such tool that offers the ability to execute pre-trained Spiking Neural Network (SNN) models within a unified ecosystem through both software-based simulation and FPGA-based emulation. RANC has been utilized by the community with its flexible...

10.1109/nice61972.2024.10548776 article EN 2024-04-23

As the landscape of computing advances, system designers are increasingly exploring methodologies that leverage higher levels of heterogeneity to enhance performance within constrained size, weight, power, and cost parameters. CEDR (Compiler-integrated Extensible DSSoC Runtime) stands as an ecosystem facilitating productive and efficient application development and deployment across heterogeneous systems. It fosters the co-design of applications, scheduling heuristics, and accelerators within a unified framework. Our...

10.1145/3687463 article EN ACM Transactions on Embedded Computing Systems 2024-08-08

In this study, we introduce a methodology for automatically transforming user applications written in C/C++ to a parallel representation consisting of coarse-grained tasks based on dynamic profiling. Such a representation is suitable for mapping onto heterogeneous SoCs. We present our approach for instrumenting the application binary during the compilation process with primitives that enable the runtime system to schedule and execute independent computation-intensive tasks concurrently. We use the proposed code transformation to retarget each...

10.1145/3704635 article EN ACM Transactions on Embedded Computing Systems 2024-11-15

Homogeneous general purpose processors provide the flexibility to implement a variety of applications and facilitate programmability. In contrast, heterogeneous system-on-chips (SoCs) that combine general purpose and specialized processors offer great potential to achieve higher efficiency while maintaining programming flexibility. In particular, domain-specific SoCs (DSSoC), a class of heterogeneous architectures, tailor the architecture and processing elements (PE) to a specific domain. Hence, they can provide superior energy-efficiency compared to general purpose processors by exploiting...

10.1145/3349567.3351719 article EN 2019-10-13

We present a modular FPGA-based testbed to accelerate the study of low-density parity-check (LDPC) codes. This testbed is composed of controller, codeword generator, noise generator, random number generator, LDPC decoder, and statistical analysis modules. The decoder module is replaceable to enable the development of new or existing hard-decision-based decoders. We demonstrate our testbed's ability to reduce the timescale of error correction pattern through case studies involving the Gallager B (GaB) and Probabilistic Gallager B (PGaB) algorithms. We contextualize...

10.1109/reconfig48160.2019.8994785 article EN 2019-12-01

This article presents FALCON, a full-system domain-specific system-on-chip emulation platform that enables presilicon power and performance estimation of these platforms to provide support for early functional validation and software development.

10.1109/mdat.2023.3291331 article EN IEEE Design and Test 2023-06-30

Non-uniform performance and power consumption across the processing elements (PEs) of heterogeneous SoCs increase the computation complexity of the task scheduling problem compared to homogeneous architectures. The latency of a software-based scheduler with an increased heterogeneity level, in terms of the number and types of PEs, creates the necessity of deploying the scheduler as an overlay processor in hardware to be able to make scheduling decisions rapidly and enable the deployment of real-life applications on heterogeneous SoCs. In this study we present the design trade-offs involved for...

10.1109/vlsi-soc54400.2022.9939623 preprint EN 2022-10-03

RF system development is traditionally constrained by a restrictive trade-off between power efficiency and programmatic flexibility. We outline a path towards achieving both, thereby enabling a range of new concepts that better utilize limited resources. As an example, for many future applications, we consider RF convergence: reusing the same spectrum and waveforms to achieve multiple distributed functions and goals simultaneously. To enable this next step in processing, we develop a novel framework that includes...

10.1109/iscas48785.2022.9937602 article EN 2022 IEEE International Symposium on Circuits and Systems (ISCAS) 2022-05-28

Shadows can have a negative effect on the ability of computer vision techniques for object detection, tracking, and recognition. Therefore, removing shadows and byproducts of illumination is an important problem to enable effective recognition of actions. As applications move into higher levels of information extraction and required processing speeds, efficient and sophisticated shadow detection and removal becomes even more necessary. In this study we propose a method, parallelize it using a Tesla P100 GPU, and achieve a speedup...

10.1109/aiccsa47632.2019.9035242 article EN 2019-11-01

We present a fixed point architecture (source VHDL code is provided) for powering computation. The fully customized architecture, based on the expanded hyperbolic CORDIC algorithm, allows design space exploration to establish trade-offs among design parameters (numerical format, number of iterations), execution time, resource usage and accuracy. We also generate Pareto-optimal realizations in the resource-accuracy space: this approach can produce optimal hardware realizations that simultaneously satisfy resource and accuracy requirements.

10.48550/arxiv.1605.03229 preprint EN other-oa arXiv (Cornell University) 2016-01-01