Nicolas Bohm Agostini

ORCID: 0000-0003-1855-3810
Publications
Research Areas
  • Parallel Computing and Optimization Techniques
  • Embedded Systems Design Techniques
  • Advanced Neural Network Applications
  • Advanced Memory and Neural Computing
  • Interconnection Networks and Systems
  • Machine Learning in Materials Science
  • Ferroelectric and Negative Capacitance Devices
  • CCD and CMOS Imaging Sensors
  • Numerical Methods and Algorithms
  • Robotics and Sensor-Based Localization
  • Scientific Computing and Data Management
  • Advanced Image and Video Retrieval Techniques
  • Neural Networks and Applications
  • Data Management and Algorithms
  • Distributed and Parallel Computing Systems
  • Advanced Data Storage Technologies
  • Advanced Malware Detection Techniques
  • Software Engineering Research
  • Gene Regulatory Network Analysis
  • Data Mining Algorithms and Applications
  • Visual Attention and Saliency Detection
  • Semiconductor Lasers and Optical Devices
  • Radiation Effects in Electronics
  • Data Quality and Management
  • Domain Adaptation and Few-Shot Learning

Pacific Northwest National Laboratory
2021-2025

Northeastern University
2019-2025

Universidad del Noreste
2022-2024

Weatherford College
2021

Flint Institute Of Arts
2021

Moore's Law and Dennard Scaling have guided the semiconductor industry for the past few decades. Recently, both laws have faced validity challenges as transistor sizes approach the practical limits of physics. We are interested in testing whether these laws still hold and reflecting on the reasons responsible. In this work, we collect data on more than 4000 publicly-available CPU and GPU products. We find that transistor scaling remains critical to keeping the laws valid. However, architectural solutions have become increasingly important and will play a larger role in the future...
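A trend study like the one described above typically fits transistor counts against release years on a log scale. The sketch below is a minimal, hypothetical illustration of that technique using a log2-linear least-squares fit; the data points are invented for the example and are not the paper's dataset.

```python
# Hypothetical sketch: estimate a transistor-count doubling time from
# (year, transistor_count) samples with a log2-linear least-squares fit.
# The sample points are illustrative only, not the paper's data.
import math

samples = [(2006, 3.0e8), (2010, 1.2e9), (2014, 4.3e9), (2018, 2.1e10)]

xs = [year for year, _ in samples]
ys = [math.log2(count) for _, count in samples]
n = len(samples)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)

# slope is doublings per year; its inverse is years per doubling
doubling_time_years = 1.0 / slope
print(f"doubling time: {doubling_time_years:.2f} years")
```

With these made-up samples the fit lands close to the classic two-year doubling cadence; on real product data the interesting question is how the slope flattens in recent years.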

10.48550/arxiv.1911.11313 preprint EN other-oa arXiv (Cornell University) 2019-01-01

The generation of custom hardware accelerators for applications implemented within high-level productive programming frameworks requires considerable manual effort. To automate this process, we introduce SODA-OPT, a compiler tool that extends the MLIR infrastructure. SODA-OPT automatically searches, outlines, tiles, and pre-optimizes relevant code regions to generate high-quality accelerators through high-level synthesis. SODA-OPT can support any programming framework and domain-specific language that interfaces with the MLIR infrastructure. By leveraging MLIR, SODA-OPT solves...
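To make the "outline and tile" steps concrete, here is a minimal sketch, in plain Python rather than MLIR, of what a compiler like SODA-OPT does: the hot loop nest is extracted (outlined) into its own kernel function, and its loops are restructured into tiles. All names here are illustrative assumptions, not SODA-OPT APIs.

```python
# Hypothetical illustration of compiler "outlining" and "tiling":
# the hot region is pulled out into a standalone kernel whose loops
# are tiled -- the shape of code handed to high-level synthesis.

def outlined_matmul_kernel(A, B, C, n, tile=2):
    """Tiled matrix multiply: the outlined region an HLS flow would consume."""
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        acc = C[i][j]
                        for k in range(kk, min(kk + tile, n)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc

def host_program(A, B, n):
    # Host code stays outside the outlined region; only this call site
    # would be rewritten to invoke the generated accelerator.
    C = [[0.0] * n for _ in range(n)]
    outlined_matmul_kernel(A, B, C, n)
    return C

A = [[1.0, 2.0, 0.0], [0.0, 1.0, 0.0], [3.0, 0.0, 1.0]]
B = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
C = host_program(A, B, 3)
print(C)  # B is the identity, so C equals A
```

The point of the separation is that the outlined function has a clean interface (arrays in, arrays out), which is what lets a synthesis backend replace it wholesale.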

10.1145/3508352.3549424 article EN Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design 2022-10-30

Systems performing scientific computing, data analysis, and machine learning tasks have a growing demand for application-specific accelerators that can provide high computational performance while meeting strict size and power requirements. However, the algorithms and applications that need to be accelerated are evolving at a rate that is incompatible with manual design processes based on hardware description languages. Agile hardware design tools based on compiler techniques can help by quickly producing an application-specific integrated circuit (ASIC)...

10.1109/mm.2022.3178580 article EN IEEE Micro 2022-06-01

Coarse-Grained Reconfigurable Arrays (CGRAs) can achieve higher energy efficiency than general-purpose processors and accelerators, or than fine-grained reconfigurable devices, while maintaining adaptability to different computational patterns. CGRAs have shown some success as a platform to accelerate machine learning (ML) thanks to their flexibility, which allows them to support new models not considered by fixed accelerators. However, current solutions for CGRAs employ low-level, instruction-based compiler...

10.1109/dac56929.2023.10247873 article EN 2023-07-09

10.1145/3658617.3703315 article EN Proceedings of the 30th Asia and South Pacific Design Automation Conference 2025-01-20

Reconfigurable architectures are today experiencing renewed interest for their ability to provide specialization without sacrificing the capability to adapt to disparate workloads. Coarse-grained reconfigurable arrays (CGRAs) provide higher flexibility than application-specific integrated circuits (ASICs) while offering increased hardware efficiency with respect to field-programmable gate arrays (FPGAs). This makes CGRAs a promising alternative to enable power-/area-efficient acceleration across different application...

10.1109/asap52443.2021.00029 article EN 2021-07-01

Graphics Processing Unit (GPU) performance has relied heavily on our ability to scale the number of transistors on a chip, in order to satisfy the ever-increasing demands for more computation. However, transistor scaling has become extremely challenging, limiting the number of transistors that can be crammed onto a single die. Manufacturing large, fast, and energy-efficient monolithic GPUs, while growing the number of stream processing units on-chip, is no longer a viable solution to scale performance. GPU vendors are aiming to exploit multi-GPU solutions,...

10.1109/ipdps.2019.00075 article EN 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2019-05-01

Recently there has been a rapidly growing demand for faster machine learning (ML) processing in data centers and a migration of ML inference applications to edge devices. These developments have prompted both industry and academia to explore custom accelerators that optimize ML executions for performance and power. However, identifying which accelerator is best equipped for performing a particular task is challenging, especially given the range of tasks, the number of target environments, and the limited integrated modeling tools. To tackle...
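The kind of first-order comparison such modeling tools enable can be sketched with a roofline-style estimate: execution time is bounded by either compute throughput or memory bandwidth, whichever is slower. The peak numbers and device names below are invented for illustration; they are not figures from the paper.

```python
# Hypothetical sketch: roofline-style time estimate for comparing
# accelerator candidates on one workload. All hardware numbers are
# made up for illustration.

def roofline_time_s(flops, bytes_moved, peak_flops, peak_bw):
    """Time is the larger of the compute-bound and memory-bound estimates."""
    return max(flops / peak_flops, bytes_moved / peak_bw)

workload = {"flops": 2e9, "bytes": 4e8}  # e.g. one inference batch

candidates = {
    "edge_npu":  {"peak_flops": 4e12, "peak_bw": 3.2e10},
    "small_gpu": {"peak_flops": 1e13, "peak_bw": 1.0e11},
}

for name, hw in candidates.items():
    t = roofline_time_s(workload["flops"], workload["bytes"],
                        hw["peak_flops"], hw["peak_bw"])
    print(name, f"{t * 1e3:.2f} ms")
```

Even this crude model captures the key insight: a workload with low arithmetic intensity can make a device with a higher compute peak lose to one with more bandwidth.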

10.1109/sbac-pad49847.2020.00013 article EN 2020-09-01

The growing number of application areas for artificial intelligence (AI) methods has led to an explosion in the availability of domain-specific accelerators, which struggle to support every new machine learning (ML) algorithm advancement, clearly highlighting the need for a tool to quickly and automatically transition from algorithm definition to hardware implementation and explore the design space along a variety of SWaP (size, weight, and Power) metrics. The software defined architectures (SODA) synthesizer implements a modular...
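Design-space exploration along SWaP-style metrics usually boils down to keeping the Pareto-optimal design points, those not dominated by any other candidate in every metric. The sketch below illustrates that selection step with invented design points; it is not SODA's exploration engine.

```python
# Hypothetical sketch of multi-metric design-space pruning: keep only
# Pareto-optimal points, where lower is better for every metric.
# Design points are invented for illustration.

def pareto_front(points):
    """points: list of (name, metrics); lower is better in each metric."""
    front = []
    for name, m in points:
        dominated = any(
            all(o <= v for o, v in zip(om, m)) and om != m
            for _, om in points
        )
        if not dominated:
            front.append(name)
    return front

# (area mm^2, power W, latency ms)
designs = [
    ("tiny",     (1.0, 0.5, 9.0)),
    ("balanced", (2.5, 1.2, 3.0)),
    ("fast",     (6.0, 3.0, 1.0)),
    ("wasteful", (6.5, 3.5, 3.5)),  # dominated by both "balanced" and "fast"
]
print(pareto_front(designs))  # ['tiny', 'balanced', 'fast']
```

A real explorer would generate the candidate points by sweeping synthesis parameters (tile sizes, unroll factors, datapath widths) and evaluating each through the toolchain; the pruning step stays the same.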

10.1109/iccad51958.2021.9643474 article EN 2021 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) 2021-11-01

Edge computing devices inherently face tight resource constraints, which is especially apparent when deploying Deep Neural Networks (DNNs) with high memory and compute demands. FPGAs are commonly available in edge devices. Since these reconfigurable circuits can achieve higher throughput and lower power consumption than general-purpose processors, they are well-suited for DNN acceleration. However, existing solutions for designing FPGA-based accelerators come with high development overheads, given the cost of...

10.1109/sbac-pad53543.2021.00015 preprint EN 2021-10-01

Due to technology and power limitations, general-purpose processing units are experiencing progressively smaller performance gains. Computer architecture innovations are essential to keep performance steadily increasing. Thus, domain-specific accelerators are receiving renewed interest and have shown benefits in different scientific and machine learning applications [1, 3]. High-Level Synthesis (HLS) provides a way to quickly generate hardware descriptions for accelerators starting from high-level applications. However, state-of-the-art tools...

10.1145/3528416.3530866 article EN 2022-05-05

Coarse-grained reconfigurable arrays (CGRAs) provide higher flexibility than application-specific integrated circuits (ASICs) and higher efficiency than fine-grained devices such as Field Programmable Gate Arrays (FPGAs). However, CGRAs are generally designed to support the offloading of a single kernel. While the CGRA design, based on communicating functional units, appears to naturally suit data streaming applications composed of multiple cooperating kernels, current approaches only statically partition...
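The workload shape described above, multiple cooperating kernels connected by streams, can be sketched with chained generators: each stage consumes values from its predecessor and forwards results downstream. On a CGRA, each stage would occupy a region of the array and stream operands over the interconnect; the Python below is only a structural analogy.

```python
# Hypothetical sketch of a streaming application built from three
# cooperating kernels: a source, an elementwise transform, and a
# reduction sink, chained as generators.

def produce(values):            # kernel 1: source
    for v in values:
        yield v

def scale(stream, factor):      # kernel 2: elementwise transform
    for v in stream:
        yield v * factor

def accumulate(stream):         # kernel 3: reduction sink
    total = 0
    for v in stream:
        total += v
    return total

result = accumulate(scale(produce(range(10)), 2))
print(result)  # 2 * (0 + 1 + ... + 9) = 90
```

The static-partitioning limitation the abstract points at is visible even here: if the middle stage is the bottleneck, a fixed assignment of array resources to stages cannot rebalance at runtime.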

10.1109/hpca53966.2022.00030 article EN 2022-04-01

This paper addresses the need for automatic and efficient generation of host driver code for arbitrary custom AXI-based accelerators targeting linear algebra algorithms, an important workload in various applications, including machine learning and scientific computing. While existing tools have focused on automating accelerator prototyping, little attention has been paid to the host-accelerator interaction. This paper introduces AXI4MLIR, an extension of the MLIR compiler framework designed to facilitate automated host driver code generation. With...
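The host-accelerator interaction being automated follows a familiar driver pattern: stage operands into device-visible buffers, start the run through a control register, poll a status register, and read back results. The sketch below mimics that protocol against a mock device; the register map, class, and function names are invented for illustration and are not AXI4MLIR APIs.

```python
# Hypothetical sketch of host-driver logic for an AXI-style accelerator:
# stage operands, write a start register, poll for completion, read back.
# MockAccelerator and the register offsets are invented for illustration.

CTRL_START, CTRL_DONE = 0x00, 0x04  # illustrative register offsets

class MockAccelerator:
    """Stands in for a memory-mapped device; computes a dot product."""
    def __init__(self):
        self.regs = {CTRL_START: 0, CTRL_DONE: 0}
        self.in_a, self.in_b, self.out = [], [], 0

    def write_reg(self, off, val):
        self.regs[off] = val
        if off == CTRL_START and val == 1:  # "hardware" runs instantly here
            self.out = sum(a * b for a, b in zip(self.in_a, self.in_b))
            self.regs[CTRL_DONE] = 1

    def read_reg(self, off):
        return self.regs[off]

def host_dot(acc, a, b):
    acc.in_a, acc.in_b = a, b            # stage operands (DMA in real life)
    acc.write_reg(CTRL_START, 1)         # kick off the accelerator
    while acc.read_reg(CTRL_DONE) != 1:  # poll the status register
        pass
    return acc.out                       # read back the result

print(host_dot(MockAccelerator(), [1, 2, 3], [4, 5, 6]))  # 32
```

Writing this glue by hand for every accelerator variant, with the right buffer sizes, offsets, and transfer schedule, is exactly the repetitive work a compiler extension can generate from the kernel description.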

10.1109/cgo57630.2024.10444801 article EN 2024-02-28

Deep Neural Networks (DNNs) have emerged as an important class of machine learning algorithms, providing accurate solutions to a broad range of applications. Sparsity in activation maps during DNN training presents an opportunity to reduce computations. However, exploiting sparsity poses two major challenges: i) profiling sparsity during training comes with significant overhead due to computing the degree of sparsity and the associated data movement; ii) the dynamic nature of activation maps requires dense-to-sparse conversion during training, leading to overhead. In this article, we present...
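The two operations the abstract identifies as overheads, profiling the degree of sparsity and dense-to-sparse conversion, look like this in miniature. A plain nested list stands in for a (e.g., post-ReLU) activation map; this is a generic CSR conversion, not the paper's method.

```python
# Hypothetical sketch of sparsity profiling and dense-to-sparse (CSR)
# conversion for a 2D activation map.

def sparsity(act):
    """Fraction of zero entries in a 2D activation map."""
    total = sum(len(row) for row in act)
    zeros = sum(v == 0 for row in act for v in row)
    return zeros / total

def dense_to_csr(act):
    """Convert a 2D map to CSR: values, column indices, row pointers."""
    values, col_idx, row_ptr = [], [], [0]
    for row in act:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

act = [[0, 2, 0, 0],
       [1, 0, 0, 3],
       [0, 0, 0, 0]]
print(sparsity(act))      # 9 of 12 entries are zero -> 0.75
print(dense_to_csr(act))  # ([2, 1, 3], [1, 0, 3], [0, 1, 3, 3])
```

Both functions touch every element of the dense map, which is precisely why doing this per layer, per iteration during training is costly enough to motivate hardware support.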

10.1109/tpds.2021.3067825 article EN IEEE Transactions on Parallel and Distributed Systems 2021-03-22

The Sparse Matrix-Vector Multiplication (SpMV) kernel is used in a broad class of linear algebra computations. SpMV computations are a performance bottleneck in many high-performance applications, so optimizing them is paramount. While implementing this kernel on a GPU can potentially boost performance significantly, current libraries either provide modest gains or are burdened with sparse format conversion overhead. In this paper we introduce the Vertical Compressed Sparse Row (VCSR) format, a novel memory-aware format that out-performs previous...
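For reference, here is the baseline kernel such formats optimize: SpMV over the standard Compressed Sparse Row (CSR) layout. This is deliberately the common CSR scheme, not the paper's VCSR, and the matrix is a toy example.

```python
# Baseline CSR SpMV sketch: y = A @ x with A stored as
# (values, col_idx, row_ptr). Toy data for illustration.

def spmv_csr(values, col_idx, row_ptr, x):
    """Multiply a CSR matrix by a dense vector."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

# A = [[4, 0, 9],
#      [0, 7, 0],
#      [0, 0, 5]]
values  = [4.0, 9.0, 7.0, 5.0]
col_idx = [0, 2, 1, 2]
row_ptr = [0, 2, 3, 4]
print(spmv_csr(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))  # [31.0, 14.0, 15.0]
```

The GPU difficulty the abstract alludes to is visible in the inner loop: row lengths vary, so threads assigned one row each do unequal work, and `x[col_idx[k]]` produces irregular memory accesses, which is what memory-aware formats restructure.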

10.1109/tpds.2022.3177291 article EN IEEE Transactions on Parallel and Distributed Systems 2022-06-03

In this paper we propose SECDA-TFLite, a new open-source toolkit for developing DNN hardware accelerators integrated within the TFLite framework. The toolkit leverages the principles of SECDA, a hardware/software co-design methodology, to reduce the design time of optimized DNN inference on edge devices with FPGAs. SECDA-TFLite removes the initial setup costs associated with integrating an accelerator into the target framework, allowing developers to focus on accelerator design. SECDA-TFLite also includes modules for cost-effective SystemC simulation, profiling, and AXI-based...

10.1016/j.jpdc.2022.11.005 article EN cc-by Journal of Parallel and Distributed Computing 2022-11-15

Domain-specific designs offer greater energy efficiency and performance gains than general-purpose processors. For this reason, modern system-on-chips dedicate a significant portion of their silicon area to custom accelerators. However, designing hardware by hand is laborious and time-consuming, given the large design space and the performance, power, and area constraints that are not present in software development. Moreover, domain-specific algorithms (e.g., machine learning models) are evolving quickly, challenging accelerator...

10.1109/asap52443.2021.00040 article EN 2021-07-01

Edge systems are required to autonomously make real-time decisions based on large quantities of input data under strict power, performance, area, and other constraints. Meeting these constraints is only possible by specializing systems through hardware accelerators purposefully built for machine learning and data analysis algorithms. However, data science evolves at a quick pace, and the manual design of custom accelerators has high non-recurring engineering costs: general solutions are needed to automatically and rapidly transition from the...

10.1109/tc.2022.3211430 article EN cc-by IEEE Transactions on Computers 2022-01-01

The Software Defined Architectures (SODA) Synthesizer is an open-source compiler-based tool able to automatically generate domain-specialized systems targeting Application-Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs) starting from high-level programming languages. SODA is composed of a frontend, SODA-OPT, which leverages the multilevel intermediate representation (MLIR) framework to interface with productive programming tools (e.g., machine learning frameworks) and identify...

10.1145/3566097.3568360 article EN Proceedings of the 28th Asia and South Pacific Design Automation Conference 2023-01-16


10.2139/ssrn.4772671 preprint EN 2024-01-01