Nicolas Bohm Agostini

ORCID: 0000-0003-1855-3810
Publications
Research Areas
  • Parallel Computing and Optimization Techniques
  • Embedded Systems Design Techniques
  • Advanced Neural Network Applications
  • Advanced Memory and Neural Computing
  • Interconnection Networks and Systems
  • Machine Learning in Materials Science
  • Ferroelectric and Negative Capacitance Devices
  • CCD and CMOS Imaging Sensors
  • Numerical Methods and Algorithms
  • Robotics and Sensor-Based Localization
  • Scientific Computing and Data Management
  • Advanced Image and Video Retrieval Techniques
  • Neural Networks and Applications
  • Data Management and Algorithms
  • Distributed and Parallel Computing Systems
  • Advanced Data Storage Technologies
  • Advanced Malware Detection Techniques
  • Software Engineering Research
  • Gene Regulatory Network Analysis
  • Data Mining Algorithms and Applications
  • Visual Attention and Saliency Detection
  • Semiconductor Lasers and Optical Devices
  • Radiation Effects in Electronics
  • Data Quality and Management
  • Domain Adaptation and Few-Shot Learning

Pacific Northwest National Laboratory
2021-2025

Northeastern University
2019-2025

Universidad del Noreste
2022-2024

Weatherford College
2021

Flint Institute Of Arts
2021

Moore's Law and Dennard Scaling have guided the semiconductor industry for the past few decades. Recently, both laws have faced validity challenges as transistor sizes approach the practical limits of physics. We are interested in testing whether these laws still hold and reflecting on the reasons responsible. In this work, we collect data on more than 4000 publicly-available CPU and GPU products. We find that transistor scaling remains critical to keeping the laws valid. However, architectural solutions have become increasingly important and will play a larger role in the future...
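A trend study like the one described above typically fits transistor counts against release years on a log scale. The sketch below is a minimal, hypothetical illustration of that technique using a log2-linear least-squares fit; the data points are invented for the example and are not the paper's dataset.

```python
# Hypothetical sketch: estimate a transistor-count doubling time from
# (year, transistor_count) samples with a log2-linear least-squares fit.
# The sample points are illustrative only, not the paper's data.
import math

samples = [(2006, 3.0e8), (2010, 1.2e9), (2014, 4.3e9), (2018, 2.1e10)]

xs = [year for year, _ in samples]
ys = [math.log2(count) for _, count in samples]
n = len(samples)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)

# slope is doublings per year; its inverse is years per doubling
doubling_time_years = 1.0 / slope
print(f"doubling time: {doubling_time_years:.2f} years")
```

With these made-up samples the fit lands close to the classic two-year doubling cadence; on real product data the interesting question is how the slope flattens in recent years.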

10.48550/arxiv.1911.11313 preprint EN other-oa arXiv (Cornell University) 2019-01-01

The generation of custom hardware accelerators for applications implemented within high-level productive programming frameworks requires considerable manual effort. To automate this process, we introduce SODA-OPT, a compiler tool that extends the MLIR infrastructure. SODA-OPT automatically searches, outlines, tiles, and pre-optimizes relevant code regions to generate high-quality accelerators through high-level synthesis. SODA-OPT can support any programming framework and domain-specific language that interfaces with the MLIR infrastructure. By leveraging MLIR, SODA-OPT solves...
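To make the "outline and tile" steps concrete, here is a minimal sketch, in plain Python rather than MLIR, of what a compiler like SODA-OPT does: the hot loop nest is extracted (outlined) into its own kernel function, and its loops are restructured into tiles. All names here are illustrative assumptions, not SODA-OPT APIs.

```python
# Hypothetical illustration of compiler "outlining" and "tiling":
# the hot region is pulled out into a standalone kernel whose loops
# are tiled -- the shape of code handed to high-level synthesis.

def outlined_matmul_kernel(A, B, C, n, tile=2):
    """Tiled matrix multiply: the outlined region an HLS flow would consume."""
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        acc = C[i][j]
                        for k in range(kk, min(kk + tile, n)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc

def host_program(A, B, n):
    # Host code stays outside the outlined region; only this call site
    # would be rewritten to invoke the generated accelerator.
    C = [[0.0] * n for _ in range(n)]
    outlined_matmul_kernel(A, B, C, n)
    return C

A = [[1.0, 2.0, 0.0], [0.0, 1.0, 0.0], [3.0, 0.0, 1.0]]
B = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
C = host_program(A, B, 3)
print(C)  # B is the identity, so C equals A
```

The point of the separation is that the outlined function has a clean interface (arrays in, arrays out), which is what lets a synthesis backend replace it wholesale.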

10.1145/3508352.3549424 article EN Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design 2022-10-30

Systems performing scientific computing, data analysis, and machine learning tasks have a growing demand for application-specific accelerators that can provide high computational performance while meeting strict size and power requirements. However, the algorithms and applications that need to be accelerated are evolving at a rate that is incompatible with manual design processes based on hardware description languages. Agile hardware design tools based on compiler techniques can help by quickly producing an application-specific integrated circuit (ASIC)...

10.1109/mm.2022.3178580 article EN IEEE Micro 2022-06-01

Coarse-Grained Reconfigurable Arrays (CGRAs) can achieve higher energy efficiency than general-purpose processors and accelerators, or than fine-grained reconfigurable devices, while maintaining adaptability to different computational patterns. CGRAs have shown some success as a platform to accelerate machine learning (ML) thanks to their flexibility, which allows them to support new models not considered by fixed accelerators. However, current solutions for CGRAs employ low-level, instruction-based compiler...

10.1109/dac56929.2023.10247873 article EN 2023-07-09

10.1145/3658617.3703315 article EN Proceedings of the 30th Asia and South Pacific Design Automation Conference 2025-01-20

Reconfigurable architectures are today experiencing renewed interest for their ability to provide specialization without sacrificing the capability to adapt to disparate workloads. Coarse-grained reconfigurable arrays (CGRAs) provide higher flexibility than application-specific integrated circuits (ASICs) while offering increased hardware efficiency with respect to field-programmable gate arrays (FPGAs). This makes CGRAs a promising alternative to enable power-/area-efficient acceleration across different application...

10.1109/asap52443.2021.00029 article EN 2021-07-01

Graphics Processing Unit (GPU) performance has relied heavily on our ability to scale the number of transistors on a chip, in order to satisfy the ever-increasing demands for more computation. However, transistor scaling has become extremely challenging, limiting the number of transistors that can be crammed onto a single die. Manufacturing large, fast, and energy-efficient monolithic GPUs, while growing the number of stream processing units on-chip, is no longer a viable solution to scale performance. GPU vendors are aiming to exploit multi-GPU solutions,...

10.1109/ipdps.2019.00075 article EN 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2019-05-01

Recently there has been a rapidly growing demand for faster machine learning (ML) processing in data centers and a migration of ML inference applications to edge devices. These developments have prompted both industry and academia to explore custom accelerators that optimize ML executions for performance and power. However, identifying which accelerator is best equipped for performing a particular task is challenging, especially given the range of tasks, the number of target environments, and the limited integrated modeling tools. To tackle...
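The kind of first-order comparison such modeling tools enable can be sketched with a roofline-style estimate: execution time is bounded by either compute throughput or memory bandwidth, whichever is slower. The peak numbers and device names below are invented for illustration; they are not figures from the paper.

```python
# Hypothetical sketch: roofline-style time estimate for comparing
# accelerator candidates on one workload. All hardware numbers are
# made up for illustration.

def roofline_time_s(flops, bytes_moved, peak_flops, peak_bw):
    """Time is the larger of the compute-bound and memory-bound estimates."""
    return max(flops / peak_flops, bytes_moved / peak_bw)

workload = {"flops": 2e9, "bytes": 4e8}  # e.g. one inference batch

candidates = {
    "edge_npu":  {"peak_flops": 4e12, "peak_bw": 3.2e10},
    "small_gpu": {"peak_flops": 1e13, "peak_bw": 1.0e11},
}

for name, hw in candidates.items():
    t = roofline_time_s(workload["flops"], workload["bytes"],
                        hw["peak_flops"], hw["peak_bw"])
    print(name, f"{t * 1e3:.2f} ms")
```

Even this crude model captures the key insight: a workload with low arithmetic intensity can make a device with a higher compute peak lose to one with more bandwidth.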

10.1109/sbac-pad49847.2020.00013 article EN 2020-09-01

The growing number of application areas for artificial intelligence (AI) methods has led to an explosion in the availability of domain-specific accelerators, which struggle to support every new machine learning (ML) algorithm advancement, clearly highlighting the need for a tool to quickly and automatically transition from algorithm definition to hardware implementation and explore the design space along a variety of SWaP (size, weight, and Power) metrics. The software defined architectures (SODA) synthesizer implements a modular...
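Design-space exploration along SWaP-style metrics usually boils down to keeping the Pareto-optimal design points, those not dominated by any other candidate in every metric. The sketch below illustrates that selection step with invented design points; it is not SODA's exploration engine.

```python
# Hypothetical sketch of multi-metric design-space pruning: keep only
# Pareto-optimal points, where lower is better for every metric.
# Design points are invented for illustration.

def pareto_front(points):
    """points: list of (name, metrics); lower is better in each metric."""
    front = []
    for name, m in points:
        dominated = any(
            all(o <= v for o, v in zip(om, m)) and om != m
            for _, om in points
        )
        if not dominated:
            front.append(name)
    return front

# (area mm^2, power W, latency ms)
designs = [
    ("tiny",     (1.0, 0.5, 9.0)),
    ("balanced", (2.5, 1.2, 3.0)),
    ("fast",     (6.0, 3.0, 1.0)),
    ("wasteful", (6.5, 3.5, 3.5)),  # dominated by both "balanced" and "fast"
]
print(pareto_front(designs))  # ['tiny', 'balanced', 'fast']
```

A real explorer would generate the candidate points by sweeping synthesis parameters (tile sizes, unroll factors, datapath widths) and evaluating each through the toolchain; the pruning step stays the same.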

10.1109/iccad51958.2021.9643474 article EN 2021 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) 2021-11-01

Edge computing devices inherently face tight resource constraints, which is especially apparent when deploying Deep Neural Networks (DNNs) with high memory and compute demands. FPGAs are commonly available in edge devices. Since these reconfigurable circuits can achieve higher throughput and lower power consumption than general-purpose processors, they are well-suited for DNN acceleration. However, existing solutions for designing FPGA-based accelerators come with high development overheads, given the cost of...

10.1109/sbac-pad53543.2021.00015 preprint EN 2021-10-01

Due to technology and power limitations, general-purpose processing units are experiencing progressively smaller performance gains. Computer architecture innovations are essential to keep performance steadily increasing. Thus, domain-specific accelerators are receiving renewed interest and have shown benefits in different scientific and machine learning applications [1, 3]. High-Level Synthesis (HLS) provides a way to quickly generate hardware descriptions for accelerators starting from high-level applications. However, state-of-the-art tools...

10.1145/3528416.3530866 article EN 2022-05-05

Coarse-grained reconfigurable arrays (CGRAs) provide higher flexibility than application-specific integrated circuits (ASICs) and higher efficiency than fine-grained devices such as Field Programmable Gate Arrays (FPGAs). However, CGRAs are generally designed to support the offloading of a single kernel. While the CGRA design, based on communicating functional units, appears to naturally suit data streaming applications composed of multiple cooperating kernels, current approaches only statically partition...
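The workload shape described above, multiple cooperating kernels connected by streams, can be sketched with chained generators: each stage consumes values from its predecessor and forwards results downstream. On a CGRA, each stage would occupy a region of the array and stream operands over the interconnect; the Python below is only a structural analogy.

```python
# Hypothetical sketch of a streaming application built from three
# cooperating kernels: a source, an elementwise transform, and a
# reduction sink, chained as generators.

def produce(values):            # kernel 1: source
    for v in values:
        yield v

def scale(stream, factor):      # kernel 2: elementwise transform
    for v in stream:
        yield v * factor

def accumulate(stream):         # kernel 3: reduction sink
    total = 0
    for v in stream:
        total += v
    return total

result = accumulate(scale(produce(range(10)), 2))
print(result)  # 2 * (0 + 1 + ... + 9) = 90
```

The static-partitioning limitation the abstract points at is visible even here: if the middle stage is the bottleneck, a fixed assignment of array resources to stages cannot rebalance at runtime.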

10.1109/hpca53966.2022.00030 article EN 2022-04-01

This paper addresses the need for automatic and efficient generation of host driver code for arbitrary custom AXI-based accelerators targeting linear algebra algorithms, an important workload in various applications, including machine learning and scientific computing. While existing tools have focused on automating accelerator prototyping, little attention has been paid to the host-accelerator interaction. This paper introduces AXI4MLIR, an extension of the MLIR compiler framework designed to facilitate automated host driver code generation. With...
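The host-accelerator interaction being automated follows a familiar driver pattern: stage operands into device-visible buffers, start the run through a control register, poll a status register, and read back results. The sketch below mimics that protocol against a mock device; the register map, class, and function names are invented for illustration and are not AXI4MLIR APIs.

```python
# Hypothetical sketch of host-driver logic for an AXI-style accelerator:
# stage operands, write a start register, poll for completion, read back.
# MockAccelerator and the register offsets are invented for illustration.

CTRL_START, CTRL_DONE = 0x00, 0x04  # illustrative register offsets

class MockAccelerator:
    """Stands in for a memory-mapped device; computes a dot product."""
    def __init__(self):
        self.regs = {CTRL_START: 0, CTRL_DONE: 0}
        self.in_a, self.in_b, self.out = [], [], 0

    def write_reg(self, off, val):
        self.regs[off] = val
        if off == CTRL_START and val == 1:  # "hardware" runs instantly here
            self.out = sum(a * b for a, b in zip(self.in_a, self.in_b))
            self.regs[CTRL_DONE] = 1

    def read_reg(self, off):
        return self.regs[off]

def host_dot(acc, a, b):
    acc.in_a, acc.in_b = a, b            # stage operands (DMA in real life)
    acc.write_reg(CTRL_START, 1)         # kick off the accelerator
    while acc.read_reg(CTRL_DONE) != 1:  # poll the status register
        pass
    return acc.out                       # read back the result

print(host_dot(MockAccelerator(), [1, 2, 3], [4, 5, 6]))  # 32
```

Writing this glue by hand for every accelerator variant, with the right buffer sizes, offsets, and transfer schedule, is exactly the repetitive work a compiler extension can generate from the kernel description.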

10.1109/cgo57630.2024.10444801 article EN 2024-02-28

Deep Neural Networks (DNNs) have emerged as an important class of machine learning algorithms, providing accurate solutions to a broad range of applications. Sparsity in activation maps during DNN training presents an opportunity to reduce computations. However, exploiting sparsity poses two major challenges: i) profiling sparsity during training comes with significant overhead due to computing the degree of sparsity and the associated data movement; ii) the dynamic nature of activation maps requires dense-to-sparse conversion during training, leading to overhead. In this article, we present...
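The two operations the abstract identifies as overheads, profiling the degree of sparsity and dense-to-sparse conversion, look like this in miniature. A plain nested list stands in for a (e.g., post-ReLU) activation map; this is a generic CSR conversion, not the paper's method.

```python
# Hypothetical sketch of sparsity profiling and dense-to-sparse (CSR)
# conversion for a 2D activation map.

def sparsity(act):
    """Fraction of zero entries in a 2D activation map."""
    total = sum(len(row) for row in act)
    zeros = sum(v == 0 for row in act for v in row)
    return zeros / total

def dense_to_csr(act):
    """Convert a 2D map to CSR: values, column indices, row pointers."""
    values, col_idx, row_ptr = [], [], [0]
    for row in act:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

act = [[0, 2, 0, 0],
       [1, 0, 0, 3],
       [0, 0, 0, 0]]
print(sparsity(act))      # 9 of 12 entries are zero -> 0.75
print(dense_to_csr(act))  # ([2, 1, 3], [1, 0, 3], [0, 1, 3, 3])
```

Both functions touch every element of the dense map, which is precisely why doing this per layer, per iteration during training is costly enough to motivate hardware support.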

10.1109/tpds.2021.3067825 article EN IEEE Transactions on Parallel and Distributed Systems 2021-03-22

The Sparse Matrix-Vector Multiplication (SpMV) kernel is used in a broad class of linear algebra computations. SpMV computations are a performance bottleneck in many high-performance applications, so optimizing them is paramount. While implementing this kernel on a GPU can potentially boost performance significantly, current libraries either provide modest gains or are burdened with sparse format conversion overhead. In this paper we introduce the Vertical Compressed Sparse Row (VCSR) format, a novel memory-aware format that out-performs previous...
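For reference, here is the baseline kernel such formats optimize: SpMV over the standard Compressed Sparse Row (CSR) layout. This is deliberately the common CSR scheme, not the paper's VCSR, and the matrix is a toy example.

```python
# Baseline CSR SpMV sketch: y = A @ x with A stored as
# (values, col_idx, row_ptr). Toy data for illustration.

def spmv_csr(values, col_idx, row_ptr, x):
    """Multiply a CSR matrix by a dense vector."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

# A = [[4, 0, 9],
#      [0, 7, 0],
#      [0, 0, 5]]
values  = [4.0, 9.0, 7.0, 5.0]
col_idx = [0, 2, 1, 2]
row_ptr = [0, 2, 3, 4]
print(spmv_csr(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))  # [31.0, 14.0, 15.0]
```

The GPU difficulty the abstract alludes to is visible in the inner loop: row lengths vary, so threads assigned one row each do unequal work, and `x[col_idx[k]]` produces irregular memory accesses, which is what memory-aware formats restructure.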

10.1109/tpds.2022.3177291 article EN IEEE Transactions on Parallel and Distributed Systems 2022-06-03

In this paper we propose SECDA-TFLite, a new open-source toolkit for developing DNN hardware accelerators integrated within the TFLite framework. The toolkit leverages the principles of SECDA, a hardware/software co-design methodology, to reduce the design time of optimized DNN inference on edge devices with FPGAs. SECDA-TFLite removes the initial setup costs associated with integrating an accelerator into the target framework, allowing developers to focus on accelerator design. SECDA-TFLite also includes modules for cost-effective SystemC simulation, profiling, and AXI-based...

10.1016/j.jpdc.2022.11.005 article EN cc-by Journal of Parallel and Distributed Computing 2022-11-15

Domain-specific designs offer greater energy efficiency and performance gains than general-purpose processors. For this reason, modern system-on-chips dedicate a significant portion of their silicon area to custom accelerators. However, designing hardware by hand is laborious and time-consuming, given the large design space and the performance, power, and area constraints that are not present in software development. Moreover, domain-specific algorithms (e.g., machine learning models) are evolving quickly, challenging accelerator...

10.1109/asap52443.2021.00040 article EN 2021-07-01

Edge systems are required to autonomously make real-time decisions based on large quantities of input data under strict power, performance, area, and other constraints. Meeting these constraints is only possible by specializing systems through hardware accelerators purposefully built for machine learning and data analysis algorithms. However, data science evolves at a quick pace, and the manual design of custom accelerators has high non-recurring engineering costs: general solutions are needed to automatically and rapidly transition from the...

10.1109/tc.2022.3211430 article EN cc-by IEEE Transactions on Computers 2022-01-01

The Software Defined Architectures (SODA) Synthesizer is an open-source compiler-based tool able to automatically generate domain-specialized systems targeting Application-Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs) starting from high-level programming languages. SODA is composed of a frontend, SODA-OPT, which leverages the multilevel intermediate representation (MLIR) framework to interface with productive programming tools (e.g., machine learning frameworks) and identify...

10.1145/3566097.3568360 article EN Proceedings of the 28th Asia and South Pacific Design Automation Conference 2023-01-16


10.2139/ssrn.4772671 preprint EN 2024-01-01