- Particle Detector Development and Performance
- Particle Physics Theoretical and Experimental Studies
- Advanced Neural Network Applications
- Parallel Computing and Optimization Techniques
- Radiation Detection and Scintillator Technologies
- Computational Physics and Python Applications
- Medical Imaging Techniques and Applications
- Neural Networks and Applications
- Adversarial Robustness in Machine Learning
- Model Reduction and Neural Networks
- Advanced Data Storage Technologies
- Cold Atom Physics and Bose-Einstein Condensates
- Machine Learning and Data Classification
- CCD and CMOS Imaging Sensors
- Numerical Methods and Algorithms
- Topic Modeling
- Distributed and Parallel Computing Systems
- Physics of Superconductivity and Magnetism
- Particle Accelerators and Beam Dynamics
- Radiation Effects in Electronics
- Advanced Database Systems and Queries
- Atomic and Subatomic Physics Research
- Superconducting Materials and Applications
- Laser-Plasma Interactions and Diagnostics
- Scientific Computing and Data Management
European Organization for Nuclear Research
2019-2025
Massachusetts Institute of Technology
2023-2025
University of Michigan
2025
Rensselaer Polytechnic Institute
2025
Brookhaven National Laboratory
2025
Oak Ridge National Laboratory
2025
Central China Normal University
2025
Los Alamos National Laboratory
2025
New Jersey Institute of Technology
2025
Georgia Institute of Technology
2025
Compact symbolic expressions have been shown to be more efficient than neural network (NN) models in terms of resource consumption and inference speed when implemented on custom hardware such as field-programmable gate arrays (FPGAs), while maintaining comparable accuracy (Tsoi et al 2024 EPJ Web Conf. 295 09036). These capabilities are highly valuable in environments with stringent computational constraints, such as high-energy physics experiments at the CERN Large Hadron Collider. However, ...
We introduce an automated tool for deploying ultra-low-latency, low-power deep neural networks with convolutional layers on FPGAs. By extending the hls4ml library, we demonstrate an inference latency of $5\,\mu$s using convolutional architectures, targeting microsecond-latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Numbers dataset, we apply various methods of model compression in order to fit the computational constraints of a typical FPGA device used in the trigger and ...
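One standard compression step before FPGA deployment is magnitude pruning: zeroing the smallest weights so that the corresponding multipliers vanish from the synthesized firmware. A minimal sketch in plain Python (the sparsity target and weight values are illustrative, not taken from the paper):

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.
    Pruned (zero) weights cost no multipliers once the network
    is synthesized to FPGA firmware."""
    n_zero = int(len(weights) * sparsity)
    # Indices of the n_zero smallest-magnitude weights.
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:n_zero]
    drop = set(smallest)
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

# Example: prune half of a toy weight vector.
pruned = prune_by_magnitude([0.5, -0.1, 0.9, 0.05], sparsity=0.5)
```

In practice pruning is applied iteratively during training so the remaining weights can compensate; this one-shot version only shows the selection rule.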
We present the implementation of binary and ternary neural networks in the hls4ml library, designed to automatically convert deep neural network models into digital circuits with FPGA firmware. Starting from benchmark models trained with floating-point precision, we investigate different strategies to reduce the network's resource consumption by reducing the numerical precision of the network parameters to binary or ternary. We discuss the trade-off between model accuracy and resource consumption. In addition, we show how to balance latency and accuracy while retaining full performance on a selected subset ...
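The core of ternarization is mapping each floating-point weight to {-1, 0, +1}. A minimal sketch, assuming the common threshold heuristic $\Delta = 0.7 \cdot \mathrm{mean}(|w|)$ from the ternary-weight-network literature (the paper's exact scheme may differ):

```python
def ternarize(weights, delta_scale=0.7):
    """Quantize float weights to {-1, 0, +1}.
    Weights below the threshold delta become 0; the rest keep
    only their sign, so each multiply reduces to add/subtract/skip."""
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = delta_scale * mean_abs
    return [0 if abs(w) < delta else (1 if w > 0 else -1) for w in weights]

# Example: small weights collapse to zero, large ones to their sign.
q = ternarize([0.9, -0.04, 0.5, -0.7, 0.02])
```

Binary networks are the special case with no zero level, keeping only the sign, which removes multipliers entirely from the firmware.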
Graph neural networks have been shown to achieve excellent performance for several crucial tasks in particle physics, such as charged particle tracking, jet tagging, and clustering. An important domain for the application of these networks is the FPGA-based first layer of real-time data filtering at the CERN Large Hadron Collider, which has strict latency and resource constraints. We discuss how to design distance-weighted graph networks that can be executed with a latency of less than 1$\mu\mathrm{s}$ on an FPGA. To do so, we consider a representative ...
Despite advances in the programmable logic capabilities of modern trigger systems, a significant bottleneck remains in the amount of data to be transported from the detector to off-detector logic, where trigger decisions are made. We demonstrate that a neural network autoencoder model can be implemented in a radiation-tolerant ASIC to perform lossy data compression, alleviating the data transmission problem while preserving critical information of the shower energy profile. For our application, we consider the high-granularity calorimeter of the CMS experiment at the CERN Large ...
The high-energy physics community is investigating the potential of deploying machine-learning-based solutions on Field-Programmable Gate Arrays (FPGAs) to enhance physics sensitivity while still meeting data processing time constraints. In this contribution, we introduce a novel end-to-end procedure that utilizes a machine learning technique called symbolic regression (SR), which searches the equation space to discover algebraic relations approximating a dataset. We use PySR (a software to uncover these expressions ...
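The essence of symbolic regression is scoring candidate closed-form expressions against data and keeping the best. PySR does this with an evolutionary search over a rich operator grammar; the toy sketch below only illustrates the scoring-and-selection idea over a hand-written candidate list (the dataset and candidates are invented for illustration):

```python
def fit_symbolic(xs, ys, candidates):
    """Tiny symbolic-regression-style search: score each candidate
    (name, function) pair by mean squared error, return the best.
    Real SR tools like PySR evolve expressions instead of enumerating."""
    def mse(f):
        return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return min(candidates, key=lambda c: mse(c[1]))

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.0, 5.0, 10.0]          # generated by y = x**2 + 1
candidates = [
    ("x",         lambda x: x),
    ("2*x + 1",   lambda x: 2 * x + 1),
    ("x**2 + 1",  lambda x: x ** 2 + 1),
]
best_name, best_fn = fit_symbolic(xs, ys, candidates)
```

The recovered expression, unlike a neural network, maps directly to a handful of fixed-point multipliers and adders in HLS, which is what makes SR attractive for FPGA triggers.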
We develop and study FPGA implementations of algorithms for charged particle tracking based on graph neural networks. The two complementary designs are based on OpenCL, a framework for writing programs that execute across heterogeneous platforms, and hls4ml, a high-level-synthesis-based compiler for neural network to firmware conversion. We evaluate and compare the resource usage, latency, and tracking performance of our implementations on a benchmark dataset. We find that a considerable speedup over CPU-based execution is possible, potentially enabling such algorithms to be used ...
This R&D project, initiated by the DOE Nuclear Physics AI-Machine Learning initiative in 2022, leverages AI to address data processing challenges in high-energy nuclear physics experiments (RHIC, LHC, and the future EIC). Our focus is on developing a demonstrator for real-time processing of high-rate data streams from the sPHENIX experiment's tracking detectors. The limitation of the 15 kHz maximum trigger rate imposed by the calorimeters can be negated by intelligent use of a streaming technology system. Our approach efficiently identifies low ...
In high-energy physics, the increasing luminosity and detector granularity at the Large Hadron Collider are driving the need for more efficient data processing solutions. Machine learning has emerged as a promising tool for reconstructing charged particle tracks, due to its potentially linear computational scaling with the number of hits. The recent implementation of a graph neural network-based track reconstruction pipeline in the first-level trigger of the LHCb experiment on GPUs serves as a platform for comparative studies between ...
As machine learning (ML) increasingly serves as a tool for addressing real-time challenges in scientific applications, the development of advanced tooling has significantly reduced the time required to iterate on various designs. These advancements have solved major obstacles, but they have also exposed new challenges. For example, processes that were not previously considered bottlenecks, such as model synthesis, are now becoming limiting factors in rapid iteration. To reduce these emerging constraints, ...
This study presents an efficient implementation of transformer architectures in Field-Programmable Gate Arrays (FPGAs) using hls4ml. We demonstrate a strategy for implementing the multi-head attention, softmax, and normalization layers, and evaluate three distinct models. Their deployment on a VU13P FPGA chip achieved a latency of less than 2 μs, demonstrating the potential for real-time applications. hls4ml's compatibility with any TensorFlow-built model further enhances the scalability and applicability of this work.
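The softmax inside attention is one of the layers the abstract singles out, because its exponential and division are awkward in fixed-point logic. The floating-point reference form, shown below with the standard max-subtraction for numerical stability, is what hardware implementations typically approximate with lookup tables (this is a generic sketch, not the paper's HLS design):

```python
import math

def softmax(xs):
    """Numerically stable softmax: subtract the max before
    exponentiating so exp() never overflows. FPGA implementations
    usually replace exp and the final division with lookup tables."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Example: attention scores normalized to a probability distribution.
probs = softmax([1.0, 2.0, 3.0])
```

In multi-head attention this normalization is applied row-wise to each head's scaled dot-product score matrix before the weighted sum over values.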
This paper presents novel reconfigurable architectures for reducing the latency of recurrent neural networks (RNNs) that are used for detecting gravitational waves. Gravitational-wave interferometers such as the LIGO detectors capture cosmic events such as black hole mergers, which happen at unknown times and with varying durations, producing time-series data. We have developed a new architecture capable of accelerating RNN inference for analyzing data from the detectors. It is based on optimizing the initiation intervals (II) in ...
In this paper, we investigate how field-programmable gate arrays can serve as hardware accelerators for real-time semantic segmentation tasks relevant to autonomous driving. Considering compressed versions of the ENet convolutional neural network architecture, we demonstrate a fully-on-chip deployment with a latency of 4.9 ms per image, using less than 30% of the available resources on a Xilinx ZCU102 evaluation board. The latency is reduced to 3 ms per image when increasing the batch size to ten, corresponding to the use case ...
We describe the implementation of boosted decision trees (BDTs) in the hls4ml library, which allows the translation of a trained model into FPGA firmware through an automated conversion process. Thanks to its fully on-chip implementation, hls4ml performs the inference of BDT models with extremely low latency. With a typical latency of less than 100 ns, this solution is suitable for FPGA-based real-time processing, such as in the Level-1 trigger system of a collider experiment. These developments open up prospects for physicists to deploy BDTs ...
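The reason BDT inference reaches sub-100 ns latencies is structural: each tree unrolls into a fixed cascade of threshold comparisons, every tree evaluates in parallel, and the ensemble output is a sum. A sketch of that lowered form (all thresholds and leaf scores here are invented for illustration, not from a trained model):

```python
def tree_score(x):
    """One decision tree unrolled into nested comparisons -- the shape
    HLS synthesizes into parallel comparators with a fixed depth."""
    if x[0] < 0.5:
        return 0.2 if x[1] < 1.0 else -0.1
    else:
        return 0.7 if x[1] < 2.0 else 0.4

def bdt_score(x, trees):
    """A boosted ensemble is the sum of per-tree scores; in firmware
    the trees run concurrently and only the adder tree is shared."""
    return sum(t(x) for t in trees)

# Example: a two-tree ensemble reusing the same toy tree.
score = bdt_score([0.4, 0.5], [tree_score, tree_score])
```

Because there are no multiplications at all, the design maps to comparators and adders only, which is why the latency is dominated by routing rather than arithmetic.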
Recurrent neural networks have been shown to be effective architectures for many tasks in high-energy physics, and have thus been widely adopted. Their use in low-latency environments has, however, been limited as a result of the difficulties of implementing recurrent architectures on field-programmable gate arrays (FPGAs). In this paper we present an implementation of two types of recurrent neural network layers, long short-term memory and gated recurrent unit, within the hls4ml framework. We demonstrate that our implementation is capable of producing designs for both small and large ...
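What makes recurrent layers hard on FPGAs is the sequential data dependence: each step's hidden state feeds the next, so the gate arithmetic below must complete before the following timestep can start. A scalar (1-d input, 1-d hidden state) GRU step as a reference, with all weights passed in explicitly (the parameter names and values are illustrative, not hls4ml's internal layout):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, p):
    """One scalar GRU update. p holds weights/biases for the update
    gate z, reset gate r, and the candidate hidden state."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])        # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])        # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])
    return (1.0 - z) * h + z * h_cand                       # blend old and new

# Example: with all-zero parameters, z = 0.5 and h_cand = 0,
# so the state simply halves each step.
zeros = {k: 0.0 for k in ("wz", "uz", "bz", "wr", "ur", "br", "wh", "uh", "bh")}
h_next = gru_step(1.0, 2.0, zeros)
```

In an HLS implementation the sigmoid and tanh become fixed-point lookup tables, and the achievable initiation interval is set by this state-to-state dependency chain.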