- Particle physics theoretical and experimental studies
- Quantum Chromodynamics and Particle Interactions
- High-Energy Particle Collisions Research
- Neutrino Physics Research
- Particle Detector Development and Performance
- Computational Physics and Python Applications
- Black Holes and Theoretical Physics
- Dark Matter and Cosmic Phenomena
- Particle Accelerators and Free-Electron Lasers
- Medical Imaging Techniques and Applications
- Superconducting Materials and Applications
- Nuclear physics research studies
- Distributed and Parallel Computing Systems
- Atomic and Subatomic Physics Research
- Stochastic processes and statistical mechanics
- Adversarial Robustness in Machine Learning
- Advanced Data Storage Technologies
- International Science and Diplomacy
- Algorithms and Data Compression
- Advanced Neural Network Applications
- Parallel Computing and Optimization Techniques
- Markov Chains and Monte Carlo Methods
- Radiation Detection and Scintillator Technologies
- Robotics and Sensor-Based Localization
- Congenital limb and hand anomalies
Massachusetts Institute of Technology
2021-2025
University of Cincinnati
2023
TU Dortmund University
2019-2022
European Organization for Nuclear Research
2018-2021
The NSF AI Institute for Artificial Intelligence and Fundamental Interactions
2021
Otto-von-Guericke University Magdeburg
2018
Inflammation plays an important role in the pathogenesis of ischemic stroke, including both acute and prolonged inflammatory processes. The role of neutrophil granulocytes as the first drivers of the immune reaction from the blood side is under debate due to controversial findings. In bone marrow chimeric mice we were able to study the dynamics of tdTomato-expressing neutrophils and GFP-expressing microglia after photothrombosis using intravital two-photon microscopy. We demonstrate neutrophil infiltration into the brain parenchyma and confirm a...
We aim to understand grokking, a phenomenon where models generalize long after overfitting their training set. We present both a microscopic analysis anchored by an effective theory and a macroscopic analysis of phase diagrams describing learning performance across hyperparameters. We find that generalization originates from structured representations, whose dynamics and dependence on training set size can be predicted by our effective theory in a toy setting. We observe empirically the presence of four learning phases: comprehension, memorization, confusion....
The LHCb experiment at CERN is undergoing an upgrade in preparation for the Run 3 data-taking period of the LHC. As part of this upgrade, the trigger is moving to a fully software implementation operating at the LHC bunch-crossing rate. We present an evaluation of a CPU-based and a GPU-based implementation of the first stage of the High Level Trigger. After a detailed comparison, both options are found to be viable. This document summarizes the performance and implementation details of these options, the outcome of which has led to the choice of the GPU-based implementation as the baseline.
The Lipschitz constant of the map between input and output space represented by a neural network is a natural metric for assessing the robustness of a model. We present a new method to constrain the Lipschitz constant of dense deep learning models that can also be generalized to other architectures. The method relies on a simple weight normalization scheme during training that ensures the Lipschitz constant of every layer is below an upper limit specified by the analyst. A monotonic residual connection can then be used to make the model monotonic in any subset of its inputs, which is useful in scenarios where...
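The weight-normalization idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it bounds each layer's Lipschitz constant (with respect to the max norm) by rescaling weights whose infinity operator norm exceeds a user-chosen limit, so the network's overall constant is bounded by the product of the per-layer limits.

```python
import numpy as np

def normalize_weights(W, limit):
    """Rescale W only if its infinity operator norm exceeds `limit`.

    The infinity norm (max absolute row sum) upper-bounds the Lipschitz
    constant of x -> W @ x with respect to the max norm.
    """
    norm = np.abs(W).sum(axis=1).max()
    return W * (limit / norm) if norm > limit else W

# A toy two-layer network: with a 1-Lipschitz activation (ReLU), its
# Lipschitz constant is bounded by the product of the per-layer limits.
rng = np.random.default_rng(0)
W1 = normalize_weights(rng.normal(size=(16, 8)), limit=2.0)
W2 = normalize_weights(rng.normal(size=(1, 16)), limit=0.5)

def net(x):
    return W2 @ np.maximum(W1 @ x, 0.0)

# Empirical check: |f(x) - f(y)| <= 2.0 * 0.5 * max|x - y|
x, y = rng.normal(size=8), rng.normal(size=8)
lip = abs(net(x) - net(y))[0] / np.abs(x - y).max()
```

In a real training loop the rescaling would be applied after every optimizer step (or folded into the forward pass), rather than once at initialization.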
Learning with Errors (LWE) is a hard math problem underlying recently standardized post-quantum cryptography (PQC) systems for key exchange and digital signatures. Prior work proposed new machine learning (ML)-based attacks on LWE problems with small, sparse secrets, but these attacks require millions of LWE samples to train on and take days to recover secrets. We propose three key methods -- better preprocessing, angular embeddings and model pre-training -- to improve these attacks, speeding up preprocessing by $25\times$ and improving...
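For readers unfamiliar with the problem setup, the following toy sketch generates LWE samples with a small, sparse binary secret, the regime targeted by these ML-based attacks. Dimensions and error distribution are illustrative only, far below cryptographic sizes.

```python
import numpy as np

rng = np.random.default_rng(1)
q, n, m = 3329, 32, 64   # toy modulus and dimensions (not cryptographic)
h = 4                    # Hamming weight of the sparse binary secret

# Sparse binary secret: h ones at random positions.
s = np.zeros(n, dtype=np.int64)
s[rng.choice(n, size=h, replace=False)] = 1

A = rng.integers(0, q, size=(m, n))   # uniform public matrix
e = rng.integers(-2, 3, size=m)       # small centered error
b = (A @ s + e) % q                   # LWE samples are the pairs (A, b)

# Recovering s from (A, b) without e is the LWE problem; the residual
# for the true secret is just the small error term (mod q).
residual = (b - A @ s) % q
```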
The monotonic dependence of the outputs of a neural network on some of its inputs is a crucial inductive bias in many scenarios where domain knowledge dictates such behavior. This is especially important for interpretability and fairness considerations. In a broader context, scenarios in which monotonicity matters can be found in finance, medicine, physics, and other disciplines. It is thus desirable to build network architectures that implement this inductive bias provably. In this work, we propose a weight-constrained architecture with a single residual connection...
A novel neural architecture was recently developed that enforces an exact upper bound on the Lipschitz constant of the model by constraining the norm of its weights in a minimal way, resulting in higher expressiveness compared to other techniques. We present a new and interesting direction for this architecture: estimation of the Wasserstein metric (Earth Mover's Distance) in optimal transport by employing the Kantorovich-Rubinstein duality to enable its use in geometric fitting applications. Specifically, we focus on the field of high-energy...
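For intuition: the Kantorovich-Rubinstein duality states $W_1(P,Q) = \sup_{\|f\|_L \le 1} \mathbb{E}_P[f] - \mathbb{E}_Q[f]$, which is what a 1-Lipschitz-constrained network maximizes. In one dimension the optimum is known in closed form, so a Lipschitz critic can be sanity-checked against sorted samples. A small sketch (my own baseline, not the paper's code):

```python
import numpy as np

def wasserstein_1d(p_samples, q_samples):
    """Exact 1-Wasserstein distance between equal-size 1D empirical samples.

    In one dimension the optimal transport plan matches order statistics,
    so W1 reduces to the mean absolute difference of the sorted samples.
    A Lipschitz-constrained critic trained via the Kantorovich-Rubinstein
    dual should approximate this value from the samples alone.
    """
    return np.mean(np.abs(np.sort(p_samples) - np.sort(q_samples)))
```

For example, moving a point mass from 0 to 1 costs exactly 1, and the distance is invariant under a common shift of both samples.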
The upgraded LHCb detector, due to start data taking in 2022, will have to process an average data rate of 4 TB/s in real time. Because LHCb's physics objectives require that the full detector information for every LHC bunch crossing is read out and made available for real-time processing, this bandwidth challenge is equivalent to the ATLAS and CMS HL-LHC software read-out, but deliverable five years earlier. Over the past six years, the collaboration has undertaken a bottom-up rewrite of its software infrastructure, pattern...
Sparse binary LWE secrets are under consideration for standardization for Homomorphic Encryption and its applications to private computation. Known attacks on sparse binary secrets include the dual attack and the hybrid dual-meet-in-the-middle attack, which requires significant memory. In this paper, we provide a new statistical attack with a low memory requirement. The attack relies on some initial lattice reduction. The key observation is that, after lattice reduction is applied to the rows of a q-ary-like embedded random matrix $\mathbf{A}$, the entries with high variance...
The task of identifying B meson flavor at the primary interaction point in the LHCb detector is crucial for measurements of mixing and time-dependent CP violation. Flavour tagging is usually done with a small number of expert systems that find important tracks to infer the flavour from. Recent advances show that replacing all of those systems with one ML algorithm that considers the whole event yields an increase in tagging power. However, training the current classifier takes a long time and it is not suitable for use in real-time triggers. In this work we present a new...
The operating conditions defining the current data taking campaign at the Large Hadron Collider, known as Run 3, present unparalleled challenges for the real-time data acquisition workflow of the LHCb experiment at CERN. To address the anticipated surge in luminosity and the consequent event rate, the experiment is transitioning to a fully software-based trigger system. This evolution has necessitated innovations in hardware configurations, software paradigms, and algorithmic design. A significant advancement is the integration of monotonic Lipschitz...
Memory Mosaics are networks of associative memories working in concert to achieve a prediction task of interest. Like transformers, memory mosaics possess compositional capabilities and in-context learning capabilities. Unlike transformers, they possess these capabilities in comparatively transparent ways. We demonstrate these capabilities on toy examples, and we also show that memory mosaics perform as well as or better than transformers on medium-scale language modeling tasks.
We pursue the use of deep learning methods to improve state-of-the-art computations in theoretical high-energy physics. Planar N = 4 Super Yang-Mills theory is a close cousin of the theory that describes Higgs boson production at the Large Hadron Collider; its scattering amplitudes are large mathematical expressions containing integer coefficients. In this paper, we apply Transformers to predict these coefficients. The problem can be formulated in a language-like representation amenable to standard cross-entropy training objectives....
Mechanistic Interpretability (MI) promises a path toward fully understanding how neural networks make their predictions. Prior work demonstrates that even when trained to perform simple arithmetic, models can implement a variety of algorithms (sometimes concurrently) depending on initialization and hyperparameters. Does this mean neuron-level interpretability techniques have limited applicability? We argue that high-dimensional neural networks can learn low-dimensional representations of their training data that are useful beyond...
Today's best language models still struggle with hallucinations: factually incorrect generations, which impede their ability to reliably retrieve information seen during training. The reversal curse, where models cannot recall information when probed in a different order than was encountered during training, exemplifies this limitation in retrieval. We reframe the reversal curse as a factorization curse - a failure of models to learn the same joint distribution under different factorizations. Through a series of controlled experiments with increasing levels of realism, including...
Large language models (LLMs) with long context windows have gained significant attention. However, the KV cache, stored to avoid re-computation, becomes a bottleneck. Various dynamic sparse or TopK-based attention approximation methods have been proposed to leverage the common insight that attention is sparse. In this paper, we first show that TopK attention itself suffers from quality degradation in certain downstream tasks because attention is not always as sparse as expected. Rather than selecting the keys and values with the highest attention scores, sampling...
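The TopK approximation being criticized above can be made concrete in a few lines. This toy sketch (single query, no batching or masking) keeps only the k highest-scoring keys; when the attention distribution is not actually sparse, the dropped probability mass biases the output, which is the failure mode the abstract describes.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def topk_attention(q, K, V, k):
    """Approximate attention using only the k keys with the highest scores."""
    scores = K @ q
    idx = np.argsort(scores)[-k:]       # indices of the top-k scores
    w = softmax(scores[idx])            # renormalize over the kept keys
    return w @ V[idx]

rng = np.random.default_rng(3)
K = rng.normal(size=(128, 16))
V = rng.normal(size=(128, 16))
q = rng.normal(size=16)

full = softmax(K @ q) @ V               # exact attention output
approx = topk_attention(q, K, V, k=16)  # biased when attention isn't sparse
err = np.linalg.norm(full - approx)
```

With k equal to the number of keys the approximation is exact; the error grows as k shrinks relative to the effective support of the attention distribution.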
Despite their remarkable success in language modeling, transformers trained to predict the next token in a sequence struggle with long-term planning. This limitation is particularly evident in tasks requiring foresight to plan multiple steps ahead, such as maze navigation. The standard single next-token prediction objective, however, offers no explicit mechanism to look ahead - or to revisit the path taken so far. Consequently, in this work we study whether explicitly predicting multiple steps ahead (and backwards) can improve transformers' maze navigation. We train...
The physics programme of the LHCb experiment at the Large Hadron Collider requires an efficient and precise reconstruction of particle collision vertices. The upgraded detector relies on a fully software-based trigger with an online rate of 30 MHz, necessitating fast vertex finding algorithms. This paper describes a new approach developed for this purpose. The algorithm is based on cluster finding within a histogram of trajectory projections along the beamline, followed by an adaptive vertex fit. Its implementations and optimisations on x86 and GPU architectures...
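The histogram-clustering step can be illustrated with a toy version (my own simplification; the production algorithm adds an adaptive vertex fit and many detector-specific details): project tracks onto the beamline, histogram the z positions, and treat runs of bins above a count threshold as vertex candidates.

```python
import numpy as np

def find_vertices(z_points, bin_width=0.5, threshold=5):
    """Seed primary-vertex candidates from a 1D histogram along the beamline.

    Bins with at least `threshold` entries seed candidates; adjacent hot
    bins are merged and each cluster position is the mean of its entries.
    """
    edges = np.arange(z_points.min(), z_points.max() + bin_width, bin_width)
    counts, edges = np.histogram(z_points, bins=edges)
    vertices, cluster = [], []
    for i, hot in enumerate(counts >= threshold):
        if hot:
            cluster.append(i)
        elif cluster:
            sel = (z_points >= edges[cluster[0]]) & (z_points < edges[cluster[-1] + 1])
            vertices.append(z_points[sel].mean())
            cluster = []
    if cluster:  # close a cluster that runs to the last bin
        sel = (z_points >= edges[cluster[0]]) & (z_points < edges[cluster[-1] + 1])
        vertices.append(z_points[sel].mean())
    return vertices

# Two simulated interaction regions along z (arbitrary units).
rng = np.random.default_rng(4)
z = np.concatenate([rng.normal(0.0, 0.1, 40), rng.normal(30.0, 0.1, 25)])
pvs = find_vertices(z)
```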
During Run 3 of the LHC, the LHCb detector will process a 30 MHz event rate with full readout followed by a software trigger. To deal with the increased computational requirements, the software framework is being reviewed and optimized on a large scale. One challenge is the efficient scheduling of O(10^3)-O(10^4) algorithms in the High Level Trigger (HLT) application. This document describes the design of a new algorithm scheduler which allows for static-order intra-event scheduling with minimum complexity while still providing the required flexibility.
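The static-order idea can be sketched abstractly (algorithm names below are illustrative, not actual LHCb HLT algorithms): topologically sort the dependency graph once, then process every event in that fixed order, skipping algorithms masked off by control flow. This keeps per-event scheduling overhead minimal while respecting data dependencies.

```python
from graphlib import TopologicalSorter

# Toy dependency graph: each algorithm maps to the algorithms it consumes.
deps = {
    "decode_velo": set(),
    "velo_tracking": {"decode_velo"},
    "decode_ut": set(),
    "ut_tracking": {"velo_tracking", "decode_ut"},
    "pv_finder": {"velo_tracking"},
    "trigger_line": {"ut_tracking", "pv_finder"},
}

# Compute one static execution order up front, before any event is seen.
static_order = list(TopologicalSorter(deps).static_order())

def process_event(active):
    """Run algorithms in the fixed order; skip inactive or blocked ones."""
    executed = []
    for alg in static_order:
        if alg in active and all(d in executed for d in deps[alg]):
            executed.append(alg)
    return executed
```

Because the order is fixed, per-event work is a single linear pass; the control-flow flexibility comes entirely from the `active` mask.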