N. S. Nolte

ORCID: 0000-0003-2536-4209
Research Areas
  • Particle physics theoretical and experimental studies
  • Quantum Chromodynamics and Particle Interactions
  • High-Energy Particle Collisions Research
  • Neutrino Physics Research
  • Particle Detector Development and Performance
  • Computational Physics and Python Applications
  • Black Holes and Theoretical Physics
  • Dark Matter and Cosmic Phenomena
  • Particle Accelerators and Free-Electron Lasers
  • Medical Imaging Techniques and Applications
  • Superconducting Materials and Applications
  • Nuclear physics research studies
  • Distributed and Parallel Computing Systems
  • Atomic and Subatomic Physics Research
  • Stochastic processes and statistical mechanics
  • Adversarial Robustness in Machine Learning
  • Advanced Data Storage Technologies
  • International Science and Diplomacy
  • Algorithms and Data Compression
  • Advanced Neural Network Applications
  • Parallel Computing and Optimization Techniques
  • Markov Chains and Monte Carlo Methods
  • Radiation Detection and Scintillator Technologies
  • Robotics and Sensor-Based Localization
  • Congenital limb and hand anomalies

Massachusetts Institute of Technology
2021-2025

University of Cincinnati
2023

TU Dortmund University
2019-2022

European Organization for Nuclear Research
2018-2021

The NSF AI Institute for Artificial Intelligence and Fundamental Interactions
2021

Otto-von-Guericke University Magdeburg
2018

Inflammation plays an important role in the pathogenesis of ischemic stroke, including an acute and a prolonged inflammatory process. The role of neutrophil granulocytes as the first driver of the immune reaction from the blood site is under debate due to controversial findings. In bone marrow chimeric mice, we were able to study the dynamics of tdTomato-expressing neutrophils and GFP-expressing microglia after photothrombosis using intravital two-photon microscopy. We demonstrate the infiltration of neutrophils into the brain parenchyma and confirm a...

10.1371/journal.pone.0193970 article EN cc-by PLoS ONE 2018-03-15

We pursue the use of deep learning methods to improve state-of-the-art computations in theoretical high-energy physics. Planar $\mathcal{N} = 4$ Super Yang–Mills theory is a close cousin to the theory that describes Higgs boson production at the Large Hadron Collider; its scattering amplitudes are large mathematical...

10.1088/2632-2153/ad743e article EN cc-by Machine Learning Science and Technology 2024-08-27

We aim to understand grokking, a phenomenon where models generalize long after overfitting their training set. We present both a microscopic analysis anchored by an effective theory and a macroscopic analysis of phase diagrams describing learning performance across hyperparameters. We find that generalization originates from structured representations whose training dynamics and dependence on training set size can be predicted by our effective theory in a toy setting. We observe empirically the presence of four learning phases: comprehension, grokking, memorization, and confusion....

10.48550/arxiv.2205.10343 preprint EN cc-by arXiv (Cornell University) 2022-01-01

The LHCb experiment at CERN is undergoing an upgrade in preparation for the Run 3 data-taking period of the LHC. As part of this upgrade, the trigger is moving to a fully software implementation operating at the LHC bunch crossing rate. We present an evaluation of a CPU-based and a GPU-based implementation of the first stage of the High Level Trigger. After a detailed comparison, both options are found to be viable. This document summarizes the performance and implementation details of these options, the outcome of which has led to the choice of the GPU-based implementation as baseline.

10.1007/s41781-021-00070-2 article EN cc-by Computing and Software for Big Science 2021-12-22

The Lipschitz constant of the map between the input and output space represented by a neural network is a natural metric for assessing the robustness of the model. We present a new method to constrain the Lipschitz constant of dense deep learning models that can also be generalized to other architectures. The method relies on a simple weight normalization scheme during training that ensures the Lipschitz constant of every layer is below an upper limit specified by the analyst. A simple monotonic residual connection can then be used to make the model monotonic in any subset of its inputs, which is useful in scenarios where...

10.1088/2632-2153/aced80 article EN cc-by Machine Learning Science and Technology 2023-08-04
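
The weight normalization idea described in the abstract above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes the operator norm induced by the infinity norm (the abstract does not say which norm is used) and a per-layer limit `lam`. The names `op_norm_inf`, `normalize_weights`, and `dense` are hypothetical.

```python
def op_norm_inf(W):
    """Operator norm induced by the infinity norm: the largest absolute row sum."""
    return max(sum(abs(w) for w in row) for row in W)


def normalize_weights(W, lam=1.0):
    """Rescale W so its operator norm is at most lam (assumed per-layer limit).

    Matrices that already satisfy the bound are left unchanged.
    """
    scale = max(1.0, op_norm_inf(W) / lam)
    return [[w / scale for w in row] for row in W]


def dense(W, x):
    """Bias-free dense layer: matrix-vector product W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


# Toy weights whose norm (4.0) exceeds the assumed limit lam = 1.0.
W = [[3.0, -1.0], [0.5, 2.0]]
W_n = normalize_weights(W, lam=1.0)

# Compare input and output distances in the infinity norm.
x1, x2 = [0.2, -0.7], [1.0, 0.4]
y1, y2 = dense(W_n, x1), dense(W_n, x2)
dist_in = max(abs(a - b) for a, b in zip(x1, x2))
dist_out = max(abs(a - b) for a, b in zip(y1, y2))
print(op_norm_inf(W_n), dist_out <= dist_in)
```

With every layer bounded this way and 1-Lipschitz activations in between, the Lipschitz constant of the whole network is at most the product of the per-layer limits.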

Learning with Errors (LWE) is a hard math problem underlying recently standardized post-quantum cryptography (PQC) systems for key exchange and digital signatures. Prior work proposed new machine learning (ML)-based attacks on LWE problems with small, sparse secrets, but these attacks require millions of LWE samples to train on and take days to recover secrets. We propose three methods (better preprocessing, angular embeddings, and model pre-training) to improve these attacks, speeding up preprocessing by $25\times$ and improving...

10.48550/arxiv.2402.01082 preprint EN arXiv (Cornell University) 2024-02-01

The monotonic dependence of the outputs of a neural network on some of its inputs is a crucial inductive bias in many scenarios where domain knowledge dictates such behavior. This is especially important for interpretability and fairness considerations. In a broader context, scenarios in which monotonicity is important can be found in finance, medicine, physics, and other disciplines. It is thus desirable to build neural network architectures that implement this inductive bias provably. In this work, we propose a weight-constrained architecture with a single residual connection...

10.48550/arxiv.2307.07512 preprint EN cc-by arXiv (Cornell University) 2023-01-01

A novel neural architecture was recently developed that enforces an exact upper bound on the Lipschitz constant of the model by constraining the norm of its weights in a minimal way, resulting in higher expressiveness compared to other techniques. We present a new and interesting direction for this architecture: estimation of the Wasserstein metric (Earth Mover's Distance) in optimal transport by employing the Kantorovich-Rubinstein duality to enable its use in geometric fitting applications. Specifically, we focus on the field of high-energy...

10.48550/arxiv.2209.15624 preprint EN cc-by arXiv (Cornell University) 2022-01-01

The Lipschitz constant of the map between the input and output space represented by a neural network is a natural metric for assessing the robustness of the model. We present a new method to constrain the Lipschitz constant of dense deep learning models that can also be generalized to other architectures. The method relies on a simple weight normalization scheme during training that ensures the Lipschitz constant of every layer is below an upper limit specified by the analyst. A simple monotonic residual connection can then be used to make the model monotonic in any subset of its inputs, which is useful in scenarios where domain...

10.48550/arxiv.2112.00038 preprint EN cc-by arXiv (Cornell University) 2021-01-01

The upgraded LHCb detector, due to start data taking in 2022, will have to process an average data rate of 4 TB/s in real time. Because LHCb’s physics objectives require that the full detector information for every LHC bunch crossing is read out and made available for real-time processing, this bandwidth challenge is equivalent to that of the ATLAS and CMS HL-LHC software read-out, but deliverable five years earlier. Over the past six years, the LHCb collaboration has undertaken a bottom-up rewrite of its software infrastructure, pattern...

10.1051/epjconf/202125104009 article EN cc-by EPJ Web of Conferences 2021-01-01

Sparse binary LWE secrets are under consideration for standardization for Homomorphic Encryption and its applications to private computation. Known attacks on sparse binary LWE secrets include the dual attack and the hybrid dual-meet-in-the-middle attack, which requires significant memory. In this paper, we provide a new statistical attack with a low memory requirement. The attack relies on some initial lattice reduction. The key observation is that, after lattice reduction is applied to the rows of a q-ary-like embedded random matrix $\mathbf{A}$, the entries with high variance...

10.48550/arxiv.2403.10328 preprint EN arXiv (Cornell University) 2024-03-15

The task of identifying the B meson flavour at the primary interaction point in the LHCb detector is crucial for measurements of mixing and time-dependent CP violation. Flavour tagging is usually done with a small number of expert systems that find important tracks to infer the B flavour from. Recent advances show that replacing all of those expert systems with one ML algorithm that considers all tracks in an event yields an increase in tagging power. However, training the current classifier takes a long time and it is not suitable for use in real-time triggers. In this work we present a new...

10.48550/arxiv.2404.14145 preprint EN arXiv (Cornell University) 2024-04-22

The task of identifying the B meson flavor at the primary interaction point in the LHCb detector is crucial for measurements of mixing and time-dependent CP violation. Flavor tagging is usually done with a small number of expert systems that find important tracks to infer the B flavor from. Recent advances show that replacing all of those expert systems with one ML algorithm that considers all tracks in an event yields an increase in tagging power. However, training the current classifier takes a long time and it is not suitable for use in real-time triggers. In this work we present a new classifier, based...

10.1051/epjconf/202429509018 article EN cc-by EPJ Web of Conferences 2024-01-01

The operating conditions defining the current data-taking campaign at the Large Hadron Collider, known as Run 3, present unparalleled challenges for the real-time data acquisition workflow of the LHCb experiment at CERN. To address the anticipated surge in luminosity and consequent event rate, the LHCb experiment is transitioning to a fully software-based trigger system. This evolution necessitated innovations in hardware configurations, software paradigms, and algorithmic design. A significant advancement is the integration of monotonic Lipschitz...

10.1051/epjconf/202429509005 article EN cc-by EPJ Web of Conferences 2024-01-01

Memory Mosaics are networks of associative memories working in concert to achieve a prediction task of interest. Like transformers, memory mosaics possess compositional capabilities and in-context learning capabilities. Unlike transformers, memory mosaics achieve these capabilities in comparatively transparent ways. We demonstrate these capabilities on toy examples, and we also show that memory mosaics perform as well or better than transformers on medium-scale language modeling tasks.

10.48550/arxiv.2405.06394 preprint EN arXiv (Cornell University) 2024-05-10

We pursue the use of deep learning methods to improve state-of-the-art computations in theoretical high-energy physics. Planar $\mathcal{N} = 4$ Super Yang-Mills theory is a close cousin to the theory that describes Higgs boson production at the Large Hadron Collider; its scattering amplitudes are large mathematical expressions containing integer coefficients. In this paper, we apply Transformers to predict these coefficients. The problem can be formulated in a language-like representation amenable to standard cross-entropy training objectives....

10.48550/arxiv.2405.06107 preprint EN other-oa OSTI OAI (U.S. Department of Energy Office of Scientific and Technical Information) 2024-05-09

Mechanistic Interpretability (MI) promises a path toward fully understanding how neural networks make their predictions. Prior work demonstrates that even when trained to perform simple arithmetic, models can implement a variety of algorithms (sometimes concurrently) depending on initialization and hyperparameters. Does this mean neuron-level interpretability techniques have limited applicability? We argue that high-dimensional neural networks can learn low-dimensional representations of their training data that are useful beyond...

10.48550/arxiv.2405.17425 preprint EN arXiv (Cornell University) 2024-05-27

Today's best language models still struggle with hallucinations: factually incorrect generations, which impede their ability to reliably retrieve information seen during training. The reversal curse, where models cannot recall information when probed in a different order than was encountered during training, exemplifies this in information retrieval. We reframe the reversal curse as a factorization curse: a failure of models to learn the same joint distribution under different factorizations. Through a series of controlled experiments with increasing levels of realism including...

10.48550/arxiv.2406.05183 preprint EN arXiv (Cornell University) 2024-06-07

Large language models (LLMs) with long context windows have gained significant attention. However, the KV cache, stored to avoid re-computation, becomes a bottleneck. Various dynamic sparse or TopK-based attention approximation methods have been proposed to leverage the common insight that attention is sparse. In this paper, we first show that TopK attention itself suffers from quality degradation in certain downstream tasks because attention is not always as sparse as expected. Rather than selecting the keys and values with the highest attention scores, sampling...

10.48550/arxiv.2410.16179 preprint EN arXiv (Cornell University) 2024-10-21

Despite their remarkable success in language modeling, transformers trained to predict the next token in a sequence struggle with long-term planning. This limitation is particularly evident in tasks requiring foresight to plan multiple steps ahead, such as maze navigation. The standard next single token prediction objective, however, offers no explicit mechanism to predict multiple steps ahead or to revisit the path taken so far. Consequently, in this work we study whether explicitly predicting multiple steps ahead (and backwards) can improve transformers' maze navigation. We train...

10.48550/arxiv.2412.05117 preprint EN arXiv (Cornell University) 2024-12-06

The physics programme of the LHCb experiment at the Large Hadron Collider requires an efficient and precise reconstruction of the particle collision vertices. The LHCb Upgrade detector relies on a fully software-based trigger with an online reconstruction rate of 30 MHz, necessitating fast vertex finding algorithms. This paper describes a new approach to vertex reconstruction developed for this purpose. The algorithm is based on cluster finding within a histogram of the particle trajectory projections along the beamline and on an adaptive vertex fit. Its implementations and optimisations on x86 and GPU architectures...

10.48550/arxiv.2412.14966 preprint EN arXiv (Cornell University) 2024-12-19

Despite their remarkable success in language modeling, transformers trained to predict the next token in a sequence struggle with long-term planning. This limitation is particularly evident in tasks requiring foresight to plan multiple steps ahead, such as maze navigation. The standard next _single_ token prediction objective, however, offers no explicit mechanism to predict multiple steps ahead or to revisit the path taken so far. Consequently, in this work we study whether explicitly predicting multiple steps ahead (and backwards) can improve transformers’ maze navigation. We train...

10.32388/3q1xzw preprint EN cc-by 2024-12-27

During Run 3 of the LHC, the LHCb detector will process a 30 MHz event rate with a full detector readout followed by a software trigger. To deal with the increased computational requirements, the software framework is reviewed and optimized on a large scale. One challenge is the efficient scheduling of $O(10^3)$-$O(10^4)$ algorithms in the High Level Trigger (HLT) application. This document describes the design of a new algorithm scheduler which allows for static-order intra-event scheduling with minimum complexity while still providing the required flexibility.

10.1088/1742-6596/1525/1/012052 article EN Journal of Physics Conference Series 2020-04-01