Iacopo Poli

ORCID: 0000-0002-0964-0624
Research Areas
  • Neural Networks and Reservoir Computing
  • Optical Network Technologies
  • Advanced Memory and Neural Computing
  • Neural Networks and Applications
  • Photonic and Optical Devices
  • Adversarial Robustness in Machine Learning
  • Machine Learning in Materials Science
  • Topic Modeling
  • Natural Language Processing Techniques
  • Machine Learning and Data Classification
  • Human Pose and Action Recognition
  • Ferroelectric and Negative Capacitance Devices
  • Random lasers and scattering media
  • Advanced Optical Sensing Technologies
  • Advanced Statistical Process Monitoring
  • Sparse and Compressive Sensing Techniques
  • Gene Regulatory Network Analysis
  • Statistical Methods and Inference
  • Domain Adaptation and Few-Shot Learning
  • Advanced Neural Network Applications
  • Metabolomics and Mass Spectrometry Studies
  • Stochastic Gradient Optimization Techniques
  • Chaos control and synchronization
  • Anomaly Detection Techniques and Applications
  • Physical Unclonable Functions (PUFs) and Hardware Security

Los Alamos National Laboratory
1990

We consider the problem of detecting abrupt changes in the distribution of a multi-dimensional time series, with limited computing power and memory. In this paper, we propose a new, simple method for model-free online change-point detection that relies only on fast and light recursive statistics, inspired by the classical Exponentially Weighted Moving Average algorithm (EWMA). The proposed idea is to compute two EWMA statistics on the stream of data with different forgetting factors, and to compare them. By doing so, we show...

10.1109/tsp.2020.2990597 article EN IEEE Transactions on Signal Processing 2020-01-01
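The two-EWMA comparison described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration of the idea, not the paper's exact statistic: the forgetting factors, the norm-based comparison, and the fixed threshold are all assumptions chosen for the demo.

```python
import numpy as np

def ewma_change_detector(stream, lam_fast=0.3, lam_slow=0.05, threshold=1.0):
    """Online change-point detection by comparing two EWMA statistics
    with different forgetting factors. Memory and per-sample cost are
    O(d) for a d-dimensional stream."""
    fast = slow = None
    alarms = []
    for t, x in enumerate(stream):
        x = np.asarray(x, dtype=float)
        if fast is None:
            fast, slow = x.copy(), x.copy()
            continue
        # Recursive updates: the only state kept is the two running means.
        fast = (1 - lam_fast) * fast + lam_fast * x
        slow = (1 - lam_slow) * slow + lam_slow * x
        # The fast mean adapts quickly after a shift, the slow one lags,
        # so their distance spikes at a change point.
        if np.linalg.norm(fast - slow) > threshold:
            alarms.append(t)
    return alarms

# Usage: a 2-D stream whose mean jumps from 0 to 3 at t = 200.
rng = np.random.default_rng(0)
stream = np.vstack([rng.normal(0, 0.1, (200, 2)),
                    rng.normal(3, 0.1, (200, 2))])
alarms = ewma_change_detector(stream)
```

With these toy parameters the first alarm fires within a few samples of the true change point at t = 200.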

In this work we introduce RITA: a suite of autoregressive generative models for protein sequences, with up to 1.2 billion parameters, trained on over 280 million sequences belonging to the UniRef-100 database. Such models hold the promise of greatly accelerating protein design. We conduct the first systematic study of how capabilities evolve with model size for autoregressive transformers in the protein domain: we evaluate RITA models on next amino acid prediction, zero-shot fitness, and enzyme function prediction, showing benefits from increased scale. We release the models openly, to the benefit...

10.48550/arxiv.2205.05789 preprint EN other-oa arXiv (Cornell University) 2022-01-01

The backpropagation algorithm has long been the canonical training method for neural networks. Modern paradigms are implicitly optimized for it, and numerous guidelines exist to ensure its proper use. Recently, synthetic gradient methods - where the error gradient is only roughly approximated - have garnered interest. These methods not only better portray how biological brains perform learning, but also open new computational possibilities, such as updating layers asynchronously. Even so, they have failed to scale past simple...

10.48550/arxiv.1906.04554 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Encoder-only transformer models such as BERT offer a great performance-size tradeoff for retrieval and classification tasks with respect to larger decoder-only models. Despite being the workhorse of numerous production pipelines, there have been limited Pareto improvements to BERT since its release. In this paper, we introduce ModernBERT, bringing modern model optimizations to encoder-only models and representing a major Pareto improvement over older encoders. Trained on 2 trillion tokens with a native 8192 sequence length,...

10.48550/arxiv.2412.13663 preprint EN arXiv (Cornell University) 2024-12-18

Despite being the workhorse of deep learning, the backpropagation algorithm is no panacea. It enforces sequential layer updates, thus preventing efficient parallelization of the training process. Furthermore, its biological plausibility is being challenged. Alternative schemes have been devised; yet, under the constraint of synaptic asymmetry, none have scaled to modern deep learning tasks and architectures. Here, we challenge this perspective, and study the applicability of Direct Feedback Alignment to neural view synthesis, recommender...

10.48550/arxiv.2006.12878 preprint EN other-oa arXiv (Cornell University) 2020-01-01
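Direct Feedback Alignment, the alternative to backpropagation studied in the abstract above, can be sketched on a toy two-layer network. This is an illustrative NumPy sketch under assumed dimensions and a made-up separable task, not the paper's setup: the defining step is that the output error reaches the hidden layer through a fixed random matrix B instead of the transpose of the forward weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 8, 32, 2

W1 = rng.normal(0, 0.5, (n_in, n_hid))
W2 = rng.normal(0, 0.5, (n_hid, n_out))
B = rng.normal(0, 0.5, (n_out, n_hid))   # fixed random feedback matrix

# Toy task: class depends on the sign of the first input feature.
X = rng.normal(size=(128, n_in))
Y = np.stack([X[:, 0] > 0, X[:, 0] <= 0], axis=1).astype(float)

def forward(X):
    h = np.tanh(X @ W1)
    return h, h @ W2

def mse():
    _, out = forward(X)
    return ((out - Y) ** 2).mean()

loss_before = mse()
lr = 0.05
for _ in range(500):
    h, out = forward(X)
    e = out - Y                        # output error
    # DFA: project the error to the hidden layer with the fixed random
    # matrix B, instead of backpropagating through W2.T.
    dh = (e @ B) * (1 - h ** 2)        # tanh derivative
    W2 -= lr * h.T @ e / len(X)
    W1 -= lr * X.T @ dh / len(X)
loss_after = mse()
```

Because B never changes, the hidden-layer update does not depend on the downstream weights, which is what removes the sequential dependency (and the weight symmetry) that backpropagation imposes.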

We propose a new defense mechanism against adversarial attacks inspired by an optical co-processor, providing robustness without compromising natural accuracy in both white-box and black-box settings. This hardware co-processor performs a nonlinear fixed random transformation, where the parameters are unknown and impossible to retrieve with sufficient precision for large enough dimensions. In the white-box setting, our defense works by obfuscating the parameters of the random projection. Unlike other defenses relying on obfuscated gradients, we...

10.1109/icassp43922.2022.9746671 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27
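The "nonlinear fixed random transformation" in the abstract is commonly modeled as the squared modulus of a complex random projection, which is the standard mathematical model of an optical processing unit. The sketch below simulates that transform in NumPy; the dimensions are illustrative assumptions, and in hardware the matrix R is physically realized and never exposed, which is what provides the parameter obfuscation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 64, 256

# Fixed complex Gaussian matrix: the defense's "unknown parameters".
R = (rng.normal(size=(d_out, d_in))
     + 1j * rng.normal(size=(d_out, d_in))) / np.sqrt(2)

def opu_like_transform(x):
    # Nonlinear random projection: squared modulus of a complex
    # random projection of the input.
    return np.abs(R @ x) ** 2

y = opu_like_transform(rng.normal(size=d_in))
```

The output is a nonnegative vector of dimension `d_out`; an attacker who can only query the transform sees the magnitudes, not R itself, so gradient-based white-box attacks cannot differentiate through the true parameters.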

We consider the problem of detecting abrupt changes in the distribution of a multi-dimensional time series, with limited computing power and memory. In this paper, we propose a new, simple method for model-free online change-point detection that relies only on fast and light recursive statistics, inspired by the classical Exponentially Weighted Moving Average algorithm (EWMA). The proposed idea is to compute two EWMA statistics on the stream of data with different forgetting factors, and to compare them. By doing so, we show...

10.48550/arxiv.1805.08061 preprint EN other-oa arXiv (Cornell University) 2018-01-01

As neural networks grow larger, more complex, and more data-hungry, training costs are skyrocketing. Especially when lifelong learning is necessary, such as in recommender systems or self-driving cars, this might soon become unsustainable. In this study, we present the first optical co-processor able to accelerate the training phase of digitally-implemented neural networks. We rely on direct feedback alignment as an alternative to backpropagation, and perform the error projection step optically. Leveraging the random projections delivered...

10.48550/arxiv.2006.01475 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Modern machine learning relies nearly exclusively on dedicated electronic hardware accelerators. Photonic approaches, with low power consumption and high operation speed, are increasingly considered for inference but, to date, remain mostly limited to relatively basic tasks. Simultaneously, the problem of training deep and complex neural networks, overwhelmingly performed through backpropagation, remains a significant limitation to the size and, consequently, the performance of current architectures and a major compute...

10.48550/arxiv.2409.12965 preprint EN arXiv (Cornell University) 2024-09-01

The scaling hypothesis motivates the expansion of models past trillions of parameters as a path towards better performance. Recent significant developments, such as GPT-3, have been driven by this conjecture. However, as models scale up, training them efficiently with backpropagation becomes difficult. Because model, pipeline, and data parallelism distribute parameters and gradients over compute nodes, communication is challenging to orchestrate: this is a bottleneck to further scaling. In this work, we argue that alternative training methods can...

10.48550/arxiv.2012.06373 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Exact computation of linear algebra operations is challenging or even impossible at extreme scale. By leveraging randomization, we can get approximate results at reduced computational cost. LightOn OPU: the first commercially available photonic co-processor.

10.1109/hcs52781.2021.9566948 article EN 2021-08-22
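The randomization idea summarized above - trade exactness for a large reduction in cost - is well illustrated by the randomized range finder: sample the range of a matrix with a random test matrix, orthonormalize, and project. The sketch below is a generic NumPy illustration of randomized linear algebra, not the OPU's specific operation; the matrix sizes and rank are assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r, k = 500, 400, 10, 20

# Build a matrix of exact rank r.
A = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))

# Randomized range finder: A @ Omega samples the column space of A;
# since k > r, the orthonormal basis Q captures the full range of A
# almost surely, so the projection is near-exact.
Omega = rng.normal(size=(n, k))          # random test matrix
Q, _ = np.linalg.qr(A @ Omega)           # (m, k) orthonormal basis
A_approx = Q @ (Q.T @ A)                 # rank-k approximation of A

rel_err = np.linalg.norm(A_approx - A) / np.linalg.norm(A)
```

The cost is dominated by products with the thin matrices Omega and Q rather than operations on the full matrix, which is what makes randomized methods attractive at scales where exact factorizations are infeasible.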

Access to large pre-trained models of varied architectures, in many different languages, is central to the democratization of NLP. We introduce PAGnol, a collection of French GPT models. Using scaling laws, we efficiently train PAGnol-XL (1.5B parameters) with the same computational budget as CamemBERT, a model 13 times smaller. PAGnol-XL is the largest model trained to date for the French language. We plan to train increasingly large and performing versions, exploring the capabilities of extreme-scale models. For this first release, we focus on the pre-training and scaling calculations...

10.48550/arxiv.2110.08554 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Optical Processing Units (OPUs) -- low-power photonic chips dedicated to large-scale random projections -- have been used in previous work to train deep neural networks using Direct Feedback Alignment (DFA), an effective alternative to backpropagation. Here, we demonstrate how to leverage the intrinsic noise of optical random projections to build a differentially private DFA mechanism, making OPUs a solution of choice to provide private-by-design training. We provide a theoretical analysis of our adaptive privacy mechanism, carefully measuring how the noise of optical random projections propagates...

10.48550/arxiv.2106.03645 preprint EN other-oa arXiv (Cornell University) 2021-01-01

The performance of algorithms for neural architecture search strongly depends on the parametrization of the search space. We use contrastive learning to identify networks across different initializations based on their data Jacobians, and automatically produce the first architecture embeddings independent from the search-space parametrization. Using our embeddings, we show that traditional black-box optimization algorithms, without modification, can reach state-of-the-art performance in Neural Architecture Search. As our method provides a unified embedding space, we perform...

10.48550/arxiv.2102.04208 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Robustness to adversarial attacks is typically obtained through expensive adversarial training with Projected Gradient Descent. Here we introduce ROPUST, a remarkably simple and efficient method to leverage robust pre-trained models and further increase their robustness, at no cost in natural accuracy. Our technique relies on the use of an Optical Processing Unit (OPU), a photonic co-processor, and a fine-tuning step performed with Direct Feedback Alignment, a synthetic gradient training scheme. We test our method on nine different models against...

10.48550/arxiv.2108.04217 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Recent work has identified simple empirical scaling laws for language models, linking compute budget, dataset size, model size, and autoregressive modeling loss. The validity of these simple power laws across orders of magnitude in scale provides compelling evidence that larger models are also more capable models. However, scaling up under the constraints of hardware and infrastructure is no easy feat, and rapidly becomes a hard and expensive engineering problem. We investigate ways to tentatively cheat scaling laws, and train larger models for cheaper. We emulate an...

10.48550/arxiv.2109.11928 preprint EN other-oa arXiv (Cornell University) 2021-01-01
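The power laws the abstract refers to take the form of loss falling as a power of parameter count, L(N) = (N_c / N) ** alpha. A minimal sketch, with placeholder constants of a plausible order of magnitude rather than fitted values from any specific paper:

```python
def autoregressive_loss(n_params, n_c=8.8e13, alpha=0.076):
    """Illustrative power-law scaling of modeling loss with parameter
    count: L(N) = (N_c / N) ** alpha. Constants are assumptions."""
    return (n_c / n_params) ** alpha

# A power law means each fixed multiple of parameters buys the same
# multiplicative loss reduction: scaling by 1024x divides the predicted
# loss by 1024 ** alpha, regardless of the starting size.
small = autoregressive_loss(1e8)
large = autoregressive_loss(1e8 * 1024)
```

This constant-ratio property is what makes the laws useful for extrapolation, and also what makes "cheating" them attractive: any method that shifts the curve beats an exponentially growing compute bill.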

We introduce LightOn's Optical Processing Unit (OPU), the first photonic AI accelerator chip available on the market for at-scale Non von Neumann computations, reaching 1500 TeraOPS. It relies on a combination of free-space optics with off-the-shelf components, together with a software API allowing seamless integration within Python-based processing pipelines. We discuss a variety of use cases and hybrid network architectures, with the OPU used in combination with CPU/GPU, and draw a pathway towards "optical advantage".

10.48550/arxiv.2107.11814 preprint EN other-oa arXiv (Cornell University) 2021-01-01