- Neural dynamics and brain function
- Advanced Memory and Neural Computing
- Domain Adaptation and Few-Shot Learning
- Neural Networks and Applications
- Neuroscience and Neural Engineering
- Human Pose and Action Recognition
- Multimodal Machine Learning Applications
- EEG and Brain-Computer Interfaces
- Stochastic Gradient Optimization Techniques
- Model Reduction and Neural Networks
- Advanced Neural Network Applications
- Functional Brain Connectivity Studies
- Neural Networks and Reservoir Computing
- Stochastic dynamics and bifurcation
- Machine Learning and ELM
- Ferroelectric and Negative Capacitance Devices
- Generative Adversarial Networks and Image Synthesis
- Sparse and Compressive Sensing Techniques
- Visual perception and processing mechanisms
- Machine Learning and Algorithms
- Neuroscience and Neuropharmacology Research
- Topic Modeling
- Intelligent Tutoring Systems and Adaptive Learning
- Reinforcement Learning in Robotics
- Explainable Artificial Intelligence (XAI)
ETH Zurich
2019-2024
University of Bern
2015-2024
SIB Swiss Institute of Bioinformatics
2019-2024
University of Zurich
2019-2022
École Polytechnique Fédérale de Lausanne
2021
Instituto Superior Técnico
2020
Instituto Superior de Gestão
2020
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento
2012-2015
University of Lisbon
2010-2015
Artificial neural networks suffer from catastrophic forgetting when they are sequentially trained on multiple tasks. To overcome this problem, we present a novel approach based on task-conditioned hypernetworks, i.e., networks that generate the weights of a target model based on task identity. Continual learning (CL) is less difficult for this class of models thanks to a simple key feature: instead of recalling the input-output relations of all previously seen data, task-conditioned hypernetworks only require rehearsing task-specific weight...
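To make the mechanism concrete, here is a minimal PyTorch sketch of a task-conditioned hypernetwork (the class names, layer sizes, and target architecture are illustrative assumptions, not the paper's implementation): a learned task embedding is mapped to the full weight vector of a small target network, and continual learning then amounts to keeping the hypernetwork's outputs for old task embeddings close to stored values.

```python
# Minimal sketch of a task-conditioned hypernetwork (illustrative, not the
# paper's implementation): a learned task embedding is mapped to the full
# weight vector of a small target MLP.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperNet(nn.Module):
    def __init__(self, n_tasks, emb_dim=8, target_shapes=((20, 4), (2, 20))):
        super().__init__()
        self.target_shapes = target_shapes
        n_out = sum(r * c for r, c in target_shapes)
        self.task_emb = nn.Embedding(n_tasks, emb_dim)   # one embedding per task
        self.body = nn.Sequential(nn.Linear(emb_dim, 64), nn.ReLU(),
                                  nn.Linear(64, n_out))

    def forward(self, task_id):
        # Generate the target network's weights from the task identity.
        flat = self.body(self.task_emb(task_id))
        weights, i = [], 0
        for r, c in self.target_shapes:
            weights.append(flat[i:i + r * c].view(r, c))
            i += r * c
        return weights

def target_forward(x, weights):
    # Tiny target MLP whose parameters are produced by the hypernetwork.
    w1, w2 = weights
    return F.linear(torch.relu(F.linear(x, w1)), w2)

hnet = HyperNet(n_tasks=5)
x = torch.randn(16, 4)
y = target_forward(x, hnet(torch.tensor(0)))  # weights generated for task 0
```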
Deep learning has seen remarkable developments over the last years, many of them inspired by neuroscience. However, the main mechanism behind these advances - error backpropagation - appears to be at odds with neurobiology. Here, we introduce a multilayer neuronal network model with simplified dendritic compartments in which error-driven synaptic plasticity adapts the network towards a global desired output. In contrast to previous work, our model does not require separate phases and learning is driven by local prediction errors...
At present, the mechanisms of in-context learning in Transformers are not well understood and remain mostly an intuition. In this paper, we suggest that training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations. We start by providing a simple weight construction that shows the equivalence of the data transformations induced by 1) a single linear self-attention layer and 2) gradient-descent (GD) on a regression loss. Motivated by that construction, we show empirically that when...
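As a quick numerical illustration of that equivalence (a sketch under simplifying assumptions; the token layout and projection matrices below are hand-constructed for this toy case, not the paper's exact construction), one gradient-descent step on an in-context linear regression loss can be matched by a single softmax-free self-attention layer:

```python
# Numerical check (illustrative): one GD step on an in-context linear
# regression loss matches a single linear (softmax-free) self-attention
# layer with hand-constructed projection matrices.
import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 3, 8, 0.1
X = rng.normal(size=(n, d))              # in-context inputs x_i
w_true = rng.normal(size=d)
y = X @ w_true                           # in-context targets y_i
x_q = rng.normal(size=d)                 # query input

# 1) One GD step on L(w) = 0.5 * sum_i (w.x_i - y_i)^2, starting from w = 0.
w = np.zeros(d)
w = w - eta * (X.T @ (X @ w - y))        # = eta * sum_i y_i x_i
pred_gd = w @ x_q

# 2) Linear self-attention with constructed weights.
# Tokens e_i = [x_i, y_i]; the query token carries [x_q, 0].
E = np.concatenate([X, y[:, None]], axis=1)          # shape (n, d+1)
e_q = np.concatenate([x_q, [0.0]])
W_K = W_Q = np.eye(d + 1)[:d]            # keys/queries read out the x part
W_V = eta * np.eye(d + 1)[d:]            # values read out eta * y
attn_out = sum((W_V @ e) * ((W_K @ e) @ (W_Q @ e_q)) for e in E)
pred_attn = attn_out[0]

assert np.allclose(pred_gd, pred_attn)   # both equal eta * sum_i y_i (x_i . x_q)
```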
One of the most fundamental laws of physics is the principle of least action. Motivated by its predictive power, we introduce a neuronal least-action principle for cortical processing of sensory streams to produce appropriate behavioural outputs in real time. The principle postulates that the voltage dynamics of cortical pyramidal neurons prospectively minimizes the local somato-dendritic mismatch error within individual neurons. For output neurons, this implies minimizing an instantaneous behavioural error. For deep network neurons, it implies the prospective firing...
Animal behaviour depends on learning to associate sensory stimuli with the desired motor command. Understanding how the brain orchestrates the necessary synaptic modifications across different areas has remained a longstanding puzzle. Here, we introduce a multi-area neuronal network model in which synaptic plasticity continuously adapts the network towards a global desired output. In this model, learning is driven by a local dendritic prediction error that arises from a failure to predict the top-down input given the bottom-up activities. Such errors occur at...
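Stated generically (an illustrative formulation consistent with the abstract, not the paper's exact equations or notation), the local dendritic prediction error and the plasticity it drives can be written as:

```latex
% Illustrative sketch: the dendritic error compares top-down input with a
% prediction formed from bottom-up activity, and plasticity scales with this
% mismatch times presynaptic activity.
e_i = I_i^{\mathrm{top\text{-}down}} - \hat{I}_i\big(r^{\mathrm{bottom\text{-}up}}\big),
\qquad
\Delta w_{ij} \propto e_i \, r_j^{\mathrm{pre}}
```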
The success of deep learning, a brain-inspired form of AI, has sparked interest in understanding how the brain could similarly learn across multiple layers of neurons. However, the majority of biologically-plausible learning algorithms have not yet reached the performance of backpropagation (BP), nor are they built on strong theoretical foundations. Here, we analyze target propagation (TP), a popular but not fully understood alternative to BP, from the standpoint of mathematical optimization. Our theory shows that TP is...
Learning a sequence of tasks without access to i.i.d. observations is a widely studied form of continual learning (CL) that remains challenging. In principle, Bayesian learning directly applies to this setting, since recursive and one-off Bayesian updates yield the same result. In practice, however, recursive updating often leads to poor trade-off solutions across tasks because approximate inference is necessary for most models of interest. Here, we describe an alternative approach where task-conditioned parameter distributions are continually...
Sensory association cortices receive diverse inputs, with their role in representing and integrating multi-sensory content remaining unclear. Here we examined the neuronal correlates of an auditory-tactile stimulus sequence in the posterior parietal cortex (PPC) using 2-photon calcium imaging in awake mice. We find that subpopulations of layer 2/3 PPC neurons reliably represent texture-touch events, in addition to auditory cues that presage the incoming tactile stimulus. Notably, altering the flow of sensory events through...
It is believed that energy efficiency is an important constraint in brain evolution. As synaptic transmission dominates energy consumption, energy can be saved by ensuring that only a few synapses are active. It is therefore likely that the formation of sparse codes and sparse connectivity are fundamental objectives of synaptic plasticity. In this work we study how sparse connectivity can result from a learning rule for excitatory synapses. Information is maximised when potentiation and depression are balanced according to the mean presynaptic activity level, and the resulting fraction of zero-weight...
Finding neural network weights that generalize well from small datasets is difficult. A promising approach is to learn a weight initialization such that a small number of weight changes results in low generalization error. We show that this form of meta-learning can be improved by letting the learning algorithm decide which weights to change, i.e., by learning where to learn. We find that patterned sparsity emerges from this process, with the pattern of sparsity varying on a problem-by-problem basis. This selective sparsity results in better generalization and less interference in a range of few-shot and continual learning problems...
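A minimal sketch of the "learning where to learn" idea follows (illustrative, not the paper's algorithm; the toy task, step sizes, and straight-through mask are assumptions): per-weight scores are meta-learned alongside the initialization, and a binary mask derived from them gates which weights move during fast adaptation.

```python
# Sketch: meta-learn an initialization plus per-weight scores; a binary mask
# from the scores decides which weights change in the inner (adaptation) loop.
import torch

torch.manual_seed(0)
dim = 10
w_init = torch.randn(dim, requires_grad=True)    # meta-learned initialization
scores = torch.ones(dim, requires_grad=True)     # meta-learned per-weight scores
meta_opt = torch.optim.SGD([w_init, scores], lr=0.01)

def task_loss(w, target):
    return ((w - target) ** 2).mean()            # toy per-task objective

for step in range(100):
    target = torch.randn(dim)                    # sample a toy "task"
    # Hard mask with a straight-through estimator so scores still get gradients.
    hard = (torch.sigmoid(scores) > 0.5).float()
    mask = hard + torch.sigmoid(scores) - torch.sigmoid(scores).detach()
    # Inner loop: one masked gradient step from the shared initialization.
    inner_grad = torch.autograd.grad(task_loss(w_init, target), w_init,
                                     create_graph=True)[0]
    w_adapted = w_init - 0.5 * mask * inner_grad
    # Outer loop: update initialization and scores through the adaptation step.
    meta_opt.zero_grad()
    task_loss(w_adapted, target).backward()
    meta_opt.step()
```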
Being able to model uncertainty is a vital property for any intelligent agent. In an environment in which the domain of input stimuli is fully controlled, neglecting uncertainty may work, but this usually does not hold true in a real-world scenario. This highlights the necessity for learning algorithms that robustly detect noisy and out-of-distribution examples. Here we propose a novel approach for uncertainty estimation based on adversarially trained hypernetworks. We define a weight posterior that uniformly allows all realizations of a neural network...
We consider deep multi-layered generative models such as Boltzmann machines or Hopfield nets in which computation (which implements inference) is both recurrent and stochastic, but where the recurrence is not meant to model sequential structure, only to perform computation. We find conditions under which a simple feedforward computation is a very good initialization for inference, after the input units are clamped to observed values. It means that after this initialization, the network is close to a fixed point of the dynamics, where the energy gradient is 0. The main...
A fundamental function of cortical circuits is the integration of information from different sources to form a reliable basis for behavior. While animals behave as if they optimally integrate information according to Bayesian probability theory, the implementation of the required computations in the biological substrate remains unclear. We propose a novel, Bayesian view on the dynamics of conductance-based neurons and synapses which suggests that they are naturally equipped to perform such integration. In our approach, apical dendrites represent prior...
Transformers have become the dominant model in deep learning, but the reason for their superior performance is poorly understood. Here, we hypothesize that the strong performance of Transformers stems from an architectural bias towards mesa-optimization, a learned process running within the forward pass consisting of the following two steps: (i) the construction of an internal learning objective, and (ii) its corresponding solution found through optimization. To test this hypothesis, we reverse-engineer a series of autoregressive Transformers trained on simple...
Recent developments in few-shot learning have shown that during fast adaption, gradient-based meta-learners mostly rely on embedding features of powerful pretrained networks. This leads us to research ways to effectively adapt features and utilize the meta-learner's full potential. Here, we demonstrate the effectiveness of hypernetworks in this context. We propose a soft row-sharing hypernetwork architecture and show that training the hypernetwork with a variant of MAML is tightly linked to meta-learning a curvature matrix used to condition gradients...
This review examines gradient-based techniques to solve bilevel optimization problems. Bilevel optimization extends the loss minimization framework underlying statistical learning to systems that are implicitly defined through a quantity they minimize. This characterization can be applied to neural networks, optimizers, algorithmic solvers, and even physical systems, and allows for greater modeling flexibility compared to the usual explicit definition of such systems. We focus on solving learning problems of this kind through gradient descent, leveraging...
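As a concrete instance of the gradient-based approach (a sketch with assumed toy data and hyperparameters, using unrolled differentiation rather than any particular method from the review): the inner problem fits weights under an L2 penalty, and the outer problem tunes the penalty strength by differentiating the validation loss through the unrolled inner updates.

```python
# Sketch of bilevel optimization via unrolled differentiation: the inner loop
# fits w on training data with an L2 penalty lam; the outer loop adjusts lam
# by backpropagating the validation loss through the inner trajectory.
import torch

torch.manual_seed(0)
X_tr, y_tr = torch.randn(50, 5), torch.randn(50)
X_val, y_val = torch.randn(50, 5), torch.randn(50)
log_lam = torch.zeros((), requires_grad=True)      # outer variable (log L2 strength)
outer_opt = torch.optim.Adam([log_lam], lr=0.05)

def inner_loss(w, lam):
    return ((X_tr @ w - y_tr) ** 2).mean() + lam * (w ** 2).sum()

for outer_step in range(20):
    lam = log_lam.exp()
    w = torch.zeros(5, requires_grad=True)
    # Inner loop: gradient steps kept on the graph (create_graph=True) so the
    # outer gradient can flow through the whole trajectory.
    for _ in range(30):
        g = torch.autograd.grad(inner_loss(w, lam), w, create_graph=True)[0]
        w = w - 0.1 * g
    val_loss = ((X_val @ w - y_val) ** 2).mean()
    outer_opt.zero_grad()
    val_loss.backward()                            # hypergradient w.r.t. log_lam
    outer_opt.step()
```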
Equilibrium systems are a powerful way to express neural computations. As special cases, they include models of great current interest in both neuroscience and machine learning, such as deep neural networks, equilibrium recurrent models, or meta-learning. Here, we present a new principle for learning such systems with a temporally- and spatially-local rule. Our principle casts learning as a least-control problem, where we first introduce an optimal controller to lead the system towards a solution state, and then define learning as reducing the amount of control needed to reach...
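In generic form (a sketch consistent with the abstract, not necessarily the paper's exact formulation; f, y, u, and theta are illustrative symbols), the least-control problem can be stated as:

```latex
% Sketch: the free dynamics f(y, theta) are augmented with a control u that steers
% the equilibrium y* to a task-solving state; learning then minimizes control effort.
\min_{\theta} \tfrac{1}{2}\lVert u^{*}\rVert^{2}
\quad \text{s.t.} \quad
f(y^{*}, \theta) + u^{*} = 0
\quad \text{and} \quad y^{*}\ \text{solves the task (its output matches the target)}
```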