- Reinforcement Learning in Robotics
- Stochastic Gradient Optimization Techniques
- Advanced Graph Theory Research
- Limits and Structures in Graph Theory
- Domain Adaptation and Few-Shot Learning
- Face and Expression Recognition
- Advanced Neural Network Applications
- Neural Networks and Applications
- Advanced Image and Video Retrieval Techniques
- Generative Adversarial Networks and Image Synthesis
- Machine Learning and Data Classification
- Machine Learning and Algorithms
- Sparse and Compressive Sensing Techniques
- Advanced Graph Neural Networks
- Advanced Multi-Objective Optimization Algorithms
- Complexity and Algorithms in Graphs
- Advanced Bandit Algorithms Research
- Topic Modeling
- Graph Labeling and Dimension Problems
- Model Reduction and Neural Networks
- Adversarial Robustness in Machine Learning
- Evolutionary Algorithms and Applications
- Privacy-Preserving Technologies in Data
- Human Pose and Action Recognition
- Mathematical Approximation and Integration
Columbia University
2012-2025
Google (United Kingdom)
2025
DeepMind (United Kingdom)
2025
Google (United States)
2015-2024
Université Paris Dauphine-PSL
2016
CEA LIST
2016
Commissariat à l'Énergie Atomique et aux Énergies Alternatives
2016
Courant Institute of Mathematical Sciences
2016
Applied Science Private University
2014
University of Warsaw
2007
We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness. To approximate softmax attention-kernels, Performers use a novel Fast Attention Via positive Orthogonal Random features approach (FAVOR+), which may be of independent interest for scalable kernel methods. FAVOR+ also...
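The linear-complexity trick this abstract describes can be illustrated with a toy implementation. The sketch below is a simplification: it uses plain Gaussian random features rather than the orthogonal ones FAVOR+ constructs, and `favor_plus_attention` and its parameter names are hypothetical. It shows the two key ingredients: the positive feature map and the reassociation of the attention product that avoids materializing the L×L matrix.

```python
import numpy as np

def favor_plus_attention(Q, K, V, num_features=256, seed=0):
    """Linear-time approximation of softmax attention via positive
    random features (simplified sketch; the paper additionally
    orthogonalizes the rows of W)."""
    L, d = Q.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((num_features, d))

    def phi(X):
        # Positive feature map: exp(w.x - ||x||^2 / 2) / sqrt(m),
        # applied after the usual d**(-1/4) query/key scaling.
        Xs = X / d ** 0.25
        proj = Xs @ W.T
        return np.exp(proj - 0.5 * np.sum(Xs ** 2, axis=1, keepdims=True)) / np.sqrt(num_features)

    Qp, Kp = phi(Q), phi(K)                  # (L, m) each
    # Reassociate (Qp Kp^T) V as Qp (Kp^T V): O(L m d) instead of O(L^2 d).
    numer = Qp @ (Kp.T @ V)
    denom = (Qp @ Kp.sum(axis=0))[:, None]   # row-wise softmax normalizer
    return numer / denom
```

With enough random features, the output approaches exact softmax attention while the L×L attention matrix is never formed.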
As part of a complete software stack for autonomous driving, NVIDIA has created a neural-network-based system, known as PilotNet, which outputs steering angles given images of the road ahead. PilotNet is trained using road images paired with the steering angles generated by a human driving a data-collection car. It derives the necessary domain knowledge by observing human drivers. This eliminates the need for human engineers to anticipate what is important in an image and foresee all the necessary rules for safe driving. Road tests demonstrated that PilotNet can successfully perform lane...
Large pretrained (e.g., "foundation") models exhibit distinct capabilities depending on the domain of data they are trained on. While these domains are generic, they may only barely overlap. For example, visual-language models (VLMs) are trained on Internet-scale image captions, but large language models (LMs) are further trained on Internet-scale text with no images (e.g., spreadsheets, SAT questions, code). As a result, these models store different forms of commonsense knowledge across different domains. In this work, we show that this diversity is symbiotic, and can be leveraged through...
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web. To this end, we propose to co-fine-tune state-of-the-art vision-language models on both robotic trajectory data and Internet-scale vision-language tasks, such as visual question answering. In contrast to other approaches, we propose a simple, general recipe to achieve...
We extend the classical Barabási-Albert preferential attachment procedure to graphs with internal vertex structure given by weights of vertices. In our model, the weight dynamics depends on the current degree distribution, and the attachment rule takes into account both degrees and weights. We prove that such a coupled dynamics leads to scale-free graphs with exponents depending on the parameters of the dynamics.
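A minimal simulation can make the attachment rule concrete. The sketch below is an assumption-laden simplification: it uses static i.i.d. vertex weights and a multiplicative degree × weight attachment probability, whereas the paper studies a coupled weight dynamics; the function name is hypothetical.

```python
import numpy as np

def weighted_preferential_attachment(n, m=2, seed=0):
    """Grow a graph where each new vertex attaches to m existing vertices
    with probability proportional to degree * weight (static weights
    stand in for the paper's coupled weight dynamics)."""
    rng = np.random.default_rng(seed)
    degrees = np.zeros(n, dtype=int)
    weights = rng.uniform(0.5, 1.5, size=n)   # internal vertex structure
    edges = []
    # Seed the process with a small clique on m + 1 vertices.
    for i in range(m + 1):
        for j in range(i):
            edges.append((i, j))
            degrees[i] += 1
            degrees[j] += 1
    for v in range(m + 1, n):
        scores = degrees[:v] * weights[:v]
        probs = scores / scores.sum()
        targets = rng.choice(v, size=m, replace=False, p=probs)
        for t in targets:
            edges.append((v, t))
            degrees[v] += 1
            degrees[t] += 1
    return degrees, edges
```

Even in this simplified form, the rich-get-richer dynamics produces the heavy-tailed degree distribution characteristic of scale-free graphs.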
This paper proposes a new method, which we call VisualBackProp, for visualizing which sets of pixels of the input image contribute most to the predictions made by a convolutional neural network (CNN). The method heavily hinges on exploring the intuition that the feature maps contain less and less irrelevant information to the prediction decision when moving deeper into the network. The technique we propose is dedicated to CNN-based systems for steering self-driving cars and is therefore required to run in real-time. This makes the proposed visualization method valuable...
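The averaging-and-backprojection intuition can be sketched in a few lines. The function below is a loose stand-in, not the paper's algorithm: it uses nearest-neighbor upsampling (via `np.kron`) where the paper uses deconvolutions, and assumes layer spatial sizes that divide each other; the function name is hypothetical.

```python
import numpy as np

def visual_backprop(feature_maps):
    """Relevance mask from a list of (channels, H, W) activations,
    ordered shallow to deep: average each layer's feature maps, then,
    from deepest to shallowest, upsample and pointwise multiply."""
    avgs = [fm.mean(axis=0) for fm in feature_maps]   # one (H, W) map per layer
    mask = avgs[-1]
    for a in reversed(avgs[:-1]):
        # Nearest-neighbor upsample mask to the shallower layer's size.
        ry = a.shape[0] // mask.shape[0]
        rx = a.shape[1] // mask.shape[1]
        mask = np.kron(mask, np.ones((ry, rx))) * a
    # Normalize to [0, 1] for display.
    mask = mask - mask.min()
    return mask / (mask.max() + 1e-8)
```

The result is a single input-resolution saliency map; the cost is a handful of averages and multiplies, which is why this style of visualization can run in real time.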
Learning adaptable policies is crucial for robots to operate autonomously in our complex and quickly changing world. In this work, we present a new meta-learning method that allows robots to quickly adapt to changes in dynamics. In contrast to gradient-based meta-learning algorithms that rely on second-order gradient estimation, we introduce a more noise-tolerant Batch Hill-Climbing adaptation operator and combine it with meta-learning based on evolutionary strategies. Our method significantly improves adaptation to changes in dynamics in high noise settings, which are common in robotics applications....
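The Batch Hill-Climbing idea can be illustrated on a toy blackbox objective. This is a hedged sketch rather than the paper's operator: it evaluates a batch of Gaussian perturbations per step and moves to the best candidate only if it beats the incumbent, which is what makes the update tolerant to noisy evaluations.

```python
import numpy as np

def batch_hill_climb(f, theta, sigma=0.3, batch_size=16, steps=30, seed=0):
    """Derivative-free batch hill climbing on a blackbox objective f:
    sample a batch of perturbations, keep the best only if it improves."""
    rng = np.random.default_rng(seed)
    best_val = f(theta)
    for _ in range(steps):
        cands = theta + sigma * rng.standard_normal((batch_size, theta.size))
        vals = np.array([f(c) for c in cands])
        i = int(vals.argmax())
        if vals[i] > best_val:     # greedy acceptance: never move downhill
            theta, best_val = cands[i], vals[i]
    return theta, best_val
```

Because each step compares a whole batch before accepting, a single noisy evaluation is much less likely to drag the parameters in a bad direction than in a one-sample hill climber.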
We propose a quantization based approach for fast approximate Maximum Inner Product Search (MIPS). Each database vector is quantized in multiple subspaces via a set of codebooks, learned directly by minimizing the inner product quantization error. Then, the inner product of a query to a database vector is approximated as the sum of inner products with the subspace quantizers. Different from recently proposed LSH approaches to MIPS, the database vectors and queries do not need to be augmented in a higher dimensional feature space. We also provide a theoretical analysis of the approach, consisting...
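The subspace decomposition can be sketched in product-quantization style. One hedge up front: the paper learns codebooks by minimizing the inner-product quantization error directly, whereas the sketch below uses plain per-subspace k-means; all function names are hypothetical.

```python
import numpy as np

def train_subspace_codebooks(X, num_subspaces=4, codebook_size=16, iters=10, seed=0):
    """Per-subspace codebooks via plain k-means (a stand-in for the
    paper's direct inner-product-error minimization)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    ds = d // num_subspaces
    codebooks, codes = [], []
    for s in range(num_subspaces):
        Xs = X[:, s * ds:(s + 1) * ds]
        C = Xs[rng.choice(n, codebook_size, replace=False)].copy()
        for _ in range(iters):
            assign = np.argmin(((Xs[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
            for k in range(codebook_size):
                members = Xs[assign == k]
                if len(members):
                    C[k] = members.mean(axis=0)
        codebooks.append(C)
        codes.append(assign)
    return codebooks, np.stack(codes, axis=1)   # codes: (n, num_subspaces)

def approx_inner_products(q, codebooks, codes):
    """<q, x> approximated as the sum over subspaces of query-centroid
    inner products, read from small per-subspace lookup tables."""
    ds = codebooks[0].shape[1]
    tables = [q[s * ds:(s + 1) * ds] @ C.T for s, C in enumerate(codebooks)]
    return sum(tables[s][codes[:, s]] for s in range(len(codebooks)))
```

At query time the cost per database vector drops to a few table lookups and additions, since the query-centroid inner products are computed once per subspace.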
We introduce ES-MAML, a new framework for solving the model-agnostic meta-learning (MAML) problem based on Evolution Strategies (ES). Existing algorithms for MAML are based on policy gradients, and incur significant difficulties when attempting to estimate second derivatives using backpropagation on stochastic policies. We show how ES can be applied to MAML to obtain an algorithm which avoids the problem of estimating second derivatives, and is also conceptually simple and easy to implement. Moreover, ES-MAML can handle new types of nonsmooth adaptation...
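The zeroth-order machinery underlying this approach is the antithetic ES gradient estimator, which never differentiates the objective and therefore sidesteps second-derivative estimation entirely. A minimal sketch (the full ES-MAML loop nests an estimator like this inside meta-training; `es_gradient` is a hypothetical name):

```python
import numpy as np

def es_gradient(f, theta, sigma=0.1, num_pairs=600, seed=0):
    """Antithetic Evolution Strategies estimate of grad f(theta):
    average (f(theta + s*e) - f(theta - s*e)) / (2s) * e over Gaussian
    directions e. Only function evaluations are used, no backprop."""
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(theta)
    for _ in range(num_pairs):
        eps = rng.standard_normal(theta.shape)
        grad += (f(theta + sigma * eps) - f(theta - sigma * eps)) / (2 * sigma) * eps
    return grad / num_pairs
```

Because f is only ever evaluated, the same estimator applies unchanged to nonsmooth or even discrete adaptation operators, which is the flexibility the abstract alludes to.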
Exploration is a key problem in reinforcement learning, since agents can only learn from data they acquire in the environment. With that in mind, maintaining a population of agents is an attractive method, as it allows data to be collected with a diverse set of behaviors. This behavioral diversity is often boosted via multi-objective loss functions. However, those approaches typically leverage mean field updates based on pairwise distances, which makes them susceptible to cycling behaviors and increased redundancy. In...
Transformer models have achieved state-of-the-art results across a diverse range of domains. However, concern over the quadratic cost of training the attention mechanism to learn complex dependencies between distant inputs continues to grow. In response, solutions that exploit the structure and sparsity of the learned attention matrix have blossomed. However, real-world applications that involve long sequences, such as biological sequence analysis, may fall short of meeting these assumptions, precluding exploration of these models. To address this challenge, we...
Learning from Label Proportions (LLP) is a learning setting where the training data is provided in groups, or "bags", and only the proportion of each class in each bag is known. The task is to learn a model to predict the class labels of the individual instances. LLP has broad applications in political science, marketing, healthcare, and computer vision. This work answers the fundamental question of when and why LLP is possible, by introducing a general framework, Empirical Proportion Risk Minimization (EPRM). EPRM learns an instance label classifier to match...
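The EPRM idea, fitting an instance-level classifier so that its predicted bag proportions match the observed ones, can be sketched with a logistic model and a squared proportion loss. Both of those are assumptions made here for concreteness (the framework itself is loss-agnostic), and all names below are hypothetical.

```python
import numpy as np

def eprm_fit(bags, proportions, dim, lr=0.5, epochs=300, seed=0):
    """Gradient descent on the empirical proportion risk
    sum_b (mean_i sigmoid(w . x_i) - p_b)^2 over bags b, using only
    bag-level proportions, never instance labels."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(dim) * 0.01
    for _ in range(epochs):
        grad = np.zeros(dim)
        for X, p in zip(bags, proportions):
            s = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
            phat = s.mean()                  # predicted bag proportion
            # Gradient of (phat - p)^2 w.r.t. w for this bag.
            grad += 2.0 * (phat - p) * (s * (1.0 - s)) @ X / len(X)
        w -= lr * grad / len(bags)
    return w
```

When the bag proportions vary, matching them forces the classifier to pick up instance-level structure even though no instance label is ever observed.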
This paper proposes a new method, which we call VisualBackProp, for visualizing which sets of pixels of the input image contribute most to the predictions made by a convolutional neural network (CNN). The method heavily hinges on exploring the intuition that the feature maps contain less and less irrelevant information to the prediction decision when moving deeper into the network. The technique we propose was developed as a debugging tool for CNN-based systems for steering self-driving cars and is therefore required to run in real-time, i.e. it...
We present an intriguing discovery related to Random Fourier Features: in Gaussian kernel approximation, replacing the random Gaussian matrix by a properly scaled random orthogonal matrix significantly decreases the approximation error. We call this technique Orthogonal Random Features (ORF), and provide theoretical and empirical justification for this behavior. Motivated by this discovery, we further propose Structured Orthogonal Random Features (SORF), which uses a class of structured discrete orthogonal matrices to speed up the computation. The method reduces the time cost from...
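The ORF construction, replacing the Gaussian matrix of Random Fourier Features with QR-orthogonalized Gaussian blocks whose rows are rescaled to chi-distributed norms, can be sketched directly. Function names are hypothetical and a unit-bandwidth Gaussian kernel is assumed.

```python
import numpy as np

def rff_matrix(d, m, rng):
    """Standard Random Fourier Features: unstructured Gaussian matrix."""
    return rng.standard_normal((m, d))

def orf_matrix(d, m, rng):
    """Orthogonal Random Features: QR-orthogonalized Gaussian blocks,
    rows rescaled by chi-distributed norms so marginals match Gaussians."""
    blocks = []
    for _ in range(-(-m // d)):                  # ceil(m / d) blocks
        Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
        norms = np.sqrt(rng.chisquare(d, size=d))
        blocks.append(norms[:, None] * Q)
    return np.vstack(blocks)[:m]

def kernel_estimate(x, y, W):
    """Monte-Carlo estimate of the Gaussian kernel exp(-||x-y||^2 / 2)
    via the usual cos/sin feature map."""
    zx = np.concatenate([np.cos(W @ x), np.sin(W @ x)])
    zy = np.concatenate([np.cos(W @ y), np.sin(W @ y)])
    return zx @ zy / W.shape[0]
```

Averaged over many draws, the orthogonal construction yields a visibly lower squared approximation error than the unstructured Gaussian baseline at the same feature count.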
Leveraging machine-learning (ML) techniques for compiler optimizations has been widely studied and explored in academia. However, the adoption of ML in general-purpose, industry-strength compilers has yet to happen. We propose MLGO, a framework for integrating ML techniques systematically in an industrial compiler -- LLVM. As a case study, we present the details and results of replacing the heuristics-based inlining-for-size optimization in LLVM with machine learned models. To the best of our knowledge, this work is the first full integration of ML in a complex compiler pass...