- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Advanced Image and Video Retrieval Techniques
- Autonomous Vehicle Technology and Safety
- Adversarial Robustness in Machine Learning
- Robotics and Sensor-Based Localization
- Robotic Path Planning Algorithms
- COVID-19 diagnosis using AI
- Robot Manipulation and Learning
- Artificial Intelligence in Games
- Geophysical Methods and Applications
- Advanced Vision and Imaging
- Video Surveillance and Tracking Methods
- Stochastic Gradient Optimization Techniques
- Seismic Imaging and Inversion Techniques
- Image Retrieval and Classification Techniques
- Reinforcement Learning in Robotics
- Marriage and Sexual Relationships
- Multimodal Machine Learning Applications
- Forensic Anthropology and Bioarchaeology Studies
- Human Pose and Action Recognition
- Face recognition and analysis
- African Sexualities and LGBTQ+ Issues
- Automated Road and Building Extraction
- Image and Object Detection Techniques
- Skolkovo Institute of Science and Technology, 2024
- Level-5 (Japan), 2023
- Université Paris Sciences et Lettres, 2019-2020
- École Normale Supérieure - PSL, 2019-2020
- Centre National de la Recherche Scientifique, 2019-2020
- Menlo School, 2020
- Meta (United States), 2020
- Institut national de recherche en informatique et en automatique, 2020
- École nationale des ponts et chaussées, 2015-2018
- Laboratoire d'Informatique Gaspard-Monge, 2015-2018
Deep residual networks were shown to be able to scale up to thousands of layers and still have improving performance. However, each fraction of a percent of improved accuracy costs nearly doubling the number of layers, and so training very deep residual networks has a problem of diminishing feature reuse, which makes these networks very slow to train. To tackle these problems, in this paper we conduct a detailed experimental study on the architecture of ResNet blocks, based on which we propose a novel architecture where we decrease depth and increase width of residual networks. We call the resulting network...
Attention plays a critical role in human visual experience. Furthermore, it has recently been demonstrated that attention can also play an important role in the context of applying artificial neural networks to a variety of tasks from fields such as computer vision and NLP. In this work we show that, by properly defining attention for convolutional neural networks, we can actually use this type of information in order to significantly improve the performance of a student CNN network by forcing it to mimic the attention maps of a powerful teacher network. To that end, we propose several...
In this paper we show how to learn directly from image data (i.e., without resorting to manually-designed features) a general similarity function for comparing image patches, which is a task of fundamental importance for many computer vision problems. To encode such a function, we opt for a CNN-based model that is trained to account for a wide variety of changes in image appearance. To that end, we explore and study multiple neural network architectures, which are specifically adapted to this task. We show that such an approach can significantly outperform the state-of-the-art...
The recent COCO object detection dataset presents several new challenges for object detection. In particular, it contains objects at a broad range of scales, less prototypical images, and requires more precise localization. To address these challenges, we test three modifications to the standard Fast R-CNN object detector: (1) skip connections that give the detector access to features at multiple network layers, (2) a foveal structure to exploit object context at multiple object resolutions, and (3) an integral loss function and corresponding network adjustment...
We use the scattering network as a generic and fixed initialization of the first layers of a supervised hybrid deep network. We show that early layers do not necessarily need to be learned, providing the best results to-date with pre-defined representations while being competitive with Deep CNNs. Using a shallow cascade of 1 × 1 convolutions, which encodes scattering coefficients that correspond to spatial windows of very small sizes, permits to obtain AlexNet accuracy on the imagenet ILSVRC2012. We demonstrate that this local encoding explicitly learns...
This work proposes an end-to-end multi-camera 3D multi-object tracking (MOT) framework. It emphasizes spatio-temporal continuity and integrates both past and future reasoning for tracked objects. Thus, we name it "Past-and-Future reasoning for Tracking" (PF-Track). Specifically, our method adopts the "tracking by attention" framework and represents tracked instances coherently over time with object queries. To explicitly use historical cues, our "Past Reasoning" module learns to refine the tracks and enhance the object features...
Scattering networks are a class of designed Convolutional Neural Networks (CNNs) with fixed weights. We argue they can serve as generic representations for modelling images. In particular, by working in scattering space, we achieve competitive results both for supervised and unsupervised learning tasks, while making progress towards constructing more interpretable CNNs. For supervised learning, we demonstrate that the early layers of CNNs do not necessarily need to be learned, and can be replaced with a scattering network instead. Indeed,...
Deep neural networks with skip-connections, such as ResNet, show excellent performance in various image classification benchmarks. It is though observed that the initial motivation behind them - training deeper networks - does not actually hold true, and the benefits come from increased capacity, rather than from depth. Motivated by this, and inspired by ResNet, we propose a simple Dirac weight parameterization, which allows us to train very deep plain networks without explicit skip-connections, and achieve nearly the same performance. This parameterization...
We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression procedure or anchor generation that explicitly encode our prior knowledge about the task. The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture. Given...
We address the problem of visually guided rearrangement planning with many movable objects, i.e., finding a sequence of actions to move a set of objects from an initial arrangement to a desired one, while relying on visual inputs coming from an RGB camera. To do so, we introduce a complete pipeline relying on two key contributions. First, we introduce an efficient and scalable rearrangement planning method, based on a Monte-Carlo Tree Search exploration strategy. We demonstrate that because of its good trade-off between exploration and exploitation our method (i) scales well with the number of objects and (ii)...
The goal of autonomous vehicles is to navigate public roads safely and comfortably. To enforce safety, traditional planning approaches rely on handcrafted rules to generate trajectories. Machine learning-based systems, on the other hand, scale with data and are able to learn more complex behaviors. However, they often ignore that agents and self-driving vehicle trajectory distributions can be leveraged to improve safety. In this paper, we propose modeling a distribution over multiple future trajectories for...
In this paper we evaluated deep-learning frameworks based on Convolutional Neural Networks for the accurate classification of multispectral remote sensing data. Certain state-of-the-art models have been tested on the publicly available SAT-4 and SAT-6 high resolution satellite multispectral datasets. In particular, the performed benchmark included the AlexNet, AlexNet-small and VGG models, which had been trained and applied to both datasets exploiting all the available spectral information. Deep Belief Networks, Autoencoders and other semi-supervised...
We present a new shape prior formalism for the segmentation of rectified facade images. It combines the simplicity of split grammars with unprecedented expressive power: the capability of encoding simultaneous alignment in two dimensions, facade occlusions and irregular boundaries between facade elements. We formulate the task of finding the most likely image segmentation conforming to a prior of the proposed form as a MAP-MRF problem over a 4-connected pixel grid, and propose an efficient optimization algorithm for solving it. Our method simultaneously segments the visible...
Since DeepMind’s AlphaZero, Zero learning quickly became the state-of-the-art method for many board games. It can be improved using a fully convolutional structure (no fully connected layer). Using such an architecture plus global pooling, we can create bots independent of the board size. The training can be made more robust by keeping track of the best checkpoints during the training and by training against them. Using these features, we release Polygames, our framework for Zero learning, with its library of games and its checkpoints. We won against strong humans at the game of Hex in 19 ×...