- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Multimodal Machine Learning Applications
- Advanced Image and Video Retrieval Techniques
- Remote Sensing and LiDAR Applications
- Robotics and Sensor-Based Localization
- Video Surveillance and Tracking Methods
- COVID-19 Diagnosis Using AI
- Human Pose and Action Recognition
- Anomaly Detection Techniques and Applications
- Advanced Optical Sensing Technologies
- Animal Disease Management and Epidemiology
- Medical Image Segmentation Techniques
- Image Retrieval and Classification Techniques
- Handwritten Text Recognition Techniques
- Machine Learning and Algorithms
- Industrial Vision Systems and Defect Detection
- 3D Shape Modeling and Analysis
- Image Processing and 3D Reconstruction
- Underwater Acoustics Research
- AI in Cancer Detection
- Human Mobility and Location-Based Analysis
- Advanced Vision and Imaging
- Forecasting Techniques and Applications
- Image Enhancement Techniques
Valeo (France)
2019-2025
École nationale des ponts et chaussées
2015-2019
Google (United States)
2018
Laboratoire d'Informatique Gaspard-Monge
2015-2018
Université Gustave Eiffel
2015-2016
Over the last years, deep convolutional neural networks (ConvNets) have transformed the field of computer vision thanks to their unparalleled capacity to learn high level semantic image features. However, in order to successfully learn those features, they usually require massive amounts of manually labeled data, which is both expensive and impractical to scale. Therefore, unsupervised feature learning, i.e., learning without requiring manual annotation effort, is of crucial importance in order to harvest the vast amount of visual data...
The human visual system has the remarkable ability to effortlessly learn novel concepts from only a few examples. Mimicking the same behavior on machine learning vision systems is an interesting and very challenging research problem with many practical advantages in real world applications. In this context, the goal of our work is to devise a few-shot visual learning system that, during test time, will be able to efficiently learn novel categories from only a few training data while at the same time not forgetting the initial categories on which it was trained (here called base categories). To...
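A common building block in such systems is a cosine-similarity classifier whose novel-class weights are generated from the few support examples, so that base and novel classes are scored in one unified way. The following is a minimal, prototype-style sketch of that idea (illustrative only; the actual weight generator in the work is a learned module, not this simple average):

```python
import math

def _normalize(v):
    # L2-normalize a vector; guard against the zero vector.
    n = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / n for a in v]

def novel_class_weight(support_features):
    """Build a classification weight for a novel class as the normalized
    mean of its L2-normalized support features (a class prototype)."""
    d = len(support_features[0])
    normed = [_normalize(f) for f in support_features]
    mean = [sum(f[i] for f in normed) / len(normed) for i in range(d)]
    return _normalize(mean)

def cosine_score(weight, feature):
    """Cosine-similarity score between a class weight and a query feature.
    Base and novel weights are scored identically, which is what lets both
    kinds of classes coexist in a single classifier without forgetting."""
    return sum(a * b for a, b in zip(_normalize(weight), _normalize(feature)))
```

Because scores depend only on directions, a freshly generated novel-class weight is directly comparable with the pre-trained base-class weights.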
We propose an object detection system that relies on a multi-region deep convolutional neural network (CNN) that also encodes semantic segmentation-aware features. The resulting CNN-based representation aims at capturing a diverse set of discriminative appearance factors and exhibits localization sensitivity that is essential for accurate object localization. We exploit the above properties of our recognition module by integrating it on an iterative localization mechanism that alternates between scoring a box proposal and refining its location...
Few-shot learning and self-supervised learning address different facets of the same problem: how to train a model with little or no labeled data. Few-shot learning aims for optimization methods and models that can learn efficiently to recognize patterns in the low data regime. Self-supervised learning focuses instead on unlabeled data and looks into it for the supervisory signal to feed high capacity deep neural networks. In this work we exploit the complementarity of these two domains and propose an approach for improving few-shot learning through self-supervision. We use...
Given an initial recognition model already trained on a set of base classes, the goal of this work is to develop a meta-model for few-shot learning. The meta-model, given as input some novel classes with few training examples per class, must properly adapt the existing recognition model into a new model that can correctly classify in a unified way both the novel and the base classes. To accomplish this goal, it must learn to output the appropriate classification weight vectors for those two types of classes. To build our meta-model we make use of two main innovations: we propose the use of a Denoising Autoencoder network...
Casting semantic segmentation of outdoor LiDAR point clouds as a 2D problem, e.g., via range projection, is an effective and popular approach. These projection-based methods usually benefit from fast computations and, when combined with techniques which use other point cloud representations, achieve state-of-the-art results. Today, projection-based methods leverage 2D CNNs, but recent advances in computer vision show that vision transformers (ViTs) have achieved state-of-the-art results in many image-based benchmarks. In this work, we question if...
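The range projection mentioned above maps each 3D point to a pixel via its spherical coordinates (yaw gives the column, pitch the row), producing a 2D range image that standard image backbones can consume. A minimal sketch, with an illustrative image size and vertical field of view rather than any particular sensor's settings:

```python
import math

def range_project(points, h=8, w=16, fov_up=3.0, fov_down=-25.0):
    """Project 3D LiDAR points (x, y, z) onto a 2D range image.

    Toy sketch: h, w and the vertical field of view (degrees) are
    illustrative defaults, not a real sensor configuration.
    """
    fov_up_r = math.radians(fov_up)
    fov_down_r = math.radians(fov_down)
    fov = fov_up_r - fov_down_r
    img = [[0.0] * w for _ in range(h)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue
        yaw = math.atan2(y, x)                       # horizontal angle in [-pi, pi]
        pitch = math.asin(z / r)                     # vertical angle
        u = 0.5 * (1.0 - yaw / math.pi) * w          # column from yaw
        v = (1.0 - (pitch - fov_down_r) / fov) * h   # row from pitch
        ui = min(w - 1, max(0, int(u)))
        vi = min(h - 1, max(0, int(v)))
        img[vi][ui] = r                              # store range as the pixel value
    return img
```

Real pipelines typically store extra channels per pixel (intensity, x, y, z) and resolve collisions by keeping the closest point, but the geometry is the same.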
Pixel wise image labeling is an interesting and challenging problem with great significance in the computer vision community. In order for a dense labeling algorithm to be able to achieve accurate and precise results, it has to consider the dependencies that exist in the joint space of both the input and the output variables. An implicit approach for modeling those dependencies is by training a deep neural network that, given as input an initial estimate of the output labels and the input image, will predict a new refined estimate for the labels. In this context, our work is concerned with what is the optimal architecture...
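Stripped of the network details, the scheme above is a simple fixed-point loop: feed the current label estimate back in, replace it with the refined prediction, and repeat. A toy sketch, where the `refine` callable stands in for the trained network (hypothetical here):

```python
def iterative_labeling(refine, image, init_labels, steps=3):
    """Repeatedly refine a dense label map. Each step conditions on both
    the input image and the current label estimate, which is how the
    joint input-output dependencies get modeled implicitly."""
    labels = init_labels
    for _ in range(steps):
        labels = refine(image, labels)
    return labels
```

With a contractive `refine`, successive estimates converge toward a fixed point consistent with the image.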
We propose a novel object localization methodology with the purpose of boosting the localization accuracy of state-of-the-art object detection systems. Our model, given a search region, aims at returning the bounding box of an object of interest inside this region. To accomplish its goal, it relies on assigning conditional probabilities to each row and column of this region, where these probabilities provide useful information regarding the location of the boundaries of the object inside the search region and allow accurate inference of the object bounding box under a simple probabilistic framework. For implementing our localization model we make use...
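Under this probabilistic view, the simplest possible inference step is to take the most likely column for each vertical boundary and the most likely row for each horizontal one. A minimal sketch of such a decoding step (illustrative only; it covers just the border-probability case, not other parameterizations of the row/column probabilities):

```python
def decode_box(p_left, p_right, p_top, p_bottom):
    """Decode a bounding box from per-column (left/right) and per-row
    (top/bottom) boundary probabilities by taking each argmax."""
    def argmax(p):
        return max(range(len(p)), key=p.__getitem__)
    left, right = argmax(p_left), argmax(p_right)
    top, bottom = argmax(p_top), argmax(p_bottom)
    return (left, top, right, bottom)
```

Because each boundary is inferred from a 1D distribution over rows or columns, decoding stays cheap even for fine-grained search regions.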
Self-supervised representation learning targets learning convnet-based image representations from unlabeled data. Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions that encode discrete visual concepts, here called visual words. To build such discrete representations, we quantize the feature maps of a first pre-trained convnet, over a k-means based vocabulary. Then, as a self-supervised task, we train another convnet to predict the histogram of visual words of an image (i.e., its...
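The target construction described above reduces to two steps: assign every spatial feature vector to its nearest vocabulary word, then build a normalized word histogram for the image. A minimal sketch under the assumption that features and vocabulary entries are plain lists of floats (in practice these would be convnet feature maps and k-means centroids):

```python
def quantize(features, vocabulary):
    """Assign each spatial feature vector to its nearest (squared-L2) visual word."""
    def nearest(f):
        return min(range(len(vocabulary)),
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(f, vocabulary[k])))
    return [nearest(f) for f in features]

def bow_target(features, vocabulary):
    """Normalized histogram of visual words over an image's features:
    the prediction target for the second convnet."""
    hist = [0.0] * len(vocabulary)
    for w in quantize(features, vocabulary):
        hist[w] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]
```

Discarding the spatial arrangement and keeping only the histogram is what makes the target robust to the perturbations applied to the input image.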
Segmenting or detecting objects in sparse Lidar point clouds are two important tasks in autonomous driving to allow a vehicle to act safely in its 3D environment. The best performing methods in semantic segmentation or object detection rely on a large amount of annotated data. Yet annotating 3D Lidar data for these tasks is tedious and costly. In this context, we propose a self-supervised pre-training method for 3D perception models that is tailored to autonomous driving data. Specifically, we leverage the availability of synchronized and calibrated image and Lidar sensors in autonomous driving setups...
Learning image representations without human supervision is an important and active research field. Several recent approaches have successfully leveraged the idea of making such a representation invariant under different types of perturbations, especially via contrastive-based instance discrimination training. Although effective visual representations should indeed exhibit such invariances, there are other important characteristics, such as encoding contextual reasoning skills, for which alternative reconstruction-based approaches might be...
The problem of computing category agnostic bounding box proposals is utilized as a core component in many computer vision tasks and thus has lately attracted a lot of attention. In this work we propose a new approach to tackle this problem that is based on an active strategy for generating box proposals. It starts from a set of seed boxes, which are uniformly distributed on the image, and then progressively moves its attention to the promising image areas where it is more likely to discover well localized proposals. We call our approach AttractioNet and a CNN-based...
Localizing objects in image collections without supervision can help to avoid expensive annotation campaigns. We propose a simple approach to this problem, that leverages the activation features of a vision transformer pre-trained in a self-supervised manner. Our method, LOST, does not require any external object proposal nor any exploration of the image collection; it operates on a single image. Yet, we outperform state-of-the-art object discovery methods by up to 8 CorLoc points on PASCAL VOC 2012. We also show that training...
Semantic image segmentation models typically require extensive pixel-wise annotations, which are costly to obtain and prone to biases. Our work investigates learning semantic segmentation in urban scenes without any manual annotation. We propose a novel method for learning pixel-wise semantic segmentation using raw, uncurated data from vehicle-mounted cameras and LiDAR sensors, thus eliminating the need for manual labeling. Our contributions are as follows. First, we develop a novel approach for cross-modal unsupervised learning of semantic segmentation by leveraging synchronized data. A crucial element...
Semantic future prediction is important for autonomous systems navigating dynamic environments. This paper introduces FUTURIST, a method for multimodal future semantic prediction that uses a unified and efficient visual sequence transformer architecture. Our approach incorporates a masked visual modeling objective and a novel masking mechanism designed for multimodal training. This allows the model to effectively integrate visible information from various modalities, improving prediction accuracy. Additionally, we propose a VAE-free hierarchical tokenization...
Latent generative models have emerged as a leading approach for high-quality image synthesis. These models rely on an autoencoder to compress images into a latent space, followed by a generative model that learns the latent distribution. We identify that existing autoencoders lack equivariance to semantic-preserving transformations like scaling and rotation, resulting in complex latent spaces that hinder generative performance. To address this, we propose EQ-VAE, a simple regularization approach that enforces equivariance in the latent space, reducing its complexity without degrading reconstruction...
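The regularization idea can be sketched as penalizing the mismatch between encoding a transformed image and transforming the encoded latent. A toy version with list-valued "images" and "latents" and a hypothetical `encode` callable (the real method operates on autoencoder feature maps with image-space scaling and rotation):

```python
def eq_regularizer(encode, x, t_input, t_latent):
    """Mean squared error between encode(T(x)) and T(encode(x)).

    Zero on this input exactly when the encoder commutes with the
    transformation, i.e. when it is equivariant to it."""
    z_of_t = encode(t_input(x))      # transform first, then encode
    t_of_z = t_latent(encode(x))     # encode first, then transform
    return sum((a - b) ** 2 for a, b in zip(z_of_t, t_of_z)) / len(z_of_t)
```

Adding such a term to the reconstruction loss pushes the latent space to respond to scaling and rotation as simply as the pixel space does.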
We describe an approach to predict open-vocabulary 3D semantic voxel occupancy maps from input 2D images with the objective of enabling 3D grounding, segmentation and retrieval of free-form language queries. This is a challenging problem because of the 2D-3D ambiguity and the open-vocabulary nature of the target tasks, where obtaining annotated training data in 3D is difficult. The contributions of this work are three-fold. First, we design a new model architecture for open-vocabulary 3D semantic occupancy prediction. It consists of a 2D-3D encoder together with occupancy prediction and 3D-language heads. The output...