François Fleuret

ORCID: 0000-0001-9457-7393
Research Areas
  • Advanced Neural Network Applications
  • Video Surveillance and Tracking Methods
  • Machine Learning and Data Classification
  • Human Pose and Action Recognition
  • Advanced Image and Video Retrieval Techniques
  • Face and Expression Recognition
  • Neural Networks and Applications
  • Domain Adaptation and Few-Shot Learning
  • Advanced Vision and Imaging
  • Adversarial Robustness in Machine Learning
  • Image Retrieval and Classification Techniques
  • Anomaly Detection Techniques and Applications
  • Topic Modeling
  • Natural Language Processing Techniques
  • Cell Image Analysis Techniques
  • Explainable Artificial Intelligence (XAI)
  • Generative Adversarial Networks and Image Synthesis
  • Machine Learning and Algorithms
  • Robotics and Sensor-Based Localization
  • Multimodal Machine Learning Applications
  • 3D Shape Modeling and Analysis
  • Advanced Image Processing Techniques
  • Medical Image Segmentation Techniques
  • Data Management and Algorithms
  • Gaussian Processes and Bayesian Inference

University of Geneva
2019-2025

École Polytechnique Fédérale de Lausanne
2012-2023

Idiap Research Institute
2014-2023

Université Paris-Saclay
2017-2021

Université Paris-Sud
2016-2018

IAP Research (United States)
2014-2017

École Polytechnique
2014

École Normale Supérieure - PSL
2013

Institut national de recherche en informatique et en automatique
2000-2003

Multi-object tracking can be achieved by detecting objects in individual frames and then linking detections across frames. Such an approach can be made very robust to the occasional detection failure: if an object is not detected in a frame but is in the previous and following ones, a correct trajectory will nevertheless be produced. By contrast, a false-positive detection in a few frames will be ignored. However, when dealing with a multiple target problem, the linking step results in a difficult optimization problem in the space of all possible families of trajectories. This...

10.1109/tpami.2011.21 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2011-02-03

Given two to four synchronized video streams taken at eye level and from different angles, we show that we can effectively combine a generative model with dynamic programming to accurately follow up to six individuals across thousands of frames in spite of significant occlusions and lighting changes. In addition, we also derive metrically accurate trajectories for each one of them. Our contribution is twofold. First, we demonstrate that our generative model can effectively handle occlusions in each time frame independently, even when the only data available comes from the output...

10.1109/tpami.2007.1174 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2007-12-20

Transformers achieve remarkable performance in several tasks but, due to their quadratic complexity with respect to the input's length, they are prohibitively slow for very long sequences. To address this limitation, we express the self-attention as a linear dot-product of kernel feature maps and make use of the associativity property of matrix products to reduce the complexity from $\mathcal{O}\left(N^2\right)$ to $\mathcal{O}\left(N\right)$, where $N$ is the sequence length. We show that this formulation permits an iterative...

10.48550/arxiv.2006.16236 preprint EN other-oa arXiv (Cornell University) 2020-01-01
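The associativity argument in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: it covers only the non-causal case, uses plain Python lists, and assumes the feature map `phi(x) = elu(x) + 1`, which is not stated in the truncated abstract. The function names are hypothetical. The quadratic version forms every query-key similarity, while the linear version first accumulates the sums over keys once and then reuses them for every query.

```python
import math


def feature_map(x):
    # Assumed kernel feature map: elu(v) + 1, which is v + 1 for v > 0
    # and exp(v) otherwise. It keeps every feature strictly positive.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def quadratic_attention(Q, K, V):
    # Reference version: computes all N * N similarities phi(q_i) . phi(k_j).
    phi_K = [feature_map(k) for k in K]
    out = []
    for q in Q:
        phi_q = feature_map(q)
        sims = [sum(a * b for a, b in zip(phi_q, pk)) for pk in phi_K]
        norm = sum(sims)
        out.append([sum(s * v[j] for s, v in zip(sims, V)) / norm
                    for j in range(len(V[0]))])
    return out


def linear_attention(Q, K, V):
    # Uses (phi(Q) phi(K)^T) V = phi(Q) (phi(K)^T V): the key/value summary
    # S and the normalizer z are built once, in time linear in N.
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # sum_j phi(k_j) v_j^T
    z = [0.0] * d_k                        # sum_j phi(k_j)
    for k, v in zip(K, V):
        pk = feature_map(k)
        for a in range(d_k):
            z[a] += pk[a]
            for b in range(d_v):
                S[a][b] += pk[a] * v[b]
    out = []
    for q in Q:
        pq = feature_map(q)
        norm = sum(pq[a] * z[a] for a in range(d_k))
        out.append([sum(pq[a] * S[a][b] for a in range(d_k)) / norm
                    for b in range(d_v)])
    return out
```

Both functions return the same values up to floating-point rounding; only the order of the multiplications differs, which is what changes the cost from quadratic to linear in the sequence length.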

We present a unified framework for understanding human social behaviors in raw image sequences. Our model jointly detects multiple individuals, infers their social actions, and estimates the collective actions with a single feed-forward pass through a neural network. We propose a single architecture that does not rely on external detection algorithms but rather is trained end-to-end to generate dense proposal maps that are refined via a novel inference scheme. The temporal consistency is handled via a person-level matching...

10.1109/cvpr.2017.365 article EN 2017-07-01

In this paper, we show that tracking multiple people whose paths may intersect can be formulated as a convex global optimization problem. Our proposed framework is designed to exploit image appearance cues to prevent identity switches. Our method is effective even when such cues are only available at distant time intervals. This is unlike many current approaches that depend on appearance being exploitable from frame to frame. We validate our approach on three multi-camera sport and pedestrian datasets that contain long and complex...

10.1109/iccv.2011.6126235 article EN International Conference on Computer Vision 2011-11-01

People detection methods are highly sensitive to occlusions between pedestrians, which are extremely frequent in many situations where cameras have to be mounted at a limited height. The reduction of camera prices allows for the generalization of static multi-camera set-ups. Using joint visual information from multiple synchronized cameras gives the opportunity to improve detection performance. In this paper, we present a new large-scale and high-resolution dataset. It has been captured with seven static cameras in a public open area, and unscripted...

10.1109/cvpr.2018.00528 article EN 2018-06-01

We present GeoNeRF, a generalizable photorealistic novel view synthesis method based on neural radiance fields. Our approach consists of two main stages: a geometry reasoner and a renderer. To render a novel view, the geometry reasoner first constructs cascaded cost volumes for each nearby source view. Then, using a Transformer-based attention mechanism and the cascaded cost volumes, the renderer infers geometry and appearance, and renders detailed images via classical volume rendering techniques. This architecture, in particular, allows sophisticated...

10.1109/cvpr52688.2022.01782 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

We present ESLAM, an efficient implicit neural representation method for Simultaneous Localization and Mapping (SLAM). ESLAM reads RGB-D frames with unknown camera poses in a sequential manner and incrementally reconstructs the scene representation while estimating the current camera position in the scene. We incorporate the latest advances in Neural Radiance Fields (NeRF) into a SLAM system, resulting in an efficient and accurate dense visual SLAM method. Our scene representation consists of multi-scale axis-aligned perpendicular feature planes and shallow decoders that, for each point...

10.1109/cvpr52729.2023.01670 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

10.1023/a:1011113216584 article EN International Journal of Computer Vision 2001-01-01

Given three or four synchronized videos taken at eye level and from different angles, we show that we can effectively use dynamic programming to accurately follow up to six individuals across thousands of frames in spite of significant occlusions. In addition, we also derive metrically accurate trajectories for each one of them. Our main contribution is to show that multi-person tracking can be reliably achieved by processing individual trajectories separately over long sequences, provided that a reasonable heuristic is used to rank these individuals and avoid...

10.1109/cvpr.2006.258 article EN 2006-07-10
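As a rough illustration of how dynamic programming can recover one individual's trajectory from per-frame evidence, here is a generic Viterbi-style sketch. It is not the method of the paper above: the one-dimensional location grid, the `log_scores` input, the `max_step` motion limit, and the function name are all assumptions made for the example.

```python
def best_trajectory(log_scores, max_step=1):
    """Return the highest-scoring sequence of locations, one per frame.

    log_scores[t][x] is a hypothetical log-likelihood that the target is at
    location x in frame t. The target may move at most max_step cells
    between consecutive frames.
    """
    n = len(log_scores[0])
    best = list(log_scores[0])  # best total score of a path ending at each x
    back = []                   # back[t-1][x]: best predecessor of x at frame t
    for t in range(1, len(log_scores)):
        new_best, ptr = [], []
        for x in range(n):
            lo, hi = max(0, x - max_step), min(n - 1, x + max_step)
            prev = max(range(lo, hi + 1), key=lambda p: best[p])
            new_best.append(best[prev] + log_scores[t][x])
            ptr.append(prev)
        best = new_best
        back.append(ptr)
    # Backtrack from the best final location to the first frame.
    x = max(range(n), key=lambda p: best[p])
    path = [x]
    for ptr in reversed(back):
        x = ptr[x]
        path.append(x)
    path.reverse()
    return path


# Frame 2 carries no information (a detection failure), yet the motion
# constraint still yields a consistent trajectory.
frames = [
    [0.0, -5.0, -5.0, -5.0],
    [-5.0, 0.0, -5.0, -5.0],
    [-3.0, -3.0, -3.0, -3.0],
    [-5.0, -5.0, -5.0, 0.0],
]
print(best_trajectory(frames))  # [0, 1, 2, 3]
```

Because the score is optimized over the whole sequence rather than frame by frame, an uninformative frame is bridged by the neighbouring ones.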

In this paper, we show that tracking multiple people whose paths may intersect can be formulated as a multi-commodity network flow problem. Our proposed framework is designed to exploit image appearance cues to prevent identity switches. Our method is effective even when such cues are only available at distant time intervals. This is unlike many current approaches that depend on appearance being exploitable from frame-to-frame. Furthermore, our algorithm lends itself to a real-time implementation. We validate our approach on three...

10.1109/tpami.2013.210 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2013-10-18

Automated scene interpretation has benefited from advances in machine learning, and restricted tasks, such as face detection, have been solved with sufficient accuracy for restricted settings. However, the performance of machines in providing rich semantic descriptions of natural scenes from digital images remains highly limited and hugely inferior to that of humans. Here we quantify this “semantic gap” in a particular setting: We compare the efficiency of human and machine learning in assigning an image to one of two categories determined by the spatial...

10.1073/pnas.1109168108 article EN Proceedings of the National Academy of Sciences 2011-10-17

In this paper, we show that tracking different kinds of interacting objects can be formulated as a network-flow mixed integer program. This is made possible by tracking all objects simultaneously using intertwined flow variables and expressing the fact that one object can appear or disappear at locations where another is in terms of linear flow constraints. Our proposed method is able to track invisible objects whose only evidence is the presence of other objects that contain them. Furthermore, our tracklet-based implementation yields real-time tracking performance. We...

10.1109/tpami.2015.2513406 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2015-12-30

Deep neural network training spends most of the computation on examples that are properly handled, and could be ignored. We propose to mitigate this phenomenon with a principled importance sampling scheme that focuses computation on "informative" examples, and reduces the variance of the stochastic gradients during training. Our contribution is twofold: first, we derive a tractable upper bound to the per-sample gradient norm, and second we derive an estimator of the variance reduction achieved with importance sampling, which enables us to switch it on when it will result in an actual...

10.48550/arxiv.1803.00942 preprint EN other-oa arXiv (Cornell University) 2018-01-01
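The general idea of importance sampling for mini-batches can be sketched as follows. This is a minimal illustration, not the scheme of the paper above: it assumes each example already has a positive score (for instance some bound on its gradient norm), samples indices with probability proportional to that score, and attaches the weights `1 / (N * p_i)` that keep the weighted batch average unbiased. The function names are hypothetical.

```python
import random


def sampling_probabilities(scores):
    # Probabilities proportional to the (assumed positive) per-example scores.
    total = sum(scores)
    return [s / total for s in scores]


def importance_sample(scores, batch_size, rng):
    """Draw a batch with replacement, favouring high-score examples.

    Returns the sampled indices and the weights 1 / (N * p_i). Averaging
    weight * per-example quantity over the batch gives an unbiased estimate
    of the uniform average of that quantity over all N examples.
    """
    probs = sampling_probabilities(scores)
    n = len(scores)
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    weights = [1.0 / (n * probs[i]) for i in indices]
    return indices, weights


rng = random.Random(0)
indices, weights = importance_sample([0.1, 0.1, 5.0, 0.2], batch_size=4, rng=rng)
```

With uniform scores every weight equals 1 and the procedure reduces to ordinary uniform sampling, which is the case where importance sampling brings no variance reduction.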

Many state-of-the-art approaches to multi-object tracking rely on detecting them in each frame independently, grouping detections into short but reliable trajectory segments, and then further grouping them into full trajectories. This grouping typically relies on imposing local smoothness constraints but almost never on enforcing more global ones on the trajectories. In this paper, we propose a non-Markovian approach to imposing global consistency by using behavioral patterns to guide the tracking algorithm. When used in conjunction with state-of-the-art tracking algorithms, this further increases their already good...

10.1109/iccv.2017.278 article EN 2017-10-01

People detection in single 2D images has improved greatly in recent years. However, comparatively little of this progress has percolated into multi-camera multi-people tracking algorithms, whose performance still degrades severely when scenes become very crowded. In this work, we introduce a new architecture that combines Convolutional Neural Nets and Conditional Random Fields to explicitly model those ambiguities. One of its key ingredients are high-order CRF terms that model potential occlusions and give our approach...

10.1109/iccv.2017.38 article EN 2017-10-01

Today, a frame-based camera is the sensor of choice for machine vision applications. However, these cameras, originally developed for acquisition of static images rather than for sensing of dynamic uncontrolled visual environments, suffer from high power consumption, data rate, latency and low dynamic range. An event-based image sensor addresses these drawbacks by mimicking a biological retina. Instead of measuring the intensity of every pixel in a fixed time-interval, it reports events of significant pixel intensity changes. Every such event is represented by its...

10.1109/iccv.2019.00161 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Multi-object tracking can be achieved by detecting objects in individual frames and then linking detections across frames. Such an approach can be made very robust to the occasional detection failure: if an object is not detected in a frame but is in the previous and following ones, a correct trajectory will nevertheless be produced. By contrast, a false-positive detection in a few frames will be ignored. However, when dealing with a multiple target problem, the linking step results in a difficult optimization problem in the space of all possible families of trajectories. This...

10.1109/pets-winter.2009.5399488 article EN 2009-12-01