Pascal Fua

ORCID: 0000-0002-6702-9970
Research Areas
  • Advanced Vision and Imaging
  • Human Pose and Action Recognition
  • Robotics and Sensor-Based Localization
  • 3D Shape Modeling and Analysis
  • Video Surveillance and Tracking Methods
  • Advanced Image and Video Retrieval Techniques
  • Computer Graphics and Visualization Techniques
  • Medical Image Segmentation Techniques
  • Advanced Neural Network Applications
  • Anomaly Detection Techniques and Applications
  • 3D Surveying and Cultural Heritage
  • Cell Image Analysis Techniques
  • Human Motion and Animation
  • Optical Measurement and Interference Techniques
  • Image and Object Detection Techniques
  • Domain Adaptation and Few-Shot Learning
  • Advanced Numerical Analysis Techniques
  • Remote Sensing and LiDAR Applications
  • Image Retrieval and Classification Techniques
  • Video Analysis and Summarization
  • Face Recognition and Analysis
  • Image Processing Techniques and Applications
  • Hand Gesture Recognition Systems
  • Machine Learning and Algorithms
  • Advanced Electron Microscopy Techniques and Applications

École Polytechnique Fédérale de Lausanne
2016-2025

University of British Columbia
2020

Max Planck Institute for Informatics
2019

Max Planck Society
2019

Swiss Data Science Center
2018

University of Salzburg
2018

École Polytechnique
2013-2016

University of Bern
2015-2016

SRI International
1991-2013

Menlo School
1987-2013

Computer vision applications have come to rely increasingly on superpixels in recent years, but it is not always clear what constitutes a good superpixel algorithm. In an effort to understand the benefits and drawbacks of existing methods, we empirically compare five state-of-the-art superpixel algorithms for their ability to adhere to image boundaries, speed, memory efficiency, and their impact on segmentation performance. We then introduce a new superpixel algorithm, simple linear iterative clustering (SLIC), which adapts k-means...

10.1109/tpami.2012.120 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2012-05-30
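
The abstract above is truncated, so the following is only a minimal sketch of how a k-means-style superpixel method can score a pixel against a cluster center. It assumes the distance commonly associated with the published SLIC formulation: a CIELAB color distance combined with a spatial distance normalized by the grid step. The function names, the `grid_step` and `compactness` parameters, and the search-window rule are illustrative choices, not taken from this page.

```python
import math

def slic_distance(pixel_lab, pixel_xy, center_lab, center_xy, grid_step, compactness):
    """Combined color + spatial distance between a pixel and a cluster center.

    grid_step (often written S) is the approximate spacing between centers;
    compactness (often written m) weights spatial proximity against color.
    Both names are assumptions made for this sketch.
    """
    d_color = math.dist(pixel_lab, center_lab)   # Euclidean distance in color space
    d_space = math.dist(pixel_xy, center_xy)     # Euclidean distance in the image plane
    return math.sqrt(d_color ** 2 + (d_space / grid_step) ** 2 * compactness ** 2)

def nearest_center(pixel_lab, pixel_xy, centers, grid_step, compactness):
    """Return the index of the best center, looking only at nearby centers.

    Restricting the search to centers within one grid step along each axis
    (an assumed window size) is what keeps the assignment step cheap
    compared with plain k-means, which compares every pixel to every center.
    """
    best_index, best_distance = None, math.inf
    for index, (center_lab, center_xy) in enumerate(centers):
        dx = abs(pixel_xy[0] - center_xy[0])
        dy = abs(pixel_xy[1] - center_xy[1])
        if max(dx, dy) > grid_step:
            continue  # center is outside the local search window
        distance = slic_distance(pixel_lab, pixel_xy, center_lab, center_xy,
                                 grid_step, compactness)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index

# Two hypothetical centers: one close to the pixel, one far away.
centers = [((50.0, 0.0, 0.0), (5.0, 5.0)), ((52.0, 0.0, 0.0), (40.0, 40.0))]
chosen = nearest_center((52.0, 0.0, 0.0), (6.0, 6.0), centers,
                        grid_step=10.0, compactness=10.0)
```

A larger `compactness` value favors spatially regular superpixels, while a smaller one lets them follow color boundaries more closely.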

10.1007/s11263-008-0152-6 article EN International Journal of Computer Vision 2008-07-18

In this paper, we introduce a local image descriptor, DAISY, which is very efficient to compute densely. We also present an EM-based algorithm to compute dense depth and occlusion maps from wide-baseline image pairs using this descriptor. This yields much better results in wide-baseline situations than the pixel and correlation-based algorithms that are commonly used in narrow-baseline stereo. Also, using a descriptor makes our algorithm robust against many photometric and geometric transformations. Our descriptor is inspired from earlier ones such as SIFT and GLOH but can be...

10.1109/tpami.2009.77 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2009-04-17

Multi-object tracking can be achieved by detecting objects in individual frames and then linking detections across frames. Such an approach can be made very robust to the occasional detection failure: If an object is not detected in a frame but is in the previous and following ones, a correct trajectory will nevertheless be produced. By contrast, a false-positive detection in a few frames will be ignored. However, when dealing with a multiple target problem, the linking step results in a difficult optimization problem in the space of all possible families of trajectories. This...

10.1109/tpami.2011.21 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2011-02-03

We propose a CNN-based approach for 3D human body pose estimation from single RGB images that addresses the issue of limited generalizability of models trained solely on the starkly limited publicly available 3D pose data. Using only the existing 3D pose data and 2D pose data, we show state-of-the-art performance on established benchmarks through transfer of learned features, while also generalizing to in-the-wild scenes. We further introduce a new training set for human body pose estimation from monocular images of real humans that has the ground truth captured with a multi-camera marker-less...

10.1109/3dv.2017.00064 article EN 2017 International Conference on 3D Vision (3DV) 2017-10-01

We propose a single-shot approach for simultaneously detecting an object in an RGB image and predicting its 6D pose without requiring multiple stages or having to examine multiple hypotheses. Unlike a recently proposed single-shot technique for this task [10] that only predicts an approximate 6D pose that must then be refined, ours is accurate enough not to require additional post-processing. As a result, it is much faster - 50 fps on a Titan X (Pascal) GPU - and more suitable for real-time processing. The key component of our method is a new CNN architecture...

10.1109/cvpr.2018.00038 article EN 2018-06-01

Deep learning has revolutionized image-level tasks such as image classification, but patch-level tasks, such as correspondence, still rely on hand-crafted features, e.g. SIFT. In this paper we use Convolutional Neural Networks (CNNs) to learn discriminant patch representations and in particular train a Siamese network with pairs of (non-)corresponding patches. We deal with the large number of potential pairs with the combination of a stochastic sampling of the training set and an aggressive mining strategy biased towards patches that are...

10.1109/iccv.2015.22 preprint EN 2015-12-01

Binary descriptors are becoming increasingly popular as a means to compare feature points very fast while requiring comparatively small amounts of memory. The typical approach to creating them is to first compute floating-point ones, using an algorithm such as SIFT, and then to binarize them. In this paper, we show that we can directly compute a binary descriptor, which we call BRIEF, on the basis of simple intensity difference tests. As a result, BRIEF is very fast both to build and to match. We compare it against SURF and SIFT on standard benchmarks and show that it yields...

10.1109/tpami.2011.222 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2011-11-17
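
The idea of building a binary descriptor directly from intensity difference tests, as the BRIEF abstract above describes, can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the authors' implementation: the patch size, the uniform random sampling of test locations, and all function names are hypothetical, and the patch smoothing that practical descriptors of this kind apply before testing is omitted.

```python
import random

def make_test_pairs(patch_size, n_bits, seed=0):
    """Sample n_bits pairs of pixel locations inside a square patch.

    Uniform sampling with a fixed seed is an assumption of this sketch;
    the same pairs must be reused for every patch being compared.
    """
    rng = random.Random(seed)

    def point():
        return (rng.randrange(patch_size), rng.randrange(patch_size))

    return [(point(), point()) for _ in range(n_bits)]

def brief_descriptor(patch, pairs):
    """Pack one intensity-comparison result per test pair into an integer."""
    bits = 0
    for i, ((r1, c1), (r2, c2)) in enumerate(pairs):
        if patch[r1][c1] < patch[r2][c2]:
            bits |= 1 << i  # bit i records the outcome of test i
    return bits

def hamming(a, b):
    """Number of differing bits between two descriptors."""
    return bin(a ^ b).count("1")

# Toy 8x8 patch and a uniformly brightened copy of it.
rng = random.Random(1)
patch = [[rng.randrange(200) for _ in range(8)] for _ in range(8)]
brighter = [[value + 10 for value in row] for row in patch]

pairs = make_test_pairs(patch_size=8, n_bits=64)
d_patch = brief_descriptor(patch, pairs)
d_brighter = brief_descriptor(brighter, pairs)
```

Because each bit depends only on which of two pixels is brighter, adding a constant to every pixel leaves the descriptor unchanged, and matching reduces to an XOR followed by a bit count.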

In this paper we want to start the discussion on whether image based 3-D modelling techniques can possibly be used to replace LIDAR systems for outdoor 3D data acquisition. Two main issues have to be addressed in this context: (i) camera calibration (internal and external) and (ii) dense multi-view stereo. To investigate both, we have acquired test data from outdoor scenes both with LIDAR and cameras. Using the LIDAR data as reference we estimated the ground-truth for several scenes. Evaluation sets are prepared to evaluate different aspects of the 3D model building. These...

10.1109/cvpr.2008.4587706 article EN 2008 IEEE Conference on Computer Vision and Pattern Recognition 2008-06-01

Given two to four synchronized video streams taken at eye level and from different angles, we show that we can effectively combine a generative model with dynamic programming to accurately follow up to six individuals across thousands of frames in spite of significant occlusions and lighting changes. In addition, we also derive metrically accurate trajectories for each one of them. Our contribution is twofold. First, we demonstrate that our generative model can effectively handle occlusions in each time frame independently, even when the only data available comes from the output...

10.1109/tpami.2007.1174 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2007-12-20

In many 3D object-detection and pose-estimation problems, runtime performance is of critical importance. However, there usually is time to train the system, which we would show to be very useful. Assuming that several registered images of the target object are available, we developed a keypoint-based approach that is effective in this context by formulating wide-baseline matching of keypoints extracted from the input images to those found in the model images as a classification problem. This shifts much of the computational burden to a training phase, without...

10.1109/tpami.2006.188 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2006-07-26

State-of-the-art methods for counting people in crowded scenes rely on deep networks to estimate crowd density. They typically use the same filters over the whole image or over large image patches. Only then do they estimate local scale to compensate for perspective distortion. This is typically achieved by training an auxiliary classifier to select, for predefined image patches, the best kernel size among a limited set of choices. As such, these methods are not end-to-end trainable and restricted in the scope of context they can leverage. In this paper, we introduce...

10.1109/cvpr.2019.00524 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

SIFT-like local feature descriptors are ubiquitously employed in computer vision applications such as content-based retrieval, video analysis, copy detection, object recognition, photo tourism, and 3D reconstruction. Feature descriptors can be designed to be invariant to certain classes of photometric and geometric transformations, in particular, affine and intensity scale transformations. However, real transformations that an image can undergo can only be approximately modeled in this way, and thus most descriptors are only approximately invariant in practice. Second, descriptors are usually high...

10.1109/tpami.2011.103 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2011-05-20

While feature point recognition is a key component of modern approaches to object detection, existing approaches require computationally expensive patch preprocessing to handle perspective distortion. In this paper, we show that formulating the problem in a naive Bayesian classification framework makes such preprocessing unnecessary and produces an algorithm that is simple, efficient, and robust. Furthermore, it scales well as the number of classes grows. To recognize the patches surrounding keypoints, our classifier uses hundreds of simple...

10.1109/tpami.2009.23 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2009-01-28

We present a method for real-time 3D object instance detection that does not require a time-consuming training stage, and can handle untextured objects. At its core, our approach is a novel image representation for template matching designed to be robust to small image transformations. This robustness is based on spread image gradient orientations and allows us to test only a small subset of all possible pixel locations when parsing the image, and to represent a 3D object with a limited set of templates. In addition, we demonstrate that if a dense depth sensor...

10.1109/tpami.2011.206 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2011-10-13

We develop a deep architecture to learn to find good correspondences for wide-baseline stereo. Given a set of putative sparse matches and the camera intrinsics, we train our network in an end-to-end fashion to label the correspondences as inliers or outliers, while simultaneously using them to recover the relative pose, as encoded by the essential matrix. Our architecture is based on a multi-layer perceptron operating on pixel coordinates rather than directly on the image, and is thus simple and small. We introduce a novel normalization technique, called Context...

10.1109/cvpr.2018.00282 preprint EN 2018-06-01

We introduce a novel local image descriptor designed for dense wide-baseline matching purposes. We feed our descriptors to a graph-cuts based dense depth map estimation algorithm and this yields better wide-baseline performance than the commonly used correlation windows for which the size is hard to tune. As a result, unlike competing techniques that require many high-resolution images to produce good reconstructions, our descriptor can compute them from pairs of low-quality images such as the ones captured by video streams. Our descriptor is inspired from earlier ones such as SIFT and GLOH...

10.1109/cvpr.2008.4587673 article EN 2008 IEEE Conference on Computer Vision and Pattern Recognition 2008-06-01

The performance of a classifier trained on data coming from a specific domain typically degrades when applied to a related but different one. While annotating many samples from the new domain would address this issue, it is often too expensive or impractical. Domain Adaptation has therefore emerged as a solution to this problem; it leverages annotated data from a source domain, in which it is abundant, to train a classifier to operate in a target domain, in which data is either sparse or even lacking altogether. In this context, the recent trend consists of learning deep architectures whose...

10.1109/tpami.2018.2814042 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2018-03-08

Many applications require tracking of complex 3D objects. These include visual servoing of robotic arms on specific target objects, Augmented Reality systems that require real-time registration of the object to be augmented, and head tracking systems that sophisticated interfaces can use. Computer Vision offers solutions that are cheap, practical and non-invasive. This survey reviews the different techniques and approaches that have been developed by industry and research. First, important mathematical tools are introduced: Camera representation, robust...

10.1561/0600000001 article EN Foundations and Trends® in Computer Graphics and Vision 2005-01-01

We advocate the use of Gaussian Process Dynamical Models (GPDMs) for learning human pose and motion priors for 3D people tracking. A GPDM provides a low-dimensional embedding of human motion data, with a density function that gives higher probability to poses and motions close to the training data. With Bayesian model averaging a GPDM can be learned from relatively small amounts of data, and it generalizes gracefully to motions outside the training set. Here we modify the GPDM to permit learning from motions with significant stylistic variation. The resulting priors are effective for tracking a range of human walking styles,...

10.1109/cvpr.2006.15 article EN 2006-07-10

While feature point recognition is a key component of modern approaches to object detection, existing approaches require computationally expensive patch preprocessing to handle perspective distortion. In this paper, we show that formulating the problem in a Naive Bayesian classification framework makes such preprocessing unnecessary and produces an algorithm that is simple, efficient, and robust. Furthermore, it scales well to a large number of classes. To recognize the patches surrounding keypoints, our classifier uses hundreds of simple binary...

10.1109/cvpr.2007.383123 article EN 2007 IEEE Conference on Computer Vision and Pattern Recognition 2007-06-01