Vittorio Ferrari

ORCID: 0000-0002-1942-233X

Research Areas
  • Advanced Image and Video Retrieval Techniques
  • Advanced Neural Network Applications
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Advanced Vision and Imaging
  • Human Pose and Action Recognition
  • Image Retrieval and Classification Techniques
  • 3D Shape Modeling and Analysis
  • Video Surveillance and Tracking Methods
  • Computer Graphics and Visualization Techniques
  • COVID-19 diagnosis using AI
  • Video Analysis and Summarization
  • Anomaly Detection Techniques and Applications
  • Visual Attention and Saliency Detection
  • Generative Adversarial Networks and Image Synthesis
  • 3D Surveying and Cultural Heritage
  • Industrial Vision Systems and Defect Detection
  • Robotics and Sensor-Based Localization
  • Medical Image Segmentation Techniques
  • Machine Learning and Data Classification
  • Advanced Image Processing Techniques
  • Remote Sensing and LiDAR Applications
  • Image Processing Techniques and Applications
  • Image Processing and 3D Reconstruction
  • AI in cancer detection

Affiliations

Google (United States)
2017-2024

Synthesia (United Kingdom)
2024

Google (Switzerland)
2018-2022

Microsoft (United States)
2020-2021

University of Edinburgh
2011-2018

Institute of Science and Technology Austria
2017-2018

Microsoft Research (United Kingdom)
2017

Board of the Swiss Federal Institutes of Technology
2005-2012

ETH Zurich
2003-2012

University of Oxford
2007-2009

Publications

We present a generic objectness measure, quantifying how likely it is for an image window to contain an object of any class. We explicitly train it to distinguish objects with a well-defined boundary in space, such as cows and telephones, from amorphous background elements, such as grass and road. The measure combines in a Bayesian framework several image cues measuring characteristics of objects, such as appearing different from their surroundings and having a closed boundary. These include an innovative cue to measure the closed boundary characteristic. In experiments on...

10.1109/tpami.2012.28 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2012-01-17
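
The measure above fuses several window cues in a Bayesian framework. Below is a minimal sketch of one such fusion, assuming naive-Bayes independence between cues; the cue names, binned likelihood tables, and prior are illustrative placeholders, not the paper's learned distributions.

```python
# Minimal sketch: naive-Bayes combination of window cues into an
# objectness posterior. All tables and cue names are hypothetical.
def objectness(window_cues, likelihoods, prior_obj=0.5):
    """window_cues: dict cue_name -> observed (binned) cue value.
    likelihoods: dict cue_name -> (p_value_given_obj, p_value_given_bg),
    each a dict from binned cue value to probability."""
    p_obj, p_bg = prior_obj, 1.0 - prior_obj
    for cue, value in window_cues.items():
        p_val_obj, p_val_bg = likelihoods[cue]
        p_obj *= p_val_obj[value]
        p_bg *= p_val_bg[value]
    return p_obj / (p_obj + p_bg)  # posterior probability of "object"

# Toy usage with two hypothetical cues, binned into 'low'/'high':
cues = {"color_contrast": "high", "edge_density": "low"}
tables = {"color_contrast": ({"low": 0.3, "high": 0.7}, {"low": 0.8, "high": 0.2}),
          "edge_density": ({"low": 0.4, "high": 0.6}, {"low": 0.5, "high": 0.5})}
print(objectness(cues, tables))  # -> a value in (0, 1)
```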

Semantic classes can be either things (objects with a well-defined shape, e.g. car, person) or stuff (amorphous background regions, e.g. grass, sky). While lots of classification and detection works focus on thing classes, less attention has been given to stuff classes. Nonetheless, stuff classes are important as they allow us to explain important aspects of an image, including (1) scene type; (2) which thing classes are likely to be present and their location (through contextual reasoning); (3) physical attributes, material types and geometric properties of the scene....

10.1109/cvpr.2018.00132 article EN 2018-06-01

We present a generic objectness measure, quantifying how likely it is for an image window to contain an object of any class. We explicitly train it to distinguish objects with a well-defined boundary in space, such as cows and telephones, from amorphous background elements, such as grass and road. The measure combines in a Bayesian framework several image cues measuring characteristics of objects, such as appearing different from their surroundings and having a closed boundary. This includes an innovative cue to measure the closed boundary characteristic. In experiments on...

10.1109/cvpr.2010.5540226 article EN 2010-06-01

The objective of this paper is to estimate 2D human pose as a spatial configuration of body parts in TV and movie video shots. Such material is uncontrolled and extremely challenging. We propose an approach that progressively reduces the search space for body parts, to greatly improve the chances that pose estimation will succeed. This involves two contributions: (i) a generic detector using a weak model of pose to substantially reduce the full search space; (ii) employing 'grabcut' initialized on detected regions proposed by the weak model, to further prune...

10.1109/cvpr.2008.4587468 article EN 2008 IEEE Conference on Computer Vision and Pattern Recognition 2008-06-01

We present a technique for separating foreground objects from the background in video. Our method is fast, fully automatic, and makes minimal assumptions about the video. This enables handling essentially unconstrained settings, including rapidly moving background, arbitrary object motion and appearance, and non-rigid deformations and articulations. In experiments on two datasets containing over 1400 video shots, our method outperforms a state-of-the-art background subtraction technique [4] as well as methods based on clustering point tracks [6, 18,...

10.1109/iccv.2013.223 article EN 2013-12-01

We present a family of scale-invariant local shape features formed by chains of k connected roughly straight contour segments (kAS), and their use for object class detection. kAS are able to cleanly encode pure fragments of an object boundary without including nearby clutter. Moreover, they offer an attractive compromise between information content and repeatability, and encompass a wide variety of local shape structures. We also define a translation and scale invariant descriptor encoding the geometric configuration of the segments within a kAS, making it easy...

10.1109/tpami.2007.1144 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2007-11-21
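
As a hedged sketch of the invariance idea: a descriptor for a chain of k segments can encode relative midpoint positions and segment lengths normalized by a chain scale, plus raw orientations. The exact normalization and ordering used in the paper may differ; this only illustrates one plausible form.

```python
import math

# Illustrative translation- and scale-invariant descriptor for a chain
# of k roughly straight segments (a kAS-style feature). Assumptions:
# the chain scale is the total segment length, and the anchor is the
# first segment's midpoint.
def kas_descriptor(segments):
    """segments: list of ((x1, y1), (x2, y2)) endpoint pairs, in chain order."""
    mids = [((x1 + x2) / 2, (y1 + y2) / 2) for (x1, y1), (x2, y2) in segments]
    lengths = [math.hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in segments]
    scale = sum(lengths)  # one simple choice of chain scale
    mx, my = mids[0]      # anchor: first segment's midpoint (translation invariance)
    desc = []
    for (cx, cy), ((x1, y1), (x2, y2)), l in zip(mids, segments, lengths):
        desc += [(cx - mx) / scale, (cy - my) / scale,  # relative position
                 math.atan2(y2 - y1, x2 - x1),          # orientation
                 l / scale]                             # normalized length
    return desc
```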

Current state-of-the-art approaches for spatio-temporal action localization rely on detections at the frame level that are then linked or tracked across time. In this paper, we leverage the temporal continuity of videos instead of operating at the frame level. We propose the ACtion Tubelet detector (ACT-detector), which takes as input a sequence of frames and outputs tubelets, i.e., sequences of bounding boxes with associated scores. In the same way that object detectors rely on anchor boxes, our ACT-detector is based on anchor cuboids. We build upon the SSD...

10.1109/iccv.2017.472 preprint EN 2017-10-01
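
A tubelet, as described above, is a fixed-length sequence of per-frame boxes sharing one action score. The sketch below shows that structure plus a mean per-frame IoU, a common criterion for linking tubelets across time; the field and function names are illustrative, not the ACT-detector's actual code.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Illustrative tubelet structure: one box per frame in the input
# sequence, with a single score and class label for the whole tubelet.
@dataclass
class Tubelet:
    boxes: List[Tuple[float, float, float, float]]  # (x1, y1, x2, y2) per frame
    score: float                                    # one score for the sequence
    label: int                                      # action class index

def tubelet_iou_link(a: Tubelet, b: Tubelet) -> float:
    """Mean per-frame IoU between two tubelets of equal length, a common
    linking criterion when building longer action tubes."""
    def iou(p, q):
        ix1, iy1 = max(p[0], q[0]), max(p[1], q[1])
        ix2, iy2 = min(p[2], q[2]), min(p[3], q[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(p) + area(q) - inter + 1e-9)
    return sum(iou(p, q) for p, q in zip(a.boxes, b.boxes)) / len(a.boxes)
```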

10.1007/s11263-012-0538-3 article EN International Journal of Computer Vision 2012-05-29

10.1007/s11263-009-0270-9 article EN International Journal of Computer Vision 2009-07-16

Manually annotating object bounding boxes is central to building computer vision datasets, and it is very time consuming (annotating ILSVRC [53] took 35s for one high-quality box [62]). It involves clicking on imaginary corners of a tight box around the object. This is difficult as these corners are often outside the actual object and several adjustments are required to obtain a tight box. We propose extreme clicking instead: we ask the annotator to click on four physical points on the object: the top, bottom, left- and right-most points. This task is more natural and these points are easy to find....

10.1109/iccv.2017.528 article EN 2017-10-01
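
Extreme clicking makes box construction trivial: in image coordinates (y growing downward), the four extreme points directly give the box edges. A minimal sketch:

```python
# The top/bottom/left/right-most object points lie on the tight box's
# edges by definition, so the box follows directly from the four clicks.
def box_from_extreme_clicks(top, bottom, left, right):
    """Each click is an (x, y) point on the object; returns (x1, y1, x2, y2)."""
    return (left[0], top[1], right[0], bottom[1])

# e.g. clicks on an object: top point, bottom point, left-most, right-most
assert box_from_extreme_clicks((40, 10), (55, 90), (5, 60), (120, 50)) == (5, 10, 120, 90)
```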

We introduce a weakly supervised approach for learning human actions modeled as interactions between humans and objects. Our approach is human-centric: we first localize a human in the image and then determine the object relevant for the action and its spatial relation with the human. The model is learned automatically from a set of still images annotated only with the action label. It relies on a human detector to initialize the model learning. For robustness to various degrees of visibility, we build a detector that learns to combine a set of existing part detectors. Starting from humans detected in images depicting the action,...

10.1109/tpami.2011.158 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2011-08-15

Manually annotating object segmentation masks is very time consuming. Interactive segmentation methods offer a more efficient alternative where a human annotator and a machine model collaborate. In this paper we make several contributions to interactive segmentation: (1) we systematically explore in simulation the design space of deep interactive segmentation models and report new insights and caveats; (2) we execute a large-scale annotation campaign with real human annotators, producing masks for 2.5M instances on the OpenImages dataset. We released this data...

10.1109/cvpr.2019.01197 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01
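
For the simulation part (contribution 1), one simple way to emulate an annotator's correction click is to target the largest error region between the predicted and ground-truth masks. The rule below is an illustrative assumption, not necessarily the paper's simulator.

```python
import numpy as np

# Hypothetical click simulator: place a positive click where the model
# missed object pixels, or a negative click where it hallucinated them,
# whichever error is larger.
def simulated_correction_click(pred_mask: np.ndarray, gt_mask: np.ndarray):
    """Boolean HxW masks; returns ((row, col), is_positive_click) or None."""
    false_neg = gt_mask & ~pred_mask   # object pixels the model missed
    false_pos = pred_mask & ~gt_mask   # background labeled as object
    errors, positive = (false_neg, True) if false_neg.sum() >= false_pos.sum() \
                       else (false_pos, False)
    if not errors.any():
        return None                    # prediction already matches ground truth
    rows, cols = np.nonzero(errors)
    center = (int(rows.mean()), int(cols.mean()))  # crude "center" of the errors
    return center, positive
```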

The goal of this work is to perform 3D reconstruction and novel view synthesis from data captured by scanning platforms commonly deployed for world mapping in urban outdoor environments (e.g., Street View). Given a sequence of posed RGB images and lidar sweeps acquired by cameras and scanners moving through an outdoor scene, we produce a model from which 3D surfaces can be extracted and novel RGB images can be synthesized. Our approach extends Neural Radiance Fields, which has been demonstrated to synthesize realistic novel images for small scenes in controlled settings, with...

10.1109/cvpr52688.2022.01259 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
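
For background, the Neural Radiance Fields model this work extends renders a ray's color as a transmittance-weighted sum over sampled densities and colors. The sketch below implements that standard volume-rendering quadrature; the paper's lidar-based losses and other additions sit on top of it.

```python
import numpy as np

# Standard NeRF volume rendering along one ray: alpha-composite sampled
# colors, weighting each sample by its opacity times the transmittance
# accumulated before it.
def render_ray(sigmas, colors, deltas):
    """sigmas: (N,) densities; colors: (N, 3); deltas: (N,) sample spacings."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                          # opacity per sample
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))   # T_i before sample i
    weights = trans * alphas                                         # w_i = T_i * alpha_i
    return (weights[:, None] * colors).sum(axis=0)                   # expected ray color
```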