- Advanced Image and Video Retrieval Techniques
- Advanced Neural Network Applications
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Advanced Vision and Imaging
- Human Pose and Action Recognition
- Image Retrieval and Classification Techniques
- 3D Shape Modeling and Analysis
- Video Surveillance and Tracking Methods
- Computer Graphics and Visualization Techniques
- COVID-19 Diagnosis Using AI
- Video Analysis and Summarization
- Anomaly Detection Techniques and Applications
- Visual Attention and Saliency Detection
- Generative Adversarial Networks and Image Synthesis
- 3D Surveying and Cultural Heritage
- Industrial Vision Systems and Defect Detection
- Robotics and Sensor-Based Localization
- Medical Image Segmentation Techniques
- Machine Learning and Data Classification
- Advanced Image Processing Techniques
- Remote Sensing and LiDAR Applications
- Image Processing Techniques and Applications
- Image Processing and 3D Reconstruction
- AI in Cancer Detection
Google (United States)
2017-2024
Synthesia (Czechia)
2024
Google (Switzerland)
2018-2022
Microsoft (United States)
2020-2021
University of Edinburgh
2011-2018
Institute of Science and Technology Austria
2017-2018
Microsoft Research (United Kingdom)
2017
Board of the Swiss Federal Institutes of Technology
2005-2012
ETH Zurich
2003-2012
University of Oxford
2007-2009
We present a generic objectness measure, quantifying how likely it is for an image window to contain an object of any class. We explicitly train it to distinguish objects with a well-defined boundary in space, such as cows and telephones, from amorphous background elements, such as grass and road. The measure combines in a Bayesian framework several image cues measuring characteristics of objects, such as appearing different from their surroundings and having a closed boundary. These include an innovative cue to measure the closed boundary characteristic. In experiments on...
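The abstract above describes combining several window cues in a Bayesian framework. As a hedged illustration only (not the paper's actual learned model), a naive-Bayes combination of per-cue likelihoods could be sketched as follows; the function name, inputs, and probability values are invented placeholders:

```python
import math

def objectness_posterior(cue_scores, p_obj=0.5):
    """Naive-Bayes combination of per-window cue likelihoods.

    cue_scores: list of (p_cue_given_object, p_cue_given_background) pairs,
    one pair per cue. Returns p(object | cues) under a conditional-
    independence assumption across cues. Illustrative sketch only.
    """
    log_obj = math.log(p_obj)        # prior for the object hypothesis
    log_bg = math.log(1.0 - p_obj)   # prior for the background hypothesis
    for p_c_obj, p_c_bg in cue_scores:
        log_obj += math.log(p_c_obj)
        log_bg += math.log(p_c_bg)
    # Normalize in log space for numerical stability.
    m = max(log_obj, log_bg)
    num = math.exp(log_obj - m)
    return num / (num + math.exp(log_bg - m))
```

With no cues the posterior reduces to the prior; each cue then shifts the score toward the hypothesis it supports.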
Semantic classes can be either things (objects with a well-defined shape, e.g. car, person) or stuff (amorphous background regions, e.g. grass, sky). While lots of classification and detection works focus on thing classes, less attention has been given to stuff classes. Nonetheless, stuff classes are important as they allow to explain important aspects of an image, including (1) the scene type; (2) which thing classes are likely to be present and their location (through contextual reasoning); (3) physical attributes, material types and geometric properties of the scene...
The objective of this paper is to estimate 2D human pose as a spatial configuration of body parts in TV and movie video shots. Such material is uncontrolled and extremely challenging. We propose an approach that progressively reduces the search space for body parts, to greatly improve the chances that pose estimation will succeed. This involves two contributions: (i) a generic detector using a weak model to substantially reduce the full search space; (ii) employing 'grabcut' initialized on the detected regions proposed by the weak model, to further prune...
We present a technique for separating foreground objects from the background in video. Our method is fast, fully automatic, and makes minimal assumptions about the video. This enables handling essentially unconstrained settings, including rapidly moving background, arbitrary object motion and appearance, and non-rigid deformations and articulations. In experiments on two datasets containing over 1400 video shots, our method outperforms a state-of-the-art background subtraction technique [4] as well as methods based on clustering point tracks [6, 18, ...
We present a family of scale-invariant local shape features formed by chains of k connected, roughly straight contour segments (kAS), and their use for object class detection. kAS are able to cleanly encode pure fragments of an object boundary, without including nearby clutter. Moreover, they offer an attractive compromise between information content and repeatability, and encompass a wide variety of local shape structures. We also define a translation and scale invariant descriptor encoding the geometric configuration of segments within a kAS, making it easy...
Current state-of-the-art approaches for spatio-temporal action localization rely on detections at the frame level that are then linked or tracked across time. In this paper, we leverage the temporal continuity of videos instead of operating at the frame level. We propose the ACtion Tubelet detector (ACT-detector), which takes as input a sequence of frames and outputs tubelets, i.e., sequences of bounding boxes with associated scores. In the same way that object detectors rely on anchor boxes, our ACT-detector is based on anchor cuboids. We build upon the SSD...
Manually annotating object bounding boxes is central to building computer vision datasets, and it is very time consuming (annotating ILSVRC [53] took 35s for one high-quality box [62]). It involves clicking on imaginary corners of a tight box around the object. This is difficult as these corners are often outside the actual object and several adjustments are required to obtain a tight box. We propose extreme clicking instead: we ask the annotator to click on four physical points on the object: the top, bottom, left-most and right-most points. This task is more natural and these points are easy to find...
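A tight bounding box follows directly from the four extreme-point clicks described above. A minimal sketch of that conversion, assuming each click is an (x, y) pixel coordinate (the function name is illustrative, not from the paper's code):

```python
def box_from_extreme_points(top, bottom, left, right):
    """Derive a tight bounding box (x_min, y_min, x_max, y_max) from
    four extreme-point clicks, each given as an (x, y) pixel coordinate.
    Taking mins and maxes over all four points makes the result robust
    to which coordinate each click actually extremizes.
    """
    xs = [p[0] for p in (top, bottom, left, right)]
    ys = [p[1] for p in (top, bottom, left, right)]
    return (min(xs), min(ys), max(xs), max(ys))
```

For example, clicks at (50, 10), (60, 90), (20, 40) and (95, 55) yield the box (20, 10, 95, 90).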
We introduce a weakly supervised approach for learning human actions modeled as interactions between humans and objects. Our approach is human-centric: we first localize a human in the image and then determine the object relevant for the action and its spatial relation with the human. The model is learned automatically from a set of still images annotated only with the action label. It relies on a human detector to initialize the learning. For robustness to various degrees of visibility, we build a detector that learns to combine a set of existing part detectors. Starting from the humans detected in images depicting the action, ...
Manually annotating object segmentation masks is very time consuming. Interactive segmentation methods offer a more efficient alternative, where a human annotator and a machine model collaborate. In this paper we make several contributions to interactive segmentation: (1) we systematically explore in simulation the design space of deep interactive segmentation models and report new insights and caveats; (2) we execute a large-scale annotation campaign with real human annotators, producing masks for 2.5M instances on the OpenImages dataset. We released this data...
The goal of this work is to perform 3D reconstruction and novel view synthesis from data captured by scanning platforms commonly deployed for world mapping in urban outdoor environments (e.g., Street View). Given a sequence of posed RGB images and lidar sweeps acquired by cameras and scanners moving through an outdoor scene, we produce a model from which 3D surfaces can be extracted and novel RGB images can be synthesized. Our approach extends Neural Radiance Fields, which has been demonstrated to synthesize realistic novel images of small scenes in controlled settings, with...