- Advanced Image and Video Retrieval Techniques
- Advanced Neural Network Applications
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Image Retrieval and Classification Techniques
- Video Analysis and Summarization
- COVID-19 diagnosis using AI
- Human Pose and Action Recognition
- Visual Attention and Saliency Detection
- Advanced Vision and Imaging
- Medical Image Segmentation Techniques
- Topic Modeling
- Generative Adversarial Networks and Image Synthesis
- Video Surveillance and Tracking Methods
- Remote-Sensing Image Classification
- Robotics and Sensor-Based Localization
- AI in cancer detection
- Adversarial Robustness in Machine Learning
- Artificial Intelligence in Games
- Machine Learning and Data Classification
- Face Recognition and Perception
- Advanced Image Processing Techniques
- Aesthetic Perception and Analysis
- Reinforcement Learning in Robotics
- Gait Recognition and Analysis
Google (Switzerland)
2018-2022
Google (United States)
2017-2022
University of Edinburgh
2014-2018
École Polytechnique Fédérale de Lausanne
2017
University of Trento
2011-2014
Amsterdam University of the Arts
2011
University of Amsterdam
2006-2011
Semantic classes can be either things (objects with a well-defined shape, e.g. car, person) or stuff (amorphous background regions, e.g. grass, sky). While many classification and detection works focus on thing classes, less attention has been given to stuff classes. Nonetheless, stuff classes are important as they allow to explain important aspects of an image, including (1) scene type; (2) which thing classes are likely to be present and their location (through contextual reasoning); (3) physical attributes, material types and geometric properties of the scene...
For object recognition, the current state-of-the-art is based on exhaustive search. However, to enable the use of more expensive features and classifiers and thereby progress beyond the state-of-the-art, a selective search strategy is needed. Therefore, we adapt segmentation as a selective search by reconsidering segmentation: we propose to generate many approximate locations over few and precise object delineations, because (1) an object whose location is never generated can not be recognised and (2) appearance and immediate nearby context are most...
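The OpenCV contrib module ships an implementation of this kind of segmentation-based selective search; the sketch below shows how it generates many approximate candidate locations for a single image (the image path is a placeholder, and `opencv-contrib-python` is assumed to be installed):

```python
# Minimal sketch: generate many approximate object locations via selective search.
import cv2

img = cv2.imread("example.jpg")  # placeholder input image

ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(img)
ss.switchToSelectiveSearchFast()   # favour many approximate locations over few precise ones
boxes = ss.process()               # candidate locations as (x, y, w, h)

print(f"{len(boxes)} candidate locations, first one: {boxes[0]}")
```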
Manually annotating object bounding boxes is central to building computer vision datasets, and it is very time consuming (annotating ILSVRC [53] took 35s for one high-quality box [62]). It involves clicking on imaginary corners of a tight box around the object. This is difficult as these corners are often outside the actual object and several adjustments are required to obtain a tight box. We propose extreme clicking instead: we ask the annotator to click on four physical points on the object: the top, bottom, left- and right-most points. This task is more natural and the points are easy to find...
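As a minimal illustration, the four extreme clicks already determine a tight box; the function name and the (x, y) pixel format below are assumptions for the example, not an interface from the paper:

```python
# Turn four extreme clicks (top, bottom, left-most, right-most points on the
# object) into a tight bounding box.
def box_from_extreme_clicks(top, bottom, left, right):
    """Each argument is an (x, y) click on one extreme point of the object."""
    xs = [p[0] for p in (top, bottom, left, right)]
    ys = [p[1] for p in (top, bottom, left, right)]
    return min(xs), min(ys), max(xs), max(ys)   # (x_min, y_min, x_max, y_max)

print(box_from_extreme_clicks(top=(140, 20), bottom=(150, 210),
                              left=(60, 115), right=(230, 120)))
# -> (60, 20, 230, 210)
```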
The number of web images has been growing explosively due to the development of network and storage technology. These images make up a large amount of current multimedia data and are closely related to our daily life. To efficiently browse, retrieve and organize these images, numerous approaches have been proposed. Since semantic concepts can be indicated by label information, automatic image annotation has become an effective technique for image management tasks. Most existing methods use image features that are often noisy and redundant. Hence,...
In this paper, we propose a novel semi-supervised feature analyzing framework for multimedia data understanding and apply it to three different applications: image annotation, video concept detection and 3-D motion data analysis. Our method is built upon two advancements of the state of the art: (1) ℓ2,1-norm regularized feature selection, which can jointly select...
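For reference, the sketch below computes the ℓ2,1-norm of a weight matrix and a generic ℓ2,1-regularized least-squares objective; the matrices and the regularization weight are random placeholders, and this is the standard formulation rather than the paper's exact objective:

```python
# The l2,1-norm sums the l2-norms of the rows of W, which drives entire feature
# rows to zero, so the same features are selected or discarded for all labels.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))   # 100 samples, 20 features
Y = rng.normal(size=(100, 5))    # 5 labels / concepts
W = rng.normal(size=(20, 5))     # feature-to-label weights

l21 = np.sum(np.linalg.norm(W, axis=1))                       # ||W||_{2,1}
objective = np.linalg.norm(X @ W - Y, "fro") ** 2 + 0.1 * l21 # generic regularized fit
print(l21, objective)
```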
As datasets grow increasingly large in content-based image and video retrieval, computational efficiency of concept classification becomes important. This paper reviews techniques to accelerate concept classification, where we show the trade-off between computational efficiency and accuracy. As a basis, we use the Bag-of-Words algorithm that in the 2008 TRECVID and PASCAL benchmarks led to the best performance scores. We divide the evaluation in three steps: 1) Descriptor Extraction, where we evaluate SIFT, SURF, DAISY, and Semantic Textons. 2) Visual Word Assignment, where we compare...
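A compact sketch of the first two steps (descriptor extraction and visual word assignment) is shown below; the image paths and vocabulary size are placeholders, and it assumes an OpenCV build with SIFT (`opencv-python` >= 4.4) plus scikit-learn:

```python
# Bag-of-Words sketch: SIFT extraction, k-means vocabulary, word assignment.
import cv2
import numpy as np
from sklearn.cluster import KMeans

sift = cv2.SIFT_create()

train_descs = []
for path in ["img1.jpg", "img2.jpg"]:            # placeholder training images
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(gray, None)  # step 1: descriptor extraction
    train_descs.append(desc)

vocab = KMeans(n_clusters=256, n_init=3).fit(np.vstack(train_descs))

def bow_histogram(gray_img):
    _, desc = sift.detectAndCompute(gray_img, None)
    words = vocab.predict(desc)                  # step 2: visual word assignment
    return np.bincount(words, minlength=256) / len(words)
```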
Training object class detectors typically requires a large set of images in which objects are annotated by bounding boxes. However, manually drawing bounding boxes is very time consuming. We propose a new scheme for training detectors which only requires annotators to verify bounding boxes produced automatically by the learning algorithm. Our scheme iterates between re-training the detector, re-localizing objects in the training images, and human verification. We use the verification signal both to improve re-training and to reduce the search space for re-localisation, which makes these steps different from what...
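Schematically, the loop looks like the sketch below; every helper here (`train_detector`, `localize`, `ask_human_to_verify`) is hypothetical pseudocode used to show the control flow, not an API from the paper:

```python
# Verify-and-retrain loop: the detector proposes boxes, a human only answers
# yes/no, and accepted boxes are treated as ground truth in the next round.
def train_with_human_verification(images, weak_labels, n_rounds=5):
    verified = {}                                  # image -> accepted box
    detector = train_detector(verified, weak_labels)
    for _ in range(n_rounds):
        for img in images:
            if img in verified:
                continue
            box = localize(detector, img)          # re-localize with current detector
            if ask_human_to_verify(img, box):      # cheap yes/no judgement
                verified[img] = box
        detector = train_detector(verified, weak_labels)  # re-train on verified boxes
    return detector
```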
Training object class detectors typically requires a large set of images with objects annotated by bounding boxes. However, manually drawing boxes is very time consuming. In this paper we greatly reduce annotation time by proposing center-click annotations: we ask annotators to click on the center of an imaginary box which tightly encloses the object instance. We then incorporate these clicks into existing Multiple Instance Learning techniques for weakly supervised object localization, to jointly localize objects over all training...
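One simple way to picture how a center click can steer proposal selection in MIL is to down-weight proposals whose centers lie far from the click; the function, array shapes and Gaussian kernel width below are assumptions for illustration, not the paper's exact scoring:

```python
# Re-weight MIL proposal scores by distance between proposal center and click.
import numpy as np

def rescore_with_center_click(proposals, scores, click, sigma=50.0):
    """proposals: (N, 4) boxes as (x1, y1, x2, y2); scores: (N,) MIL scores;
    click: (x, y) center-click annotation. Returns click-aware scores."""
    centers = np.stack([(proposals[:, 0] + proposals[:, 2]) / 2,
                        (proposals[:, 1] + proposals[:, 3]) / 2], axis=1)
    dist2 = np.sum((centers - np.asarray(click)) ** 2, axis=1)
    return scores * np.exp(-dist2 / (2 * sigma ** 2))  # Gaussian weighting around the click
```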
We propose to revisit knowledge transfer for training object detectors on target classes from weakly supervised training images, helped by a set of source classes with bounding-box annotations. We present a unified framework based on a single neural network multi-class detector over all source classes, organized in a semantic hierarchy. This generates proposals with scores at multiple levels of the hierarchy, which we use to explore knowledge transfer over a broad range of generality, ranging from class-specific (bicycle to motorbike) to class-generic (objectness to any class)...
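As a toy illustration of scoring a proposal at multiple levels of generality, one can aggregate leaf-class detector scores up a hierarchy; the hierarchy, class names and scores below are invented for the example and are not the paper's source classes:

```python
# Aggregate per-proposal leaf-class scores into parent-level and generic scores.
leaf_scores = {"bicycle": 0.7, "motorbike": 0.2, "car": 0.1}   # detector scores for one proposal
hierarchy = {"two-wheeler": ["bicycle", "motorbike"],
             "vehicle": ["bicycle", "motorbike", "car"]}

level_scores = {parent: max(leaf_scores[c] for c in children)
                for parent, children in hierarchy.items()}
level_scores["object"] = max(leaf_scores.values())              # most generic ("objectness") level
print(level_scores)   # {'two-wheeler': 0.7, 'vehicle': 0.7, 'object': 0.7}
```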
Transfer learning enables re-using knowledge learned on a source task to help a target task. A simple form of transfer learning is common in current state-of-the-art computer vision models, i.e., pre-training a model for image classification on the ILSVRC dataset and then fine-tuning it on any target task. However, previous systematic studies have been limited and the circumstances in which transfer is expected to work are not fully understood. In this paper we carry out an extensive experimental exploration across vastly different image domains (consumer...
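The sketch below shows this common pre-train-then-fine-tune setup in PyTorch; the 10-class head, learning rate and weight choice are placeholders for an arbitrary target task, and torchvision >= 0.13 is assumed for the weights API:

```python
# Load an ILSVRC-pretrained backbone, swap in a target-task head, fine-tune all layers.
import torch
import torchvision

model = torchvision.models.resnet50(
    weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Linear(model.fc.in_features, 10)  # replace ILSVRC head with target head
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
# Training then proceeds as usual on the target-task dataloader, updating all
# layers (full fine-tuning) rather than only the new head.
```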
Most artworks are explicitly created to evoke a strong emotional response. Over the centuries there were several art movements which employed different techniques to achieve the emotional expressions conveyed by artworks. Yet people have always been consistently able to read the emotional messages, even from the most abstract paintings. Can a machine learn what makes an artwork emotional? In this work, we consider a set of 500 paintings from the Museum of Modern and Contemporary Art of Trento and Rovereto (MART), where each painting was scored as carrying...
We address interactive full image annotation, where the goal is to accurately segment all object and stuff regions in an image. We propose an interactive, scribble-based annotation framework which operates on the whole image to produce segmentations for all regions. This enables sharing scribble corrections across regions, and allows the annotator to focus on the largest errors made by the machine. To realize this, we adapt Mask-RCNN [22] into a fast interactive segmentation framework and introduce an instance-aware loss measured at the pixel level on the full image canvas, which lets...
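A rough way to picture a pixel-level loss on a shared canvas is to stack per-region logit maps and apply a pixel-wise cross-entropy over the region index, so nearby regions compete for each pixel; the shapes and random tensors below are assumptions, and this is an illustration of the idea rather than the paper's exact loss:

```python
# Per-pixel competition between regions on a common full-image canvas.
import torch
import torch.nn.functional as F

num_regions, H, W = 5, 480, 640
region_logits = torch.randn(1, num_regions, H, W, requires_grad=True)  # one logit map per region
target_region = torch.randint(0, num_regions, (1, H, W))               # which region owns each pixel

loss = F.cross_entropy(region_logits, target_region)  # softmax across regions at every pixel
loss.backward()
```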
We start from the state-of-the-art Bag of Words pipeline that in the 2008 TRECVID and PASCAL benchmarks yielded the best performance scores. We have contributed to this pipeline, which now forms the basis to compare various fast alternatives for all of its components: (i) For descriptor extraction we propose a fast algorithm to densely sample SIFT and SURF, and several variants of these descriptors. (ii) For descriptor projection we compare a k-means visual vocabulary with a Random Forest. As a pre-projection step we experiment with PCA on the descriptors to decrease projection time. (iii)...
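The PCA pre-projection step in (ii) simply reduces descriptor dimensionality before visual-word assignment; the sketch below shows it with placeholder data and dimensions, assuming scikit-learn:

```python
# PCA pre-projection: shrink descriptors before vocabulary projection to cut time.
import numpy as np
from sklearn.decomposition import PCA

descriptors = np.random.rand(10000, 128)      # e.g. dense SIFT descriptors (placeholder data)
pca = PCA(n_components=64).fit(descriptors)   # keep 64 of 128 dimensions
reduced = pca.transform(descriptors)          # used for vocabulary building / assignment
print(reduced.shape)                          # (10000, 64)
```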
The explosive growth of digital images requires effective methods to manage them. Among various existing methods, automatic image annotation has proved to be an important technique for image management tasks, e.g., retrieval over large-scale databases. Automatic image annotation has been widely studied in recent years and a considerable number of approaches have been proposed. However, the performance is not yet satisfactory, thus demanding more effort on annotation research. In this paper, we propose a novel semi...