- Advanced Vision and Imaging
- Human Pose and Action Recognition
- 3D Shape Modeling and Analysis
- Robotics and Sensor-Based Localization
- Advanced Image and Video Retrieval Techniques
- Video Surveillance and Tracking Methods
- Human Motion and Animation
- Computer Graphics and Visualization Techniques
- Multimodal Machine Learning Applications
- Generative Adversarial Networks and Image Synthesis
- Advanced Neural Network Applications
- Optical measurement and interference techniques
- Video Analysis and Summarization
- Hand Gesture Recognition Systems
- Face recognition and analysis
- Robot Manipulation and Learning
- Advanced Image Processing Techniques
- Image Retrieval and Classification Techniques
- Image and Object Detection Techniques
- Anomaly Detection Techniques and Applications
- Medical Image Segmentation Techniques
- Gait Recognition and Analysis
- Cell Image Analysis Techniques
- Image Enhancement Techniques
- Handwritten Text Recognition Techniques
Universitat Politècnica de Catalunya
2016-2025
Institut de Robòtica i Informàtica Industrial
2016-2025
Consejo Superior de Investigaciones Científicas
2010-2023
Max Planck Institute for Informatics
2023
University of Tübingen
2023
Université de Bordeaux
2018
University of Surrey
2017
Imperial College London
2017
Waseda University
2017
Unidades Centrales Científico-Técnicas
2017
Deep learning has revolutionized image-level tasks such as classification, but patch-level tasks, such as correspondence, still rely on hand-crafted features, e.g. SIFT. In this paper we use Convolutional Neural Networks (CNNs) to learn discriminant patch representations and in particular train a Siamese network with pairs of (non-)corresponding patches. We deal with the large number of potential pairs through a combination of stochastic sampling of the training set and an aggressive mining strategy biased towards patches that are...
Neural rendering techniques combining machine learning with geometric reasoning have arisen as one of the most promising approaches for synthesizing novel views of a scene from a sparse set of images. Among these, neural radiance fields (NeRF) [31] stand out: this approach trains a deep network to map 5D input coordinates (representing spatial location and viewing direction) into a volume density and view-dependent emitted radiance. However, despite achieving an unprecedented level of photorealism on the generated images, NeRF...
This paper addresses the problem of 3D human pose estimation from a single image. We follow a standard two-step pipeline by first detecting the 2D position of the N body joints, and then using these observations to infer the 3D pose. For the first step, we use a recent CNN-based detector. For the second step, most existing approaches perform a 2N-to-3N regression of the Cartesian joint coordinates. We show that more precise pose estimates can be obtained by representing both the 2D and 3D poses using NxN distance matrices, and formulating the problem as a 2D-to-3D distance matrix regression. For learning...
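The distance-matrix representation described in the abstract above can be illustrated with a minimal sketch. It assumes NumPy, and the helper name `pose_to_distance_matrix` is hypothetical; it only shows how an N-joint pose becomes an NxN matrix of pairwise Euclidean distances, and does not reproduce the regression step of the paper.

```python
import numpy as np

def pose_to_distance_matrix(joints):
    """Return the NxN matrix of pairwise Euclidean distances between joints.

    `joints` is an (N, D) array with D = 2 for image positions or D = 3 for
    3D positions (hypothetical helper, for illustration only).
    """
    joints = np.asarray(joints, dtype=float)
    # Broadcast to an (N, N, D) array of coordinate differences.
    diff = joints[:, None, :] - joints[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Toy 2D "pose" with three joints (made-up coordinates).
pose_2d = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
edm = pose_to_distance_matrix(pose_2d)
```

The resulting matrix is symmetric with a zero diagonal, and it does not change when the pose is rotated or translated, which is one reason such a representation can be attractive as a regression target.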
Low textured scenes are well known to be one of the main Achilles heels of geometric computer vision algorithms relying on point correspondences, and in particular for visual SLAM. Yet, there are many environments in which, despite being low textured, one can still reliably estimate line-based geometric primitives, for instance in city and indoor scenes, or in the so-called "Manhattan worlds", where structured edges are predominant. In this paper we propose a solution to handle these situations. Specifically, we build upon ORB-SLAM,...
In this paper, we analyze the fashion of clothing on a large social website. Our goal is to learn and predict how fashionable a person looks on a photograph and suggest subtle improvements the user could make to improve her/his appeal. We propose a Conditional Random Field model that jointly reasons about several fashionability factors such as the type of outfit and garments the user is wearing, the type of user, the photograph's setting (e.g., the scenery behind the user), and the fashionability score. Importantly, our model is able to give rich feedback to the user, conveying which garments or even scenery she/he...
We present a novel approach for synthesizing photorealistic images of people in arbitrary poses using generative adversarial learning. Given an input image of a person and a desired pose represented by a 2D skeleton, our model renders the image of the same person under the new pose, synthesizing novel views of the parts visible in the input image and hallucinating those that are not seen. This problem has recently been addressed in a supervised manner [16, 35], i.e., during training the ground truth images under the new poses are given to the network. We go beyond these approaches by proposing a fully unsupervised...
In this paper we introduce SMPLicit, a novel generative model to jointly represent body pose, shape and clothing geometry. In contrast to existing learning-based approaches that require training specific models for each type of garment, SMPLicit can represent in a unified manner different garment topologies (e.g. from sleeveless tops to hoodies and open jackets), while controlling other properties like the garment size or tightness/looseness. We show our model to be applicable to a large variety of garments including T-shirts, hoodies,...
The rise of deep learning has brought remarkable progress in estimating hand geometry from images where the hands are part of the scene. This paper focuses on a new problem not explored so far, consisting in predicting how a human would grasp one or several objects, given a single RGB image of these objects. This is a problem with enormous potential in e.g. augmented reality, robotics or prosthetic design. In order to predict feasible grasps, we need to understand the semantic content of the image, its geometric structure and all interactions...
This paper tackles the problem of human motion prediction, consisting in forecasting future body poses from historically observed sequences. State-of-the-art approaches provide good results; however, they rely on deep learning architectures of arbitrary complexity, such as Recurrent Neural Networks (RNN), Transformers or Graph Convolutional Networks (GCN), typically requiring multiple training stages and more than 2 million parameters. In this paper, we show that, after combining with a series...
We propose a non-iterative solution to the PnP problem (the estimation of the pose of a calibrated camera from n 3D-to-2D point correspondences) whose computational complexity grows linearly with n. This is in contrast to state-of-the-art methods that are O(n^5) or even O(n^8), without being more accurate. Our method is applicable for all n ≥ 4 and handles...
We propose a real-time, outlier-robust and accurate solution to the Perspective-n-Point (PnP) problem. The main advantages of our solution are twofold: first, it integrates the outlier rejection within the pose estimation pipeline with a negligible computational overhead, and second, its scalability to an arbitrarily large number of correspondences. Given a set of 3D-to-2D matches, we formulate the pose estimation problem as a low-rank homogeneous system where the solution lies on its 1D null space. Outlier correspondences are those rows of the linear system which...
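The idea of a solution lying on the 1D null space of a homogeneous linear system can be sketched generically as follows. This is not the authors' algorithm: the function name `null_vector` and the synthetic matrix `M` are assumptions made for illustration, and the sketch only shows the standard way to recover a null vector with an SVD in NumPy.

```python
import numpy as np

def null_vector(M):
    """Return a unit vector x that minimizes ||M @ x||.

    When M has a 1D null space, this is the null vector, recovered up to
    sign as the right singular vector of the smallest singular value.
    """
    _, _, vt = np.linalg.svd(M)
    return vt[-1]

# Synthetic system (made-up data): 8 equations, 4 unknowns, and a known
# unit solution x_true such that M @ x_true = 0.
rng = np.random.default_rng(0)
x_true = np.array([1.0, 2.0, -1.0, 0.5])
x_true /= np.linalg.norm(x_true)
A = rng.standard_normal((8, 4))
# Remove from each row its component along x_true, so every row is orthogonal to it.
M = A - np.outer(A @ x_true, x_true)

x = null_vector(M)
```

The recovered `x` matches `x_true` only up to sign, since any scalar multiple of a null vector also solves a homogeneous system.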
We introduce a novel approach to automatically recover 3D human pose from a single image. Most previous work follows a pipelined approach: initially, a set of 2D features such as edges, joints or silhouettes are detected in the image, and then these observations are used to infer the 3D pose. Solving these two problems separately may lead to erroneous 3D poses when the feature detector has performed poorly. In this paper, we address this issue by jointly solving both the 2D detection and the 3D inference problems. For this purpose, we propose a Bayesian...
Markerless 3D human pose detection from a single image is a severely underconstrained problem because different 3D poses can have similar image projections. In order to handle this ambiguity, current approaches rely on prior shape models that can only be correctly adjusted if 2D image features are accurately detected. Unfortunately, although current 2D part detector algorithms have shown promising results, they are not yet accurate enough to guarantee a complete disambiguation of the inferred 3D shape. In this paper, we introduce a novel approach...
We propose a novel approach for the estimation of the pose and focal length of a camera from a set of 3D-to-2D point correspondences. Our method compares favorably to competing approaches in that it is both more accurate than existing closed form solutions, as well as faster and also more accurate than iterative ones. Our approach is inspired by the EPnP algorithm, a recent O(n) solution for the calibrated case. Yet we show that considering the focal length as an additional unknown renders the linearization and relinearization techniques of the original approach no longer valid, especially with large amounts...
Detecting grasping points is a key problem in cloth manipulation. Most current approaches follow a multiple re-grasp strategy for this purpose, in which clothes are sequentially grasped from different points until one of them yields the desired configuration. In this paper, by contrast, we circumvent the need for multiple re-graspings by building a robust detector that identifies the grasping points, generally in one single step, even when clothes are highly wrinkled. In order to handle the large variability a deformed cloth may have, we build a Bag of Features based detector that combines...
The problem of predicting human motion given a sequence of past observations is at the core of many applications in robotics and computer vision. Current state-of-the-art methods formulate this problem as a sequence-to-sequence task, in which a history of 3D skeletons feeds a Recurrent Neural Network (RNN) that predicts future movements, typically in the order of 1 to 2 seconds. However, one aspect that has been overlooked so far is the fact that human motion is inherently driven by interactions with objects and/or other humans in the environment. In this paper, we explore...
Flow-based generative models have highly desirable properties like exact log-likelihood evaluation and exact latent-variable inference; however, they are still in their infancy and have not received as much attention as alternative generative models. In this paper, we introduce C-Flow, a novel conditioning scheme that brings normalizing flows to an entirely new scenario with great possibilities for multimodal data modeling. C-Flow is based on a parallel sequence of invertible mappings in which a source flow guides the target flow at...
Recent learning approaches that implicitly represent surface geometry using coordinate-based neural representations have shown impressive results in the problem of multi-view 3D reconstruction. The effectiveness of these techniques is, however, subject to the availability of a large number (several tens) of input views of the scene, and to computationally demanding optimizations. In this paper, we tackle these limitations for the specific problem of few-shot full 3D head reconstruction, by endowing coordinate-based representations with a probabilistic shape prior that enables...
This paper proposes a do-it-all neural model of human hands, named LISA. The model can capture accurate hand shape and appearance, generalize to arbitrary hand subjects, provide dense surface correspondences, be reconstructed from images in the wild, and be easily animated. We train LISA by minimizing the shape and appearance losses on a large set of multi-view RGB image sequences annotated with coarse 3D poses of the hand skeleton. For a 3D point in the local hand coordinates, our model predicts the color and the signed distance with respect to each hand bone independently, then...
Human motion prediction aims to forecast future human poses given a sequence of past 3D skeletons. While this problem has recently received increasing attention, it has mostly been tackled for single humans in isolation. In this paper, we explore this problem when dealing with humans performing collaborative tasks, and seek to predict the future motion of two interacting persons given two sequences of their past skeletons. We propose a novel cross interaction attention mechanism that exploits historical information of both persons, and learns to predict cross dependencies between the two pose sequences....