- Advanced Vision and Imaging
- Human Pose and Action Recognition
- Video Surveillance and Tracking Methods
- Robotics and Sensor-Based Localization
- 3D Shape Modeling and Analysis
- 3D Surveying and Cultural Heritage
- Advanced Image and Video Retrieval Techniques
- Multimodal Machine Learning Applications
- Advanced Neural Network Applications
- Computer Graphics and Visualization Techniques
- Wood Treatment and Properties
- Neural dynamics and brain function
- Video Analysis and Summarization
- Archaeological Research and Protection
- Human Motion and Animation
- Photoreceptor and optogenetics research
- Domain Adaptation and Few-Shot Learning
- Computability, Logic, AI Algorithms
- Bamboo properties and applications
- Image Retrieval and Classification Techniques
- Memory and Neural Mechanisms
- Structural Load-Bearing Analysis
- Image Processing and 3D Reconstruction
- Image and Object Detection Techniques
Italian Institute of Technology
2021-2024
University of Surrey
2018-2019
We propose a CNN-based approach for multi-camera markerless motion capture of the human body. Unlike existing methods that first perform pose estimation on individual cameras and generate 3D models as post-processing, our approach makes use of 3D reasoning throughout a multi-stage approach. This novelty allows us to use provisional 3D models of human pose to rethink where the joints should be located in the image and to recover from past mistakes. Our principled refinement of 3D human poses lets us make use of image cues, even from images where we previously misdetected joints, to refine...
This paper presents a novel generative approach that outputs 3D indoor environments solely from a textual description of the scene. Current methods often treat scene synthesis as a mere layout prediction task, leading to rooms with overlapping objects or overly structured scenes, with limited consideration of the practical usability of the generated environment. Instead, our approach is based on a simple but effective principle: we condition scene synthesis to generate rooms that are usable by humans. This principle is implemented by synthesizing 3D humans that interact...
In recent years, timber-concrete composite (TCC) structures have been extensively used in Europe in both new and existing buildings. Generally speaking, a composite structure combines the advantages of both materials employed: the strength and stiffness of the concrete in compression, and the tensile strength, light weight, low embodied energy, and aesthetic appearance of the timber. The concrete slab protects the timber beams from direct contact with water, which is crucial to ensure the durability of the timber beams, particularly when used for bridges. Different...
Robustly estimating camera poses from a set of images is a fundamental task which remains challenging for differentiable methods, especially in the case of small and sparse camera pose graphs. To overcome this challenge, we propose Pose-refined Rotation Averaging Graph Optimization (PRAGO). From a set of objectness detections on unordered images, our method reconstructs the rotational pose and, in turn, the absolute pose, in a differentiable manner, benefiting from the optimization of a sequence of geometrical tasks. We show how the pose-refinement module in PRAGO is able to...
We introduce Contrastive Gaussian Clustering, a novel approach capable of providing segmentation masks from any viewpoint and of enabling 3D segmentation of the scene. Recent works in novel-view synthesis have shown how to model the appearance of a scene via a cloud of 3D Gaussians, and how to generate accurate images from a given viewpoint by projecting the Gaussians onto it before $\alpha$ blending their color. Following this example, we train a model to also include a segmentation feature vector for each Gaussian. These can then be used for 3D scene segmentation, by clustering the Gaussians according to their feature vectors; 2D...
The creation of digital replicas of physical objects has valuable applications for the preservation and dissemination of tangible cultural heritage. However, existing methods are often slow, expensive, and require expert knowledge. We propose a pipeline to generate a 3D replica of a scene using only RGB images (e.g. photos of a museum) and then extract a model for each item of interest (e.g. pieces in the exhibit). We do this by leveraging the advancements of novel view synthesis and Gaussian Splatting, modified to enable efficient 3D segmentation. This...
World-wide detailed 2D maps require enormous collective efforts. OpenStreetMap is the result of 11 million registered users manually annotating the GPS location of over 1.75 billion entries, including distinctive landmarks and common urban objects. At the same time, manual annotations can include errors and are slow to update, limiting the map's accuracy. Maps from Motion (MfM) is a step forward in automating such a time-consuming map-making procedure by computing 2D maps of semantic objects directly from a collection of uncalibrated...
Efficient visual localization is crucial to many applications, such as large-scale deployment of autonomous agents and augmented reality. Traditional visual localization, while achieving remarkable accuracy, relies on extensive 3D models of the scene or large collections of geolocalized images, which are often inefficient to store and to scale to novel environments. In contrast, humans orient themselves using very abstract 2D maps, using the location of clearly identifiable landmarks. Drawing on this and on the success of recent works that...
The estimation of the camera poses associated with a set of images commonly relies on feature matches between the images. In contrast, we are the first to address this challenge by using objectness regions to guide the pose estimation problem rather than explicit semantic object detections. We propose Pose Refiner Network (PoserNet), a light-weight Graph Neural Network to refine the approximate pair-wise relative camera poses. PoserNet exploits associations between the objectness regions - concisely expressed as bounding boxes - across multiple views to globally refine sparsely...