- Generative Adversarial Networks and Image Synthesis
- 3D Shape Modeling and Analysis
- Multimodal Machine Learning Applications
- Advanced Vision and Imaging
- Human Pose and Action Recognition
- Advanced Image and Video Retrieval Techniques
- Computer Graphics and Visualization Techniques
- Image Processing Techniques and Applications
- Advanced Image Processing Techniques
- Face Recognition and Analysis
- Video Analysis and Summarization
- Domain Adaptation and Few-Shot Learning
- Optical Measurement and Interference Techniques
- Robotics and Sensor-Based Localization
- Advanced Neural Network Applications
- Time Series Analysis and Forecasting
- Interactive and Immersive Displays
- Visual Attention and Saliency Detection
- Human Motion and Animation
- Video Surveillance and Tracking Methods
Huawei Technologies (Sweden)
2023-2024
Technical University of Munich
2019-2022
Scene understanding has been of high interest in computer vision. It encompasses not only identifying objects in a scene, but also their relationships within the given context. With this goal, a recent line of works tackles 3D semantic segmentation and scene layout prediction. In our work we focus on scene graphs, a data structure that organizes the entities of a scene in a graph, where objects are nodes and their relationships are modeled as edges. We leverage inference on scene graphs as a way to carry out 3D scene understanding, mapping objects and their relationships. In particular, we propose a learned...
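As a concrete picture of the scene-graph data structure described above, here is a minimal sketch (illustrative only, not the paper's implementation): objects become nodes carrying a class label, and each relationship becomes a labelled directed edge.

    from dataclasses import dataclass, field

    @dataclass
    class SceneGraph:
        # Objects are nodes (id -> class label); relationships are labelled directed edges.
        nodes: dict = field(default_factory=dict)
        edges: list = field(default_factory=list)

        def add_object(self, node_id, label):
            self.nodes[node_id] = label

        def relate(self, subject_id, predicate, object_id):
            self.edges.append((subject_id, predicate, object_id))

    # Toy scene: a chair standing next to a table.
    g = SceneGraph()
    g.add_object(0, "chair")
    g.add_object(1, "table")
    g.relate(0, "standing next to", 1)
    print(g.nodes, g.edges)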
Image manipulation can be considered a special case of image generation where the image to be produced is a modification of an existing image. Image generation and manipulation have been, for the most part, tasks that operate on raw pixels. However, remarkable progress in learning rich image and object representations has opened the way for tasks such as text-to-image or layout-to-image generation that are mainly driven by semantics. In our work, we address the novel problem of image manipulation from scene graphs, in which a user can edit images by merely applying changes to the nodes or edges of a semantic graph generated from the image. Our...
Controllable scene synthesis consists of generating 3D information that satisfies underlying specifications. Thereby, these specifications should be abstract, i.e. allowing easy user interaction, whilst providing enough interface for detailed control. Scene graphs are representations of a scene, composed of objects (nodes) and inter-object relationships (edges), and have proven to be particularly suited to this task, as they allow for semantic control on the generated content. Previous works tackling this task often rely...
We present a method that tackles the challenge of predicting color and depth behind the visible content of an image. Our approach aims at building up a Layered Depth Image (LDI) from a single RGB input, which is an efficient representation that arranges the scene in layers, including originally occluded regions. Unlike previous work, we enable an adaptive scheme for the number of layers and incorporate semantic encoding for better hallucination of partly occluded objects. Additionally, our approach is object-driven, which especially boosts the accuracy of the intermediate...
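The Layered Depth Image mentioned above stores several colour/depth samples per pixel, so occluded content survives behind the front surface. Below is a minimal numpy sketch of that idea (an assumed layout with a fixed layer count, not the paper's adaptive scheme).

    import numpy as np

    H, W, L = 4, 4, 2                                   # toy resolution, two layers for illustration
    color = np.zeros((L, H, W, 3), dtype=np.float32)    # RGB per layer
    depth = np.full((L, H, W), np.inf, dtype=np.float32)

    # Layer 0: the visible surface; layer 1: content originally hidden behind it.
    color[0, 1:3, 1:3] = [1.0, 0.0, 0.0]                # red foreground object
    depth[0, 1:3, 1:3] = 1.0
    color[1] = [0.5, 0.5, 0.5]                          # grey background kept behind the object
    depth[1] = 3.0

    # Taking the nearest valid sample per pixel recovers the ordinary RGB view.
    front = depth.argmin(axis=0)                        # (H, W) index of the closest layer
    ys, xs = np.indices((H, W))
    visible = color[front, ys, xs]
    print(visible.shape)                                # (4, 4, 3)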
Despite recent advancements in single-domain or single-object image generation, it is still challenging to generate complex scenes containing diverse, multiple objects and their interactions. Scene graphs, composed of nodes as objects and directed edges as relationships among objects, offer an alternative representation of a scene that is more semantically grounded than images. We hypothesize that a generative model for scene graphs might be able to learn the underlying semantic structure of real-world scenes more effectively than images, and hence,...
Interactions between humans and objects are influenced not only by the object's pose and shape, but also by physical attributes such as object mass and surface friction. These introduce important motion nuances that are essential for diversity and realism. Despite advancements in recent kinematics-based methods, this aspect has been overlooked. Generating nuanced motion presents two challenges. First, it is non-trivial to learn from multi-modal information derived from both physical and non-physical attributes. Second, there exists no...
This manuscript presents the results of "A View Synthesis Challenge for Humans Heads (VSCHH)", which was part of the ICCV 2023 workshops. The paper describes the competition setup and provides details on replicating our initial baseline, TensoRF. Additionally, we provide a summary of the participants' methods and their results in a benchmark table. The challenge aimed to synthesize novel camera views of human heads using a given set of sparse training view images. The proposed solutions of the participants were evaluated and ranked based on objective...
Generation of images from scene graphs is a promising direction towards explicit scene generation and manipulation. However, the generated images lack quality, which in part comes due to the high difficulty and diversity of the data. We propose MIGS (Meta Image Generation from Scene Graphs), a meta-learning based approach for few-shot image generation from scene graphs that enables adapting the model to different scenes and increases the image quality by training on diverse sets of tasks. By sampling the data in a task-driven fashion, we train the generator using meta-learning on tasks that are categorized based on scene attributes. Our...
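The task-driven sampling mentioned above can be pictured as grouping training scenes by a shared attribute and drawing each few-shot task from a single group. The sketch below uses hypothetical attribute tags and only illustrates that idea; it is not the MIGS pipeline.

    import random
    from collections import defaultdict

    # Hypothetical scene annotations; the attribute tag stands in for whatever
    # scene property the tasks are categorized by.
    scenes = [
        {"id": 0, "attribute": "living_room"}, {"id": 1, "attribute": "living_room"},
        {"id": 2, "attribute": "street"},      {"id": 3, "attribute": "street"},
    ]

    tasks = defaultdict(list)                  # one task = all scenes sharing an attribute
    for scene in scenes:
        tasks[scene["attribute"]].append(scene)

    def sample_task(k=2):
        # Draw one few-shot task: k support scenes from a single attribute group.
        attribute = random.choice(list(tasks))
        return attribute, random.sample(tasks[attribute], k=min(k, len(tasks[attribute])))

    for _ in range(3):
        print(sample_task())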
This work addresses the problem of real-time rendering of photorealistic human body avatars learned from multi-view videos. While classical approaches to model and render virtual humans generally use a textured mesh, recent research has developed neural body representations that achieve impressive visual quality. However, these models are difficult to render in real-time and their quality degrades when the character is animated with body poses different than the training observations. We propose an animatable human model based on 3D Gaussian...
3D head animation has seen major quality and runtime improvements over the last few years, particularly empowered by the advances in differentiable rendering and neural radiance fields. Real-time rendering is a highly desirable goal for real-world applications. We propose HeadGaS, the first model to use 3D Gaussian Splats (3DGS) for 3D head reconstruction and animation. In this paper we introduce a hybrid model that extends the explicit representation from 3DGS with a base of learnable latent features, which can be linearly blended...
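One way to read the "base of learnable latent features, which can be linearly blended" is as a per-Gaussian feature basis weighted by per-frame (e.g. expression) coefficients. The sketch below is a guess at that mechanism for illustration only, not the released HeadGaS code.

    import torch

    num_gaussians, num_bases, feat_dim = 1000, 16, 32

    # Learnable latent feature basis attached to every Gaussian (assumed interpretation).
    feature_basis = torch.nn.Parameter(torch.randn(num_gaussians, num_bases, feat_dim))

    def blend(weights):
        # Linearly blend the basis with per-frame weights -> one feature vector per Gaussian.
        return torch.einsum("b,gbf->gf", weights, feature_basis)

    weights = torch.softmax(torch.randn(num_bases), dim=0)
    print(blend(weights).shape)                # torch.Size([1000, 32])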
Novel view synthesis has shown rapid progress recently, with methods capable of producing ever more photo-realistic results. 3D Gaussian Splatting has emerged as a particularly promising method, producing high-quality renderings of static scenes and enabling interactive viewing at real-time frame rates. However, it is currently limited to static scenes only. In this work, we extend the method to reconstruct dynamic scenes. We model the dynamics of a scene using a tunable MLP, which learns the deformation field from a canonical space to the set of Gaussians...
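The "tunable MLP" learning a deformation field can be sketched as a small network that maps a canonical Gaussian centre plus a time code to an offset; the minimal version below is an assumed illustration, not the paper's architecture.

    import torch
    import torch.nn as nn

    class DeformationField(nn.Module):
        # Toy deformation MLP: (canonical xyz, time) -> deformed xyz.
        def __init__(self, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(3 + 1, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 3),
            )

        def forward(self, xyz, t):
            t = t.expand(xyz.shape[0], 1)      # broadcast the scalar time to every Gaussian
            return xyz + self.net(torch.cat([xyz, t], dim=-1))

    canonical = torch.randn(2048, 3)           # canonical Gaussian centres
    deformed = DeformationField()(canonical, torch.tensor([[0.5]]))
    print(deformed.shape)                      # torch.Size([2048, 3])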
Graph representation of objects and their relations in a scene, known as a scene graph, provides a precise and discernible interface to manipulate a scene by modifying the nodes or edges of the graph. Although existing works have shown promising results in modifying the placement and pose of objects, the manipulation often leads to losing some visual characteristics like the appearance or identity of objects. In this work, we propose DisPositioNet, a model that learns a disentangled representation for each object for the task of image manipulation using scene graphs in a self-supervised manner. Our...
With the advent of deep learning, estimating depth from a single RGB image has recently received a lot of attention, being capable of empowering many different applications ranging from path planning for robotics to computational cinematography. Nevertheless, while the depth maps are in their entirety fairly reliable, the estimates around object discontinuities are still far from satisfactory. This can be attributed to the fact that the convolutional operator naturally aggregates features across object discontinuities, resulting in smooth...
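The smoothing that the convolutional operator introduces at a depth discontinuity can be seen in a tiny numerical example: averaging across the edge produces intermediate depths that lie on neither surface (a 3-tap mean filter stands in here for the learned convolution).

    import numpy as np

    # 1D depth profile with a sharp object boundary: foreground at 1 m, background at 5 m.
    depth = np.array([1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0])

    kernel = np.ones(3) / 3                    # simple averaging kernel
    smoothed = np.convolve(depth, kernel, mode="same")
    print(smoothed)                            # the edge is smeared into ~2.33 and ~3.67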