Helisa Dhamo

ORCID: 0000-0003-1163-7448
Research Areas
  • Generative Adversarial Networks and Image Synthesis
  • 3D Shape Modeling and Analysis
  • Multimodal Machine Learning Applications
  • Advanced Vision and Imaging
  • Human Pose and Action Recognition
  • Advanced Image and Video Retrieval Techniques
  • Computer Graphics and Visualization Techniques
  • Image Processing Techniques and Applications
  • Advanced Image Processing Techniques
  • Face recognition and analysis
  • Video Analysis and Summarization
  • Domain Adaptation and Few-Shot Learning
  • Optical measurement and interference techniques
  • Robotics and Sensor-Based Localization
  • Advanced Neural Network Applications
  • Time Series Analysis and Forecasting
  • Interactive and Immersive Displays
  • Visual Attention and Saliency Detection
  • Human Motion and Animation
  • Video Surveillance and Tracking Methods

Huawei Technologies (Sweden)
2023-2024

Technical University of Munich
2019-2022

Scene understanding has been of high interest in computer vision. It encompasses not only identifying objects in a scene, but also their relationships within the given context. With this goal, a recent line of works tackles 3D semantic segmentation and scene layout prediction. In our work we focus on scene graphs, a data structure that organizes the entities of a scene in a graph, where objects are modeled as nodes and their relationships as edges. We leverage inference on scene graphs as a way to carry out 3D scene understanding, mapping objects and their relationships. In particular, we propose a learned...

10.1109/cvpr42600.2020.00402 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01
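The scene-graph data structure described in the abstract above can be sketched minimally as follows. This is an illustrative container only, not the paper's actual implementation; the class, method, and relationship names are assumptions for the example.

```python
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    """Objects are nodes; relationships are directed, labeled edges."""
    nodes: dict = field(default_factory=dict)   # node_id -> object class
    edges: list = field(default_factory=list)   # (subject_id, predicate, object_id)

    def add_object(self, node_id, obj_class):
        self.nodes[node_id] = obj_class

    def add_relation(self, subj, predicate, obj):
        # Edges may only connect objects already present in the graph
        assert subj in self.nodes and obj in self.nodes
        self.edges.append((subj, predicate, obj))

    def triplets(self):
        # Human-readable (subject, predicate, object) class triplets
        return [(self.nodes[s], p, self.nodes[o]) for s, p, o in self.edges]

# Example: a chair standing on the floor, next to a table
g = SceneGraph()
g.add_object(0, "chair")
g.add_object(1, "table")
g.add_object(2, "floor")
g.add_relation(0, "standing on", 2)
g.add_relation(0, "next to", 1)
print(g.triplets())
# [('chair', 'standing on', 'floor'), ('chair', 'next to', 'table')]
```

Inference on such a structure amounts to predicting the node and edge labels from observations, rather than authoring them by hand as in this toy example.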

10.1109/cvpr52733.2024.00081 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Image manipulation can be considered a special case of image generation, where the image to be produced is a modification of an existing image. Image generation and manipulation have been, for the most part, tasks that operate on raw pixels. However, remarkable progress in learning rich image and object representations has opened the way for tasks such as text-to-image or layout-to-image generation that are mainly driven by semantics. In our work, we address the novel problem of image manipulation from scene graphs, in which a user can edit images by merely applying changes to the nodes or edges of a semantic graph that is generated from the image. Our...

10.1109/cvpr42600.2020.00526 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Controllable scene synthesis consists of generating 3D information that satisfies underlying specifications. Thereby, these specifications should be abstract, i.e. allowing easy user interaction, whilst providing enough interface for detailed control. Scene graphs are representations of a scene, composed of objects (nodes) and inter-object relationships (edges), and have proven to be particularly suited to this task, as they allow for semantic control over the generated content. Previous works tackling this task often rely...

10.1109/iccv48922.2021.01604 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

We present a method that tackles the challenge of predicting color and depth behind the visible content of an image. Our approach aims at building up a Layered Depth Image (LDI) from a single RGB input, which is an efficient representation that arranges the scene in layers, including originally occluded regions. Unlike previous work, we enable an adaptive scheme for the number of layers and incorporate semantic encoding for better hallucination of partly occluded objects. Additionally, our approach is object-driven, which especially boosts the accuracy for the intermediate...

10.1109/iccv.2019.00547 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01
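A Layered Depth Image, as described in the abstract above, stores a variable number of (depth, color) samples per pixel so that occluded surfaces survive behind the visible one. The following is a minimal illustrative container under that definition, not the paper's implementation; all names are invented for the example.

```python
class LayeredDepthImage:
    """Each pixel holds a variable-length list of (depth, color) samples,
    kept front-to-back, so occluded content survives behind the first layer."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.pixels = [[[] for _ in range(width)] for _ in range(height)]

    def add_sample(self, x, y, depth, color):
        layers = self.pixels[y][x]
        layers.append((depth, color))
        layers.sort(key=lambda s: s[0])  # maintain front-to-back order

    def layer(self, x, y, k):
        """k-th closest surface at pixel (x, y); k=0 is the visible one."""
        return self.pixels[y][x][k]

ldi = LayeredDepthImage(2, 2)
ldi.add_sample(0, 0, 5.0, "wall")   # background surface, added first
ldi.add_sample(0, 0, 2.0, "chair")  # foreground surface occludes the wall
print(ldi.layer(0, 0, 0))  # (2.0, 'chair') -> visible layer
print(ldi.layer(0, 0, 1))  # (5.0, 'wall')  -> occluded layer
```

The adaptive scheme in the paper corresponds to the per-pixel lists having different lengths, rather than a fixed layer count.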

Despite recent advancements in single-domain or single-object image generation, it is still challenging to generate complex scenes containing diverse, multiple objects and their interactions. Scene graphs, composed of nodes as objects and directed edges as relationships among objects, offer an alternative representation of a scene that is more semantically grounded than images. We hypothesize that a generative model for scene graphs might be able to learn the underlying semantic structure of real-world scenes more effectively than images, hence,...

10.1109/iccv48922.2021.01605 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Interactions between humans and objects are influenced not only by the object's pose and shape, but also by physical attributes such as object mass and surface friction. These attributes introduce important motion nuances that are essential for diversity and realism. Despite advancements in recent kinematics-based methods, this aspect has been overlooked. Generating nuanced motion presents two challenges. First, it is non-trivial to learn from multi-modal information derived from both physical and non-physical attributes. Second, there exists no...

10.48550/arxiv.2403.11237 preprint EN arXiv (Cornell University) 2024-03-17

This manuscript presents the results of the "View Synthesis Challenge for Human Heads (VSCHH)", which was part of the ICCV 2023 workshops. The paper describes the competition setup and provides details on replicating our initial baseline, TensoRF. Additionally, we provide a summary of the participants' methods and their results in a benchmark table. The challenge aimed to synthesize novel camera views of human heads using a given set of sparse training view images. The solutions proposed by the participants were evaluated and ranked based on objective...

10.1109/iccvw60793.2023.00120 article EN 2023-10-02

Generation of images from scene graphs is a promising direction towards explicit scene generation and manipulation. However, the generated images lack quality, which in part comes from the high difficulty and diversity of the data. We propose MIGS (Meta Image Generation from Scene Graphs), a meta-learning based approach for few-shot image generation from scene graphs that enables adapting the model to different scenes and increases image quality by training on diverse sets of tasks. By sampling the data in a task-driven fashion, we train the generator using meta-learning on different sets of tasks that are categorized based on scene attributes. Our...

10.48550/arxiv.2110.11918 preprint EN other-oa arXiv (Cornell University) 2021-01-01

This work addresses the problem of real-time rendering of photorealistic human body avatars learned from multi-view videos. While classical approaches to model and render virtual humans generally use a textured mesh, recent research has developed neural body representations that achieve impressive visual quality. However, these models are difficult to render in real-time, and their quality degrades when the character is animated with poses different from the training observations. We propose an animatable human model based on 3D Gaussian...

10.48550/arxiv.2311.17113 preprint EN cc-by-nc-sa arXiv (Cornell University) 2023-01-01

3D head animation has seen major quality and runtime improvements over the last few years, particularly empowered by advances in differentiable rendering and neural radiance fields. Real-time rendering is a highly desirable goal for real-world applications. We propose HeadGaS, the first model to use 3D Gaussian Splats (3DGS) for 3D head reconstruction and animation. In this paper we introduce a hybrid model that extends the explicit representation from 3DGS with a base of learnable latent features, which can be linearly blended...

10.48550/arxiv.2312.02902 preprint EN other-oa arXiv (Cornell University) 2023-01-01
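The linear blending of a base of latent features, mentioned in the abstract above, reduces to a per-dimension weighted sum of basis vectors. The sketch below shows only that operation; the blend weights are hand-picked for illustration (in practice they would be driven by animation parameters), and all names are assumptions.

```python
def blend(bases, weights):
    """Per-dimension weighted sum of the base feature vectors."""
    assert len(bases) == len(weights)
    dim = len(bases[0])
    return [sum(w * b[d] for w, b in zip(weights, bases)) for d in range(dim)]

# Two 4-dimensional latent basis features (toy values)
bases = [[1.0, 0.0, 2.0, -1.0],
         [0.0, 1.0, 0.0,  3.0]]

print(blend(bases, [1.0, 0.0]))  # -> first basis: [1.0, 0.0, 2.0, -1.0]
print(blend(bases, [0.5, 0.5]))  # -> midpoint:    [0.5, 0.5, 1.0, 1.0]
```

Because blending is linear, interpolating the weights interpolates the resulting feature, which is what makes such a base convenient for smooth animation control.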

Novel view synthesis has shown rapid progress recently, with methods capable of producing ever more photo-realistic results. 3D Gaussian Splatting has emerged as a particularly promising method, producing high-quality renderings of static scenes and enabling interactive viewing at real-time frame rates. However, it is currently limited to static scenes only. In this work, we extend 3D Gaussian Splatting to reconstruct dynamic scenes. We model the dynamics of a scene using a tunable MLP, which learns the deformation field from a canonical space to the set of Gaussians...

10.48550/arxiv.2312.13308 preprint EN cc-by arXiv (Cornell University) 2023-01-01
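The canonical-space deformation idea above can be sketched as a function mapping (canonical position, time) to a positional offset. The toy "MLP" below has untrained random weights and exists only to show the data flow; everything here is an assumption for illustration, not the paper's architecture.

```python
import random

random.seed(0)

def tiny_mlp(dims):
    """Random-weight fully connected net: (weights, biases) per layer."""
    layers = []
    for d_in, d_out in zip(dims, dims[1:]):
        W = [[random.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_out)]
        b = [0.0] * d_out
        layers.append((W, b))
    return layers

def forward(layers, x):
    for i, (W, b) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
        if i < len(layers) - 1:            # ReLU on hidden layers
            x = [max(0.0, v) for v in x]
    return x

# Deformation field: (canonical xyz, time) -> offset xyz
deform = tiny_mlp([4, 16, 3])

canonical_gaussians = [(0.0, 0.0, 0.0), (1.0, 0.5, -0.2)]  # canonical means

def gaussians_at_time(t):
    """Warp every canonical Gaussian mean into the frame at time t."""
    out = []
    for (x, y, z) in canonical_gaussians:
        dx, dy, dz = forward(deform, [x, y, z, t])
        out.append((x + dx, y + dy, z + dz))
    return out

print(gaussians_at_time(0.5))
```

In training, the MLP weights would be optimized jointly with the Gaussian parameters so that the warped Gaussians reproduce each observed frame.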

Graph representation of objects and their relations in a scene, known as a scene graph, provides a precise and discernible interface to manipulate a scene by modifying the nodes or edges of the graph. Although existing works have shown promising results in changing the placement and pose of objects, such manipulation often leads to losing some visual characteristics, like the appearance or identity of objects. In this work, we propose DisPositioNet, a model that learns a disentangled representation for each object for the task of image manipulation using scene graphs, in a self-supervised manner. Our...

10.48550/arxiv.2211.05499 preprint EN other-oa arXiv (Cornell University) 2022-01-01

We present a method that tackles the challenge of predicting color and depth behind the visible content of an image. Our approach aims at building up a Layered Depth Image (LDI) from a single RGB input, which is an efficient representation that arranges the scene in layers, including originally occluded regions. Unlike previous work, we enable an adaptive scheme for the number of layers and incorporate semantic encoding for better hallucination of partly occluded objects. Additionally, our approach is object-driven, which especially boosts the accuracy for the intermediate...

10.48550/arxiv.1908.09521 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Scene understanding has been of high interest in computer vision. It encompasses not only identifying objects in a scene, but also their relationships within the given context. With this goal, a recent line of works tackles 3D semantic segmentation and scene layout prediction. In our work we focus on scene graphs, a data structure that organizes the entities of a scene in a graph, where objects are modeled as nodes and their relationships as edges. We leverage inference on scene graphs as a way to carry out 3D scene understanding, mapping objects and their relationships. In particular, we propose a learned...

10.48550/arxiv.2004.03967 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Image manipulation can be considered a special case of image generation, where the image to be produced is a modification of an existing image. Image generation and manipulation have been, for the most part, tasks that operate on raw pixels. However, remarkable progress in learning rich image and object representations has opened the way for tasks such as text-to-image or layout-to-image generation that are mainly driven by semantics. In our work, we address the novel problem of image manipulation from scene graphs, in which a user can edit images by merely applying changes to the nodes or edges of a semantic graph that is generated from the image. Our...

10.48550/arxiv.2004.03677 preprint EN other-oa arXiv (Cornell University) 2020-01-01

With the advent of deep learning, estimating depth from a single RGB image has recently received a lot of attention, being capable of empowering many different applications, ranging from path planning for robotics to computational cinematography. Nevertheless, while the estimated depth maps are, in their entirety, fairly reliable, the estimates around object discontinuities are still far from satisfactory. This can be attributed to the fact that the convolutional operator naturally aggregates features across object discontinuities, resulting in smooth...

10.1109/lra.2022.3155823 article EN IEEE Robotics and Automation Letters 2022-03-03
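The smoothing effect described above, where convolution averages features across a depth edge, can be demonstrated on a 1D depth profile. This is a generic illustration of the phenomenon, not the paper's method.

```python
def convolve1d(signal, kernel):
    """Valid-mode 1D convolution (the kernel here is symmetric, so no flip)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A crisp depth discontinuity: near object (1 m) in front of a far wall (5 m)
depth = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]

# A 3-tap averaging filter, standing in for the aggregation of a conv layer
smoothed = convolve1d(depth, [1/3, 1/3, 1/3])
print(smoothed)
# The step between 1 m and 5 m is blurred into intermediate depths
# (~2.33 m and ~3.67 m) that correspond to no real surface --
# exactly the discontinuity artifact discussed above.
```

A learned network can partially counteract this, but the inductive bias of spatial averaging is what makes object boundaries the hardest region for monocular depth estimation.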