- Advanced Vision and Imaging
- Robotics and Sensor-Based Localization
- Optical Measurement and Interference Techniques
- Image Processing Techniques and Applications
- Advanced Neural Network Applications
- Advanced Image and Video Retrieval Techniques
- Video Surveillance and Tracking Methods
- 3D Shape Modeling and Analysis
- Computer Graphics and Visualization Techniques
- Autonomous Vehicle Technology and Safety
- Human Pose and Action Recognition
- 3D Surveying and Cultural Heritage
- Multimodal Machine Learning Applications
- Robot Manipulation and Learning
- Advanced Optical Sensing Technologies
- Robotic Path Planning Algorithms
- Domain Adaptation and Few-Shot Learning
- Industrial Vision Systems and Defect Detection
- Advanced Image Processing Techniques
- Image and Object Detection Techniques
- Soft Robotics and Applications
- Adversarial Robustness in Machine Learning
- Tactile and Sensory Interactions
- Reinforcement Learning in Robotics
- Remote Sensing and LiDAR Applications
Toyota Research Institute
2018-2024
Toyota Industries (United States)
2019-2023
Toyota Motor Corporation (Switzerland)
2019-2021
KTH Royal Institute of Technology
2014-2020
Constructor University
2008
Although cameras are ubiquitous, robotic platforms typically rely on active sensors like LiDAR for direct 3D perception. In this work, we propose a novel self-supervised monocular depth estimation method combining geometry with a new deep network, PackNet, learned only from unlabeled videos. Our architecture leverages novel symmetrical packing and unpacking blocks to jointly learn to compress and decompress detail-preserving representations using 3D convolutions. Although self-supervised, our method outperforms other self-,...
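As a rough illustration of the packing idea, the sketch below folds spatial detail into channels with space-to-depth and learns to compress it with a 3D convolution instead of discarding it through pooling. The class name `PackingBlock` and all layer sizes are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class PackingBlock(nn.Module):
    """Sketch of a PackNet-style packing block: space-to-depth folds
    spatial detail into channels, and a 3D convolution learns to compress
    the folded representation rather than discarding it via pooling.
    Layer sizes are illustrative, not the paper's."""
    def __init__(self, in_channels, out_channels, r=2, d=8):
        super().__init__()
        self.unshuffle = nn.PixelUnshuffle(r)       # H,W -> H/r,W/r; C -> C*r^2
        self.conv3d = nn.Conv3d(1, d, kernel_size=3, padding=1)
        self.conv2d = nn.Conv2d(in_channels * r * r * d, out_channels,
                                kernel_size=3, padding=1)

    def forward(self, x):
        x = self.unshuffle(x)                       # (B, C*r^2, H/r, W/r)
        x = self.conv3d(x.unsqueeze(1))             # (B, d, C*r^2, H/r, W/r)
        b, d, c, h, w = x.shape
        x = x.reshape(b, d * c, h, w)               # fold 3D features back to 2D
        return self.conv2d(x)

feat = torch.randn(1, 16, 64, 64)
print(PackingBlock(16, 32)(feat).shape)             # torch.Size([1, 32, 32, 32])
```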
Recent progress in 3D object detection from single images leverages monocular depth estimation as a way to produce 3D point clouds, turning cameras into pseudo-lidar sensors. These two-stage detectors improve with the accuracy of the intermediate depth network, which can itself be improved without manual labels via large-scale self-supervised learning. However, they tend to suffer from overfitting more than end-to-end methods, are more complex, and the gap with similar lidar-based detectors remains significant. In this work, we propose...
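The pseudo-lidar conversion itself is a simple unprojection of each pixel by its estimated depth. Below is a minimal NumPy sketch assuming a pinhole camera; the KITTI-like intrinsics are placeholders.

```python
import numpy as np

def depth_to_pseudo_lidar(depth, K):
    """Back-project a depth map into a 3D point cloud ("pseudo-lidar").
    depth: (H, W) metric depth; K: (3, 3) pinhole intrinsics.
    Returns (H*W, 3) points in the camera frame."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))            # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T                           # unit-depth rays
    return rays * depth.reshape(-1, 1)                        # scale by depth

# Placeholder KITTI-like intrinsics and a constant 10 m depth map.
K = np.array([[721.5, 0.0, 609.6], [0.0, 721.5, 172.9], [0.0, 0.0, 1.0]])
points = depth_to_pseudo_lidar(np.full((375, 1242), 10.0), K)
print(points.shape)   # (465750, 3)
```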
Recent techniques in self-supervised monocular depth estimation are approaching the performance of supervised methods, but operate at low resolution only. We show that high resolution is key towards high-fidelity depth prediction. Inspired by recent deep learning methods for Single-Image Super-Resolution, we propose a sub-pixel convolutional layer extension for depth super-resolution that accurately synthesizes high-resolution disparities from their corresponding low-resolution convolutional features. In addition, we introduce a differentiable...
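For reference, a sub-pixel convolutional layer predicts r² values per output pixel and rearranges them with `PixelShuffle` into an r-times higher-resolution map, instead of naive interpolation. This is a generic sketch of the mechanism, not the paper's head; `SubpixelDisparityHead` and the channel sizes are assumptions.

```python
import torch
import torch.nn as nn

class SubpixelDisparityHead(nn.Module):
    """Sketch of sub-pixel convolutional upsampling: a convolution predicts
    r^2 channels per output disparity and PixelShuffle rearranges them into
    an r-times higher-resolution map. Channel sizes are illustrative."""
    def __init__(self, in_channels, r=4):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, r * r, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(r)   # (B, r^2, H, W) -> (B, 1, r*H, r*W)

    def forward(self, feats):
        return torch.sigmoid(self.shuffle(self.conv(feats)))

low_res_feats = torch.randn(1, 64, 48, 160)
disp = SubpixelDisparityHead(64)(low_res_feats)
print(disp.shape)   # torch.Size([1, 1, 192, 640])
```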
Multi-object tracking is an important ability for an autonomous vehicle to safely navigate a traffic scene. Current state-of-the-art methods follow the tracking-by-detection paradigm, where existing tracks are associated with detected objects through some distance metric. The key challenges to increasing tracking accuracy lie in data association and track life cycle management. We propose a probabilistic, multi-modal, multi-object tracking system consisting of different trainable modules to provide robust, data-driven tracking results. First,...
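A minimal sketch of the data-association step in tracking-by-detection, using SciPy's Hungarian solver: build a pairwise distance matrix between predicted track states and new detections, solve the assignment, and gate matches whose distance is too large. Plain Euclidean distance and the `gate` value stand in for the learned/Mahalanobis metric and gating used in practice.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(tracks, detections, gate=5.0):
    """Associate predicted track positions (N, 3) with detections (M, 3).
    Euclidean distance is a stand-in for a Mahalanobis / learned metric;
    the gate threshold is illustrative."""
    cost = np.linalg.norm(tracks[:, None, :] - detections[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)            # Hungarian algorithm
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] < gate]
    unmatched_dets = set(range(len(detections))) - {c for _, c in matches}
    return matches, unmatched_dets   # unmatched detections spawn new tracks

tracks = np.array([[0.0, 0.0, 0.0], [10.0, 5.0, 0.0]])
dets = np.array([[0.4, -0.2, 0.1], [30.0, 2.0, 0.0]])
print(associate(tracks, dets))      # ([(0, 0)], {1})
```

Track life cycle management then acts on the outputs: matched tracks are updated, unmatched detections are born as tentative tracks, and tracks unmatched for several frames are terminated.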
Building 3D perception systems for autonomous vehicles that do not rely on high-density LiDAR is a critical research problem because of the expense of LiDAR compared to cameras and other sensors. Recent research has developed a variety of camera-only methods, where features are differentiably "lifted" from multi-camera images onto the 2D ground plane, yielding a "bird's eye view" (BEV) feature representation of the space around the vehicle. This line of work has produced a variety of novel "lifting" methods, but we observe that details in the training setups have...
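A parameter-free version of the "lifting" step can be written in a few lines: project a grid of 3D points near the ground plane into the image and bilinearly sample camera features at those locations. The sketch below assumes a single camera in its own frame and treats the feature map as image-sized; intrinsics and grid extents are placeholders.

```python
import torch
import torch.nn.functional as F

def lift_to_bev(feats, K, grid_xyz):
    """Bilinear-sampling lift: project BEV grid points into the image and
    sample features there. feats: (1, C, Hf, Wf) treated as image-sized;
    K: (3, 3) intrinsics; grid_xyz: (X, Z, 3) points in the camera frame."""
    pts = grid_xyz.reshape(-1, 3) @ K.T                   # pinhole projection
    uv = pts[:, :2] / pts[:, 2:3].clamp(min=1e-5)         # pixel coordinates
    H, W = feats.shape[-2:]
    uv = uv / torch.tensor([W - 1.0, H - 1.0]) * 2 - 1    # to [-1, 1] for grid_sample
    uv = uv.reshape(1, *grid_xyz.shape[:2], 2)
    return F.grid_sample(feats, uv, align_corners=True)   # (1, C, X, Z) BEV features

K = torch.tensor([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
xs, zs = torch.meshgrid(torch.linspace(-20, 20, 100),
                        torch.linspace(1, 50, 100), indexing="ij")
grid = torch.stack([xs, torch.full_like(xs, 1.5), zs], dim=-1)  # ground 1.5 m below a y-down camera
print(lift_to_bev(torch.randn(1, 64, 60, 80), K, grid).shape)   # torch.Size([1, 64, 100, 100])
```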
Thanks to the efforts of the robotics and autonomous systems community, robots are becoming ever more capable. There is also an increasing demand from end-users for service robots that can operate in real environments for extended periods. In the STRANDS project we are tackling this demand head-on by integrating state-of-the-art artificial intelligence research into mobile robots, and deploying these in long-term installations in security and care environments. Over four deployments, our robots have been operational for a combined duration of 104...
Self-supervised learning is showing great promise for monocular depth estimation, using geometry as the only source of supervision. Depth networks are indeed capable of learning representations that relate visual appearance to 3D properties by implicitly leveraging category-level patterns. In this work we investigate how to leverage this semantic structure more directly to guide geometric representation learning, while remaining in the self-supervised regime. Instead of using semantic labels and proxy losses in a multi-task approach,...
Multi-frame depth estimation improves over single-frame approaches by also leveraging geometric relationships between images via feature matching, in addition to learning appearance-based features. In this paper we revisit feature matching for self-supervised monocular depth estimation, and propose a novel transformer architecture for cost volume generation. We use depth-discretized epipolar sampling to select matching candidates, and refine predictions through a series of self- and cross-attention layers. These layers sharpen the...
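The sampling step can be sketched as a plane sweep: each target pixel is back-projected at a set of hypothesis depths, warped into the source view with the relative pose, and matched against sampled source features by dot product, yielding a (D, H, W) cost volume. This is a generic sketch under assumed shapes and poses, not the paper's transformer module.

```python
import torch
import torch.nn.functional as F

def epipolar_cost_volume(feat_t, feat_s, K, T_ts, depths):
    """Depth-discretized matching sketch: back-project target pixels at D
    hypothesis depths, warp into the source view via T_ts, and score
    matches by feature dot product. Single batch, illustrative shapes."""
    B, C, H, W = feat_t.shape
    u, v = torch.meshgrid(torch.arange(W, dtype=torch.float32),
                          torch.arange(H, dtype=torch.float32), indexing="xy")
    pix = torch.stack([u, v, torch.ones_like(u)], -1).reshape(-1, 3)  # (HW, 3)
    rays = pix @ torch.linalg.inv(K).T                                # (HW, 3)
    volume = []
    for d in depths:                                 # one hypothesis plane at a time
        pts = rays * d                               # back-project at depth d
        pts = pts @ T_ts[:3, :3].T + T_ts[:3, 3]     # move into source frame
        uv = pts @ K.T
        uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-5)
        uv = uv / torch.tensor([W - 1.0, H - 1.0]) * 2 - 1
        sampled = F.grid_sample(feat_s, uv.reshape(1, H, W, 2), align_corners=True)
        volume.append((feat_t * sampled).sum(1))     # (B, H, W) matching score
    return torch.stack(volume, 1)                    # (B, D, H, W) cost volume

K = torch.eye(3); K[0, 0] = K[1, 1] = 100.0; K[0, 2], K[1, 2] = 32.0, 24.0
T = torch.eye(4); T[0, 3] = 0.5                      # half-metre sideways baseline
cv = epipolar_cost_volume(torch.randn(1, 8, 48, 64), torch.randn(1, 8, 48, 64),
                          K, T, depths=torch.linspace(1.0, 20.0, 16))
print(cv.shape)   # torch.Size([1, 16, 48, 64])
```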
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars. In this paper, we study the problem of predicting dense depth from a single RGB image (monodepth) and optional sparse measurements from low-cost active sensors. We introduce Sparse Auxiliary Networks (SANs), a new module enabling monodepth networks to perform both depth prediction and completion, depending on whether only images or also sparse point clouds are available at inference time. First, we decouple...
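The "optional input" behavior is the interesting interface. A minimal sketch, assuming dense convolutions in place of the sparse ones used in the paper: the same network runs with or without a sparse depth map, and an auxiliary branch injects sparse-depth features when they exist.

```python
import torch
import torch.nn as nn

class MonodepthWithSAN(nn.Module):
    """Sketch of the SAN interface: a single depth network whose forward
    pass accepts an optional sparse depth map. When sparse points are
    given, an auxiliary branch encodes them and its features are added to
    the RGB features, switching from prediction to completion. Dense
    convolutions stand in for sparse ones; sizes are illustrative."""
    def __init__(self, c=32):
        super().__init__()
        self.rgb_encoder = nn.Conv2d(3, c, 3, padding=1)
        self.sparse_aux = nn.Conv2d(2, c, 3, padding=1)    # depth + validity mask
        self.decoder = nn.Conv2d(c, 1, 3, padding=1)

    def forward(self, rgb, sparse_depth=None):
        feats = self.rgb_encoder(rgb)
        if sparse_depth is not None:                       # completion mode
            mask = (sparse_depth > 0).float()
            feats = feats + self.sparse_aux(torch.cat([sparse_depth, mask], 1))
        return self.decoder(feats).relu()

net = MonodepthWithSAN()
rgb = torch.randn(1, 3, 96, 128)
print(net(rgb).shape, net(rgb, torch.rand(1, 1, 96, 128)).shape)
```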
We present a novel method for re-creating the static structure of cluttered office environments - which we define as the "meta-room" - from multiple observations collected by an autonomous robot equipped with an RGB-D depth camera over extended periods of time. Our method works directly with point clusters, identifying what has changed from one observation to the next, removing dynamic elements and at the same time adding previously occluded objects to reconstruct the underlying static structure as accurately as possible. The process of constructing meta-rooms...
We present an automatic approach for the task of reconstructing a 2-D floor plan from unstructured point clouds of building interiors. Our approach emphasizes accurate and robust detection of structural elements and, unlike previous approaches, does not require prior knowledge of scanning device poses. The reconstruction is formulated as a multiclass labeling problem that we solve using energy minimization. We use intuitive priors to define the costs for the minimization and rely on wall opening detection algorithms to ensure robustness. We provide...
In this article, we present and evaluate a system which allows a mobile robot to autonomously detect, model, and re-recognize objects in everyday environments. While other systems have demonstrated one of these elements, to our knowledge ours is the first capable of doing all of these things, without human interaction, in normal indoor scenes. Our system detects objects to learn by modeling the static part of the environment and extracting dynamic elements. It then creates and executes a view plan around a dynamic element to gather additional views for learning....
Self-supervised monocular depth estimation enables robots to learn 3D perception from raw video streams. This scalable approach leverages projective geometry and ego-motion via view synthesis, assuming the world is mostly static. Dynamic scenes, which are common in autonomous driving and human-robot interaction, violate this assumption. Therefore, they require modeling dynamic objects explicitly, for instance by estimating pixel-wise 3D motion, i.e. scene flow. However, the simultaneous self-supervised...
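The underlying supervision can be sketched as a warp-and-compare loss: back-project target pixels with predicted depth, move the points by ego-motion plus a pixel-wise scene-flow residual for dynamic objects, reproject into the source image, and penalize the photometric difference. Everything below (shapes, intrinsics, the zero flow) is illustrative.

```python
import torch
import torch.nn.functional as F

def photometric_loss(img_t, img_s, depth_t, K, T_ts, scene_flow):
    """View-synthesis supervision sketch with explicit object motion.
    img_t/img_s: (1, 3, H, W); depth_t: (1, 1, H, W); K: (3, 3);
    T_ts: (4, 4) ego-motion; scene_flow: (1, 3, H, W) per-pixel 3D motion."""
    _, _, H, W = img_t.shape
    u, v = torch.meshgrid(torch.arange(W, dtype=torch.float32),
                          torch.arange(H, dtype=torch.float32), indexing="xy")
    pix = torch.stack([u, v, torch.ones_like(u)], -1).reshape(-1, 3)
    pts = (pix @ torch.linalg.inv(K).T) * depth_t.reshape(-1, 1)  # back-project
    pts = pts @ T_ts[:3, :3].T + T_ts[:3, 3]                      # ego-motion
    pts = pts + scene_flow.permute(0, 2, 3, 1).reshape(-1, 3)     # dynamic objects
    uv = pts @ K.T
    uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-5)
    uv = uv / torch.tensor([W - 1.0, H - 1.0]) * 2 - 1
    warped = F.grid_sample(img_s, uv.reshape(1, H, W, 2), align_corners=True)
    return (img_t - warped).abs().mean()                          # L1 photometric term

K = torch.eye(3); K[0, 0] = K[1, 1] = 100.0; K[0, 2], K[1, 2] = 32.0, 24.0
T = torch.eye(4); T[2, 3] = 0.1                                   # small forward motion
loss = photometric_loss(torch.rand(1, 3, 48, 64), torch.rand(1, 3, 48, 64),
                        torch.full((1, 1, 48, 64), 5.0), K, T,
                        torch.zeros(1, 3, 48, 64))
print(loss)
```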
Monocular depth estimation is scale-ambiguous, and thus requires scale supervision to produce metric predictions. Even so, the resulting models will be geometry-specific, with learned scales that cannot be directly transferred across domains. Because of that, recent works focus instead on relative depth, eschewing scale in favor of improved up-to-scale zero-shot transfer. In this work we introduce ZeroDepth, a novel monocular depth estimation framework capable of predicting metric scale for arbitrary test images from different domains...
Recent implicit neural representations have shown great results for novel view synthesis. However, existing methods require expensive per-scene optimization from many views, hence limiting their application to real-world unbounded urban settings where the objects of interest or backgrounds are observed from very few views. To mitigate this challenge, we introduce a new approach called NeO 360, Neural fields for sparse view synthesis of outdoor scenes. NeO 360 is a generalizable method that reconstructs 360° scenes...
Self-supervised monocular depth and ego-motion estimation is a promising approach to replace or supplement expensive depth sensors such as LiDAR for robotics applications like autonomous driving. However, most research in this area focuses on a single monocular camera or stereo pairs that cover only a fraction of the scene around the vehicle. In this work, we extend self-supervised depth and ego-motion estimation to large-baseline multi-camera rigs. Using generalized spatio-temporal contexts, pose consistency constraints, and carefully designed photometric loss...
We present CARTO, a novel approach for reconstructing multiple articulated objects from a single stereo RGB observation. We use implicit object-centric representations and learn a single geometry and articulation decoder for multiple object categories. Despite training on multiple categories, our decoder achieves comparable reconstruction accuracy to methods that train bespoke decoders separately for each category. Combined with our stereo image encoder, we infer the 3D shape, 6D pose, size, joint type, and joint state of multiple unknown objects in a single forward pass. Our method achieves a 20.4%...
Simulators can efficiently generate large amounts of labeled synthetic data with perfect supervision for hard-to-label tasks like semantic segmentation. However, they introduce a domain gap that severely hurts real-world performance. We propose to use self-supervised monocular depth estimation as a proxy task to bridge this gap and improve sim-to-real unsupervised domain adaptation (UDA). Our Geometric Unsupervised Domain Adaptation method (GUDA)...
3D object detection from visual sensors is a cornerstone capability of robotic systems. State-of-the-art methods focus on reasoning about and decoding object bounding boxes from multi-view camera input. In this work we gain intuition from the integral role of multi-view consistency in 3D scene understanding and geometric learning. To this end, we introduce VEDet, a novel 3D object detection framework that exploits 3D multi-view geometry to improve localization through viewpoint awareness and equivariance. VEDet leverages a query-based transformer architecture and encodes the 3D scene by augmenting...
Current methods for 3D scene reconstruction from sparse posed images employ intermediate 3D representations such as neural fields, voxel grids, or Gaussians, to achieve multi-view consistent appearance and geometry. In this paper we introduce MVGD, a diffusion-based architecture capable of direct pixel-level generation of images and depth maps from novel viewpoints, given an arbitrary number of input views. Our method uses raymap conditioning to both augment visual features with spatial information from different viewpoints, as well as to guide...
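A raymap itself is easy to construct: per pixel, the camera origin and the world-space viewing direction, stacked into a 6-channel map that can be concatenated with image features so the network knows where each view looks from. The sketch below assumes a pinhole model; how MVGD actually injects this conditioning is not shown here.

```python
import torch

def build_raymap(K, T_wc, H, W):
    """Per-pixel ray encoding: camera centre and unit viewing direction in
    world coordinates, returned as a (6, H, W) conditioning map.
    K: (3, 3) intrinsics; T_wc: (4, 4) camera-to-world pose."""
    u, v = torch.meshgrid(torch.arange(W, dtype=torch.float32),
                          torch.arange(H, dtype=torch.float32), indexing="xy")
    pix = torch.stack([u, v, torch.ones_like(u)], -1)            # (H, W, 3)
    dirs = pix @ torch.linalg.inv(K).T @ T_wc[:3, :3].T          # rotate rays to world
    dirs = dirs / dirs.norm(dim=-1, keepdim=True)                # unit directions
    origins = T_wc[:3, 3].expand(H, W, 3)                        # camera centre
    return torch.cat([origins, dirs], -1).permute(2, 0, 1)       # (6, H, W)

K = torch.eye(3); K[0, 0] = K[1, 1] = 100.0; K[0, 2], K[1, 2] = 32.0, 24.0
raymap = build_raymap(K, torch.eye(4), H=48, W=64)
print(raymap.shape)   # torch.Size([6, 48, 64])
```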