- Advanced Neural Network Applications
- Advanced Vision and Imaging
- Autonomous Vehicle Technology and Safety
- Industrial Vision Systems and Defect Detection
- Video Surveillance and Tracking Methods
- Infrared Target Detection Methodologies
- Generative Adversarial Networks and Image Synthesis
- Optical Measurement and Interference Techniques
- Robotics and Sensor-Based Localization
- Advanced Image and Video Retrieval Techniques
- Multimodal Machine Learning Applications
- Human Pose and Action Recognition
- Speech and Dialogue Systems
- Robotics and Automated Systems
- 3D Surveying and Cultural Heritage
- Domain Adaptation and Few-Shot Learning
- Image Processing Techniques and Applications
ETH Zurich
2024
National Tsing Hua University
2022-2023
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving. We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform. The object association leverages quasi-dense similarity learning to identify objects in various poses and viewpoints with appearance cues only. After initial...
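As a rough illustration of what associating objects "with appearance cues only" can look like, the sketch below matches stored track embeddings to new detection embeddings by cosine similarity. It is a simplified stand-in, not the quasi-dense similarity learning procedure from the paper: the function names `cosine` and `associate`, the greedy matching strategy, and the `min_sim` threshold are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def associate(tracks, detections, min_sim=0.5):
    """Greedily match tracks to detections using appearance similarity only.

    tracks:     dict mapping track id -> appearance embedding (hypothetical format)
    detections: list of appearance embeddings for the current frame
    min_sim:    assumed acceptance threshold; pairs below it stay unmatched
    Returns a dict mapping detection index -> track id.
    """
    # Score every (track, detection) pair, best pairs first.
    pairs = sorted(
        (
            (cosine(emb, det), tid, j)
            for tid, emb in tracks.items()
            for j, det in enumerate(detections)
        ),
        reverse=True,
    )
    matches, used_tracks = {}, set()
    for sim, tid, j in pairs:
        if sim < min_sim:
            break  # all remaining pairs are weaker than the threshold
        if tid in used_tracks or j in matches:
            continue  # each track and each detection is matched at most once
        matches[j] = tid
        used_tracks.add(tid)
    return matches


tracks = {1: [1.0, 0.0], 2: [0.0, 1.0]}
detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
print(associate(tracks, detections))
```

In this toy input the third detection resembles neither track, so it stays unmatched and would typically start a new track.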
Accurate detection and tracking of surrounding objects is essential to enable self-driving vehicles. While Light Detection and Ranging (LiDAR) sensors have set the benchmark for high performance, the appeal of camera-only solutions lies in their cost-effectiveness. Notably, despite the prevalent use of Radio Detection and Ranging (RADAR) sensors in automotive systems, their potential in 3D detection and tracking has been largely disregarded due to data sparsity and measurement noise. As a recent development, the combination of RADARs and cameras is emerging as a promising solution. This paper...
To track the 3D locations and trajectories of the other traffic participants at any given time, modern autonomous vehicles are equipped with multiple cameras that cover the vehicle's full surroundings. Yet, camera-based 3D object tracking methods prioritize optimizing the single-camera setup and resort to post-hoc fusion in a multi-camera setup. In this paper, we propose a method for panoramic 3D object tracking, called CC-3DT, that associates and models object trajectories both temporally and across views, and improves the overall tracking consistency. In particular, our...
Aggregating information from features across different layers is essential for dense prediction models. Despite its limited expressiveness, vanilla feature concatenation dominates the choice of aggregation operations. In this paper, we introduce Attentive Feature Aggregation (AFA) to fuse different network layers with more expressive non-linear operations. AFA exploits both spatial and channel attention to compute weighted averages of the layer activations. Inspired by neural volume rendering, we further extend AFA with Scale-Space Rendering...
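The idea of computing an attention-weighted average of layer activations can be sketched in a few lines. The example below is only a toy illustration under stated assumptions, not the AFA module from the paper: the function `attentive_fuse`, the `[channels][positions]` list layout, and the choice to combine a per-position (spatial) logit and a per-channel logit through a single sigmoid gate are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attentive_fuse(shallow, deep, spatial_logits, channel_logits):
    """Fuse two feature maps with an attention-weighted average.

    shallow, deep:   feature maps as nested lists of shape [C][N]
                     (C channels, N flattened spatial positions)
    spatial_logits:  length-N list, one attention logit per position (assumed input)
    channel_logits:  length-C list, one attention logit per channel (assumed input)

    For each element, gate = sigmoid(spatial + channel) lies in (0, 1), and the
    output gate * shallow + (1 - gate) * deep is a convex combination of the inputs.
    """
    fused = []
    for c, (s_row, d_row) in enumerate(zip(shallow, deep)):
        row = []
        for n, (s, d) in enumerate(zip(s_row, d_row)):
            gate = sigmoid(spatial_logits[n] + channel_logits[c])
            row.append(gate * s + (1.0 - gate) * d)
        fused.append(row)
    return fused


shallow = [[1.0, 1.0]]
deep = [[3.0, 3.0]]
# Zero logits give a gate of 0.5, i.e. a plain average of the two maps.
print(attentive_fuse(shallow, deep, [0.0, 0.0], [0.0]))
```

Unlike concatenation, the mixing weight here varies per position and per channel, which is the kind of non-linear, input-dependent fusion the abstract contrasts with the concatenation baseline.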
Accurate monocular metric depth estimation (MMDE) is crucial to solving downstream tasks in 3D perception and modeling. However, the remarkable accuracy of recent MMDE methods is confined to their training domains. These methods fail to generalize to unseen domains even in the presence of moderate domain gaps, which hinders their practical applicability. We propose a new model, UniDepth, capable of reconstructing metric 3D scenes from solely single images across domains. Departing from the existing MMDE methods, UniDepth directly predicts metric 3D points from the input image at...
Open-vocabulary Multiple Object Tracking (MOT) aims to generalize trackers to novel categories not in the training set. Currently, the best-performing methods are mainly based on pure appearance matching. Due to the complexity of motion patterns in large-vocabulary scenarios and the unstable classification of novel objects, motion and semantics cues are either ignored or applied based on heuristics in the final matching steps by existing methods. In this paper, we present a unified framework, SLAck, that jointly considers semantics, location, and appearance priors...
Multiple object tracking in complex scenarios - such as coordinated dance performances, team sports, or dynamic animal groups - presents unique challenges. In these settings, objects frequently move in coordinated patterns, occlude each other, and exhibit long-term dependencies in their trajectories. However, it remains a key open research question how to model long-range dependencies within tracklets, interdependencies among tracklets, and the associated temporal occlusions. To this end, we introduce Samba, a novel linear-time...
Aggregating information from features across different layers is an essential operation for dense prediction models. Despite its limited expressiveness, feature concatenation dominates the choice of aggregation operations. In this paper, we introduce Attentive Feature Aggregation (AFA) to fuse different network layers with more expressive non-linear operations. AFA exploits both spatial and channel attention to compute a weighted average of the layer activations. Inspired by neural volume rendering, we extend AFA with Scale-Space Rendering...