Yung-Hsu Yang

ORCID: 0000-0003-0044-515X
Research Areas
  • Advanced Neural Network Applications
  • Advanced Vision and Imaging
  • Autonomous Vehicle Technology and Safety
  • Industrial Vision Systems and Defect Detection
  • Video Surveillance and Tracking Methods
  • Infrared Target Detection Methodologies
  • Generative Adversarial Networks and Image Synthesis
  • Optical measurement and interference techniques
  • Robotics and Sensor-Based Localization
  • Advanced Image and Video Retrieval Techniques
  • Multimodal Machine Learning Applications
  • Human Pose and Action Recognition
  • Speech and dialogue systems
  • Robotics and Automated Systems
  • 3D Surveying and Cultural Heritage
  • Domain Adaptation and Few-Shot Learning
  • Image Processing Techniques and Applications

ETH Zurich
2024

National Tsing Hua University
2022-2023

A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving. We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform. The object association leverages quasi-dense similarity learning to identify objects in various poses and viewpoints with appearance cues only. After initial...
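
As a rough illustration of the appearance-only association the abstract describes, the sketch below matches track and detection embeddings with a bi-directional softmax over cosine similarities. It is a minimal stand-in, not the paper's implementation; the greedy matching loop and the 0.5 threshold are illustrative assumptions.

```python
import numpy as np

def associate_by_appearance(track_embs, det_embs, thresh=0.5):
    """Greedy appearance-only association between active tracks and new
    detections, in the spirit of quasi-dense similarity matching.
    track_embs: (T, D) L2-normalized embeddings of active tracks.
    det_embs:   (N, D) L2-normalized embeddings of current detections.
    Returns a list of (track_idx, det_idx) matches."""
    sim = track_embs @ det_embs.T                       # cosine similarity, (T, N)
    # bi-directional softmax: a pair scores high only if each side
    # prefers the other among all its candidates
    soft_t = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)
    soft_d = np.exp(sim) / np.exp(sim).sum(axis=0, keepdims=True)
    score = 0.5 * (soft_t + soft_d)

    matches, used_t, used_d = [], set(), set()
    # greedily take the highest-scoring remaining pair above the threshold
    for flat in np.argsort(-score, axis=None):
        t, d = np.unravel_index(flat, score.shape)
        if t in used_t or d in used_d or score[t, d] < thresh:
            continue
        matches.append((int(t), int(d)))
        used_t.add(int(t)); used_d.add(int(d))
    return matches
```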

10.1109/tpami.2022.3168781 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2022-04-19

10.1109/cvpr52733.2024.00963 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Accurate detection and tracking of surrounding objects is essential to enable self-driving vehicles. While Light Detection and Ranging (LiDAR) sensors have set the benchmark for high performance, the appeal of camera-only solutions lies in their cost-effectiveness. Notably, despite the prevalent use of Radio Detection and Ranging (RADAR) sensors in automotive systems, their potential for 3D detection and tracking has been largely disregarded due to data sparsity and measurement noise. As a recent development, the combination of RADARs and cameras is emerging as a promising solution. This paper...
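
As a hedged sketch of one common camera-RADAR fusion scheme (not necessarily this paper's exact pipeline), the snippet below rasterizes sparse RADAR returns into a camera bird's-eye-view grid and fuses by channel concatenation. The per-return fields, grid layout, and extent are assumptions for illustration.

```python
import numpy as np

def fuse_radar_into_bev(cam_bev, radar_pts, extent=50.0):
    """Rasterize sparse RADAR returns into the camera BEV grid and fuse by
    channel concatenation.
    cam_bev:   (C, H, W) camera bird's-eye-view features.
    radar_pts: (N, 4) rows of [x, y, radial_velocity, rcs], with x, y in
               meters relative to the ego vehicle (illustrative fields)."""
    C, H, W = cam_bev.shape
    radar_bev = np.zeros((2, H, W), dtype=cam_bev.dtype)  # velocity + RCS channels
    for x, y, vel, rcs in radar_pts:
        # map metric coordinates to grid cells; drop points outside the extent
        i = int((y + extent) / (2 * extent) * H)
        j = int((x + extent) / (2 * extent) * W)
        if 0 <= i < H and 0 <= j < W:
            radar_bev[0, i, j] = vel
            radar_bev[1, i, j] = rcs
    return np.concatenate([cam_bev, radar_bev], axis=0)  # (C + 2, H, W)
```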

10.48550/arxiv.2403.15313 preprint EN arXiv (Cornell University) 2024-03-22

To track the 3D locations and trajectories of other traffic participants at any given time, modern autonomous vehicles are equipped with multiple cameras that cover the vehicle's full surroundings. Yet, camera-based 3D object tracking methods prioritize optimizing the single-camera setup and resort to post-hoc fusion in a multi-camera setup. In this paper, we propose a method for panoramic 3D object tracking, called CC-3DT, that associates and models object trajectories both temporally and across views, and improves the overall tracking consistency. In particular, our...
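
To make the cross-camera aspect concrete, here is a minimal sketch of bringing per-camera 3D detections into one common frame and merging duplicates from overlapping views before temporal association. The distance-based merging and simple averaging are illustrative simplifications, not CC-3DT's learned cross-view association.

```python
import numpy as np

def merge_multi_camera_detections(per_cam_dets, cam_to_world, dist_thresh=1.0):
    """Bring per-camera 3D detections into a shared world frame and merge
    duplicates seen by overlapping cameras.
    per_cam_dets: list over cameras of (N_i, 3) box centers in camera frames.
    cam_to_world: list of (4, 4) rigid transforms, one per camera."""
    world_pts = []
    for dets, T in zip(per_cam_dets, cam_to_world):
        homo = np.hstack([dets, np.ones((len(dets), 1))])  # homogeneous (N, 4)
        world_pts.append((homo @ T.T)[:, :3])
    world_pts = np.vstack(world_pts)

    merged = []
    for p in world_pts:
        # fuse with an existing merged detection if closer than dist_thresh
        for k, q in enumerate(merged):
            if np.linalg.norm(p - q) < dist_thresh:
                merged[k] = 0.5 * (p + q)   # average duplicated observations
                break
        else:
            merged.append(p)
    return np.array(merged)
```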

10.48550/arxiv.2212.01247 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Aggregating information from features across different layers is essential for dense prediction models. Despite its limited expressiveness, vanilla feature concatenation dominates the choice of aggregation operations. In this paper, we introduce Attentive Feature Aggregation (AFA) to fuse different network layers with more expressive non-linear operations. AFA exploits both spatial and channel attention to compute weighted averages of the layer activations. Inspired by neural volume rendering, we further extend AFA with Scale-Space Rendering...
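
A simplified attention-weighted fusion of two feature maps, sketched below in PyTorch, conveys the idea of replacing plain concatenation with spatial and channel attention. The module layout is an assumption for illustration, not the paper's exact AFA block.

```python
import torch
import torch.nn as nn

class AttentiveFusion(nn.Module):
    """Fuse two feature maps with a learned per-channel, per-location weight
    instead of concatenation (a simplified spatial/channel attention fusion)."""
    def __init__(self, channels):
        super().__init__()
        # channel attention from globally pooled concatenated features
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # spatial attention from the concatenated feature maps
        self.spatial = nn.Sequential(
            nn.Conv2d(2 * channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, low, high):
        both = torch.cat([low, high], dim=1)
        w = self.channel(both) * self.spatial(both)  # broadcasts to (B, C, H, W)
        return w * low + (1.0 - w) * high            # attention-weighted average

# usage: fuse = AttentiveFusion(64); out = fuse(f_low, f_high)
```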

10.1109/wacv56688.2023.00018 article EN 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023-01-01

Accurate monocular metric depth estimation (MMDE) is crucial to solving downstream tasks in 3D perception and modeling. However, the remarkable accuracy of recent MMDE methods is confined to their training domains. These methods fail to generalize to unseen domains even in the presence of moderate domain gaps, which hinders their practical applicability. We propose a new model, UniDepth, capable of reconstructing metric 3D scenes from solely single images across domains. Departing from the existing MMDE methods, UniDepth directly predicts metric 3D points from the input image at...
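
For the geometry involved, the sketch below shows the standard pinhole backprojection P = d * K^{-1} [u, v, 1]^T that relates a metric depth map and camera intrinsics to a 3D point cloud. UniDepth predicts such metric 3D points directly, so this is background geometry rather than the model itself.

```python
import numpy as np

def backproject_depth(depth, K):
    """Lift a metric depth map to 3D points with the pinhole camera model.
    depth: (H, W) metric depth in meters.
    K:     (3, 3) camera intrinsics.
    Returns (H, W, 3) metric 3D points in the camera frame."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # unit-plane rays K^{-1} [u, v, 1]^T for every pixel
    rays = np.linalg.inv(K) @ np.stack([u, v, np.ones_like(u)], 0).reshape(3, -1)
    pts = rays * depth.reshape(1, -1)   # scale each ray by its metric depth
    return pts.T.reshape(H, W, 3)
```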

10.48550/arxiv.2403.18913 preprint EN arXiv (Cornell University) 2024-03-27

A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving. We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform. The object association leverages quasi-dense similarity learning to identify objects in various poses and viewpoints with appearance cues only. After initial...

10.48550/arxiv.2103.07351 preprint EN other-oa arXiv (Cornell University) 2021-01-01

10.1109/iros58592.2024.10801848 article EN 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024-10-14

Open-vocabulary Multiple Object Tracking (MOT) aims to generalize trackers to novel categories not in the training set. Currently, the best-performing methods are mainly based on pure appearance matching. Due to the complexity of motion patterns in large-vocabulary scenarios and the unstable classification of novel objects, motion and semantics cues are either ignored or applied based on heuristics in the final matching steps by existing methods. In this paper, we present a unified framework SLAck that jointly considers semantics, location, and appearance priors...
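
As a toy illustration of using all three cues in a single matching step, the sketch below combines semantic, location, and appearance similarity matrices and solves one joint assignment. SLAck learns this fusion inside the association itself; the fixed weights and thresholds here are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def joint_association(sem_sim, loc_sim, app_sim,
                      weights=(1.0, 1.0, 1.0), min_score=0.3):
    """Combine semantic, location, and appearance similarities into one score
    and solve the track-detection assignment jointly.
    All inputs are (T, N) similarity matrices in [0, 1] between T tracks and
    N detections. Returns matched (track_idx, det_idx) pairs."""
    ws, wl, wa = weights
    score = (ws * sem_sim + wl * loc_sim + wa * app_sim) / sum(weights)
    rows, cols = linear_sum_assignment(-score)   # maximize total similarity
    return [(r, c) for r, c in zip(rows, cols) if score[r, c] >= min_score]
```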

10.48550/arxiv.2409.11235 preprint EN arXiv (Cornell University) 2024-09-17

Multiple object tracking in complex scenarios - such as coordinated dance performances, team sports, or dynamic animal groups - presents unique challenges. In these settings, objects frequently move in coordinated patterns, occlude each other, and exhibit long-term dependencies in their trajectories. However, it remains a key open research question how to model long-range dependencies within tracklets, interdependencies among tracklets, and the associated temporal occlusions. To this end, we introduce Samba, a novel linear-time...
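
To see why linear-time sequence modeling matters here, the sketch below runs a plain linear recurrence h_t = A h_{t-1} + B x_t over a tracklet's per-frame features in a single O(T) pass, doing constant work per time step instead of attending over all pairs. Samba's actual model is far richer; the matrices here are illustrative placeholders.

```python
import numpy as np

def linear_time_tracklet_memory(obs, A, B):
    """Maintain a per-tracklet hidden state with a linear recurrence.
    obs: (T, D) per-frame tracklet features.
    A:   (H, H) state transition matrix.
    B:   (H, D) input projection matrix.
    Returns (T, H) hidden states, one per frame."""
    h = np.zeros(A.shape[0])
    states = []
    for x in obs:              # one pass over the sequence: O(T)
        h = A @ h + B @ x      # constant work per time step
        states.append(h)
    return np.stack(states)
```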

10.48550/arxiv.2410.01806 preprint EN arXiv (Cornell University) 2024-10-02

Aggregating information from features across different layers is an essential operation for dense prediction models. Despite its limited expressiveness, feature concatenation dominates the choice of aggregation operations. In this paper, we introduce Attentive Feature Aggregation (AFA) to fuse different network layers with more expressive non-linear operations. AFA exploits both spatial and channel attention to compute a weighted average of the layer activations. Inspired by neural volume rendering, we extend AFA with Scale-Space Rendering...

10.48550/arxiv.2111.00770 preprint EN other-oa arXiv (Cornell University) 2021-01-01