NFDI4DS | UHH-SEMS - Publication Details

Preformer MOT: A Transformer-based Approach for Multi-Object Tracking with Global Trajectory Prediction

OPENALEX - Publications

Yueying Wang Yuhao Qing Kaer Huang Chuangyin Dang Zhengtian Wu

10.1016/j.fmre.2024.06.015 article EN cc-by-nc-nd Fundamental Research 2025-01-01

Multi-Object Tracking by Self-supervised Learning Appearance Model

Kaer Huang Kanokphan Lertniphonphan Feng Chen Jian Li Zhepeng Wang

In recent years, dominant multi-object tracking (MOT) and segmentation (MOTS) methods mainly follow the tracking-by-detection paradigm. Transformer-based end-to-end (E2E) solutions bring some ideas to MOT and MOTS, but they cannot achieve a new state-of-the-art (SOTA) performance in major MOT and MOTS benchmarks. Detection and association are the two main modules of the tracking-by-detection paradigm. Association techniques depend on the combination of motion and appearance information. As deep learning has been recently developed, detection model is...

10.1109/cvprw59228.2023.00318 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2023-06-01

ArtCrafter: Text-Image Aligning Style Transfer via Embedding Reframing

Nisha Huang Kaer Huang Yifan Pu Juqi Wang Jie Guo and 2 more

Recent years have witnessed significant advancements in text-guided style transfer, primarily attributed to innovations in diffusion models. These models excel in conditional guidance, utilizing text or images to direct the sampling process. However, despite their capabilities, guidance approaches often face challenges in balancing the expressiveness of textual semantics with the diversity of output results while capturing stylistic features. To address these challenges, we introduce ArtCrafter, a novel framework...

10.48550/arxiv.2501.02064 preprint EN arXiv (Cornell University) 2025-01-03

Integrating Low-Level Visual Cues for Enhanced Unsupervised Semantic Segmentation

Yuhao Qing Dan Zeng Shaorong Xie Kaer Huang Yueying Wang

Unsupervised semantic segmentation algorithms aim to identify meaningful semantic groups without annotations. Recent approaches leveraging self-supervised transformers as pre-training backbones have successfully obtained high-level dense features that effectively express semantic coherence. However, these methods often overlook local coherence and low-level features such as color and texture. We propose integrating low-level visual cues to complement the high-level cues derived from self-supervised branches. Our findings indicate that low-level visual cues provide a more coherent recognition of...

10.1609/aaai.v39i6.32708 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11

2nd Workshop on Maritime Computer Vision (MaCVi) 2024: Challenge Results

Benjamin Kiefer Lojze Žust Matej Kristan Janez Perš Matija Teršek and 44 more

The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenge categories are considered: (i) UAV-based Object Tracking with Re-identification, (ii) USV-based Obstacle Segmentation and Detection, (iii) Boat Tracking. Detection features three sub-challenges, including a new embedded challenge...

10.1109/wacvw60836.2024.00099 article EN 2024-01-01

The Signal Integrity of the High-Speed IC Design

Kaer Huang Wenyi Liu Yan Zhang Hongcheng Yan

With the continuous improvement of embedded microprocessors' main frequency, signal transmission and processing are becoming faster and faster, and traditional circuit design methods and software will not be able to meet the requirements of high-speed design. An increasing number of VLSI chips now work at frequencies above 100 MHz, and CPUs running at 450 MHz are also widely used. The signal edge is steeper (on the order of ps), so the system must face a variety of signal integrity issues. This paper presents common signal integrity issues, corresponding...

10.1109/wicom.2010.5600741 article EN 2010-09-01

Lane detection with position embedding

Jun Xie Jiacheng Han Dezhen Qi Feng Chen Kaer Huang and 1 more

Recently, lane detection has made great progress in autonomous driving. RESA (REcurrent Feature-Shift Aggregator) is based on image segmentation. It presents a novel module to enrich lane features after preliminary feature extraction with an ordinary CNN. In the Tusimple dataset, the scenes are not too complicated and lanes have more prominent spatial features. On the basis of RESA, we introduce the method of position embedding to enhance the spatial features. The experimental results show that this method achieved the best accuracy, 96.93%, on the Tusimple dataset.

10.1117/12.2644351 article EN Fourteenth International Conference on Digital Image Processing (ICDIP 2022) 2022-10-12

The First Visual Object Tracking Segmentation VOTS2023 Challenge Results

Matej Kristan Jiří Matas Martin Danelljan Michael Felsberg Hyung Jin Chang and 95 more

The Visual Object Tracking Segmentation VOTS2023 challenge is the eleventh annual tracker benchmarking activity of the VOT initiative. This challenge is the first to merge short-term and long-term as well as single-target and multiple-target tracking, with segmentation masks as the only target location specification. A new dataset was created; the ground truth has been withheld to prevent overfitting. New performance measures and evaluation protocols have been created along with a toolkit and an evaluation server. Results of the presented 47 trackers indicate that...

10.1109/iccvw60793.2023.00195 article EN 2023-10-02

1st Place Solution for CVPR2023 BURST Long Tail and Open World Challenges

Kaer Huang

Currently, Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories that contains only a few dozen categories, lacking the ability to handle diverse objects in real-world videos. With the release of the TAO and BURST datasets, we have the opportunity to research VIS in long-tailed and open-world scenarios. Traditional VIS methods are evaluated on benchmarks limited to a small number of common classes, but practical applications require trackers that go beyond these, detecting...

10.48550/arxiv.2308.04598 preprint EN cc-by arXiv (Cornell University) 2023-01-01

ReIDTrack: Multi-Object Track and Segmentation Without Motion

Kaer Huang Bingchuan Sun Feng Chen Tao Zhang Jun Xie and 3 more

In recent years, dominant multi-object tracking (MOT) and segmentation (MOTS) methods mainly follow the tracking-by-detection paradigm. Transformer-based end-to-end (E2E) solutions bring some ideas to MOT and MOTS, but they cannot achieve a new state-of-the-art (SOTA) performance in major MOT and MOTS benchmarks. Detection and association are the two main modules of the tracking-by-detection paradigm. Association techniques depend on the combination of motion and appearance information. As deep learning has been recently developed, detection model is...

10.48550/arxiv.2308.01622 preprint EN cc-by arXiv (Cornell University) 2023-01-01

ReIDTracker Sea: the technical report of BoaTrack and SeaDronesSee-MOT challenge at MaCVi of WACV24

Kaer Huang Weitu Chong

Multi-Object Tracking is one of the most important technologies in maritime computer vision. Our solution tries to explore Multi-Object Tracking in maritime Unmanned Aerial Vehicles (UAVs) and Unmanned Surface Vehicles (USVs) usage scenarios. Most current algorithms require complex association strategies and association information (2D location and motion, 3D depth, 2D appearance) to achieve better performance, which makes the entire tracking system extremely heavy. At the same time, most current algorithms still require video annotation data, which is costly to obtain for training. Our solution tries to explore a completely unsupervised...

10.48550/arxiv.2311.07616 preprint EN cc-by arXiv (Cornell University) 2023-01-01

The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024

Benjamin Kiefer Lojze Žust Matej Kristan Janez Perš Matija Teršek and 44 more

The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenge categories are considered: (i) UAV-based Object Tracking with Re-identification, (ii) USV-based Obstacle Segmentation and Detection, (iii) Boat Tracking. Detection features three sub-challenges, including a new embedded challenge addressing efficient inference on real-world devices. This report offers a comprehensive overview of the...

10.48550/arxiv.2311.14762 preprint EN cc-by arXiv (Cornell University) 2023-01-01

ACTrack: Adding Spatio-Temporal Condition for Visual Object Tracking

Yushan Han Kaer Huang

Efficiently modeling spatio-temporal relations of objects is a key challenge in visual object tracking (VOT). Existing methods track by appearance-based similarity or long-term relation modeling, resulting in rich temporal contexts between consecutive frames being easily overlooked. Moreover, training trackers from scratch or fine-tuning large pre-trained models requires more time and memory. In this paper, we present ACTrack, a new framework with additive conditions. It preserves the...

10.48550/arxiv.2403.07914 preprint EN arXiv (Cornell University) 2024-02-27

ReIDTracker_Sea: Multi-Object Tracking in Maritime Computer Vision

Kaer Huang Weitu Chong Hui Yang Kanokphan Lertniphonphan Jun Xie and 1 more

In recent years, we've witnessed the remarkable growth of computer vision applications in both maritime and fresh water domains. These technologies have played pivotal roles in search and rescue (SaR), detection of illegal fishing, airborne and surface reconnaissance, offshore wind farm and oil rig inspections, animal population monitoring, and beyond. Multi-Object Tracking is one of the most important technologies in maritime computer vision. Our paper tries to explore Multi-Object Tracking in maritime Unmanned Aerial Vehicles (UAVs) and Unmanned Surface Vehicles (USVs) usage scenarios. Most current algorithms...

10.1109/wacvw60836.2024.00130 article EN 2024-01-01

PCIE_EgoHandPose Solution for EgoExo4D Hand Pose Challenge

Feng Chen Ling Ding Kanokphan Lertniphonphan J. R. Li Kaer Huang and 1 more

This report presents our team's 'PCIE_EgoHandPose' solution for the EgoExo4D Hand Pose Challenge at CVPR2024. The main goal of the challenge is to accurately estimate hand poses, which involve 21 3D joints, using an RGB egocentric video image provided for the task. This task is particularly challenging due to subtle movements and occlusions. To handle the complexity of the task, we propose the Hand Pose Vision Transformer (HP-ViT). HP-ViT comprises a ViT backbone and a transformer head to estimate joint positions in 3D, utilizing MPJPE and RLE loss functions...

10.48550/arxiv.2406.12219 preprint EN arXiv (Cornell University) 2024-06-17

Technical Report for Argoverse Challenges on Unified Sensor-based Detection, Tracking, and Forecasting

Zhepeng Wang Feng Chen Kanokphan Lertniphonphan Si-Wei Chen Jinyao Bao and 4 more

This report presents our Le3DE2E solution for unified sensor-based detection, tracking, and forecasting in the Argoverse Challenges at the CVPR 2023 Workshop on Autonomous Driving (WAD). We propose a unified network that incorporates three tasks: detection, tracking, and forecasting. This solution adopts a strong Bird's Eye View (BEV) encoder with spatial and temporal fusion and generates unified representations for the multiple tasks. The solution was tested on the Argoverse 2 sensor dataset to evaluate the detection, tracking, and forecasting of 26 object categories. We achieved 1st place in Detection, Tracking, and Forecasting on the E2E Forecasting track at CVPR 2023 WAD.

10.48550/arxiv.2311.15615 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Lane detection with Position Embedding

Jun Xie Jiacheng Han Dezhen Qi Feng Chen Kaer Huang and 1 more

Recently, lane detection has made great progress in autonomous driving. RESA (REcurrent Feature-Shift Aggregator) is based on image segmentation. It presents a novel module to enrich lane features after preliminary feature extraction with an ordinary CNN. In the Tusimple dataset, the scenes are not too complicated and lanes have more prominent spatial features. On the basis of RESA, we introduce the method of position embedding to enhance the spatial features. The experimental results show that this method achieved the best accuracy, 96.93%, on the Tusimple dataset.

10.48550/arxiv.2203.12301 preprint EN public-domain arXiv (Cornell University) 2022-01-01