Kaer Huang

ORCID: 0009-0003-5728-0058
Research Areas
  • Advanced Neural Network Applications
  • Video Surveillance and Tracking Methods
  • Infrared Target Detection Methodologies
  • Autonomous Vehicle Technology and Safety
  • Visual Attention and Saliency Detection
  • Robotic Path Planning Algorithms
  • Air Quality Monitoring and Forecasting
  • Advanced Image and Video Retrieval Techniques
  • Generative Adversarial Networks and Image Synthesis
  • Remote-Sensing Image Classification
  • Vehicle License Plate Recognition
  • Advanced Vision and Imaging
  • Underwater Vehicles and Communication Systems
  • Domain Adaptation and Few-Shot Learning
  • Smart Grid and Power Systems
  • Maritime Navigation and Safety
  • VLSI and Analog Circuit Testing
  • Robotic Mechanisms and Dynamics
  • Embedded Systems and FPGA Applications
  • Image Processing and 3D Reconstruction
  • Robot Manipulation and Learning
  • Handwritten Text Recognition Techniques
  • Embedded Systems and FPGA Design
  • Technology and Security Systems

Lenovo (China)
2022-2024

North University of China
2009-2010

In recent years, dominant multi-object tracking (MOT) and segmentation (MOTS) methods mainly follow the tracking-by-detection paradigm. Transformer-based end-to-end (E2E) solutions bring some ideas to MOT and MOTS, but they cannot achieve a new state-of-the-art (SOTA) performance in major MOT and MOTS benchmarks. Detection and association are the two main modules of the tracking-by-detection paradigm. Association techniques depend on the combination of motion and appearance information. As deep learning has been recently developed, detection model is...

10.1109/cvprw59228.2023.00318 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2023-06-01
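
The abstract above says that association combines motion and appearance information. The following is a minimal, generic sketch of that idea, not the authors' method: motion is approximated by box overlap (IoU), appearance by cosine similarity of feature vectors, and matching is greedy. The function names, the `weight` and `min_score` parameters, and the dictionary layout are all assumptions made for illustration.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def associate(tracks, detections, weight=0.5, min_score=0.3):
    """Greedily match tracks to detections by a weighted motion/appearance score.

    `weight` and `min_score` are hypothetical tuning parameters.
    Returns a list of (track_index, detection_index) pairs.
    """
    scored = []
    for ti, t in enumerate(tracks):
        for di, d in enumerate(detections):
            score = weight * iou(t["box"], d["box"]) \
                + (1.0 - weight) * cosine(t["feat"], d["feat"])
            if score >= min_score:
                scored.append((score, ti, di))
    scored.sort(reverse=True)  # best-scoring pairs are claimed first

    matches, used_tracks, used_dets = [], set(), set()
    for _, ti, di in scored:
        if ti in used_tracks or di in used_dets:
            continue
        matches.append((ti, di))
        used_tracks.add(ti)
        used_dets.add(di)
    return matches


tracks = [
    {"box": (0, 0, 10, 10), "feat": (1.0, 0.0)},
    {"box": (20, 20, 30, 30), "feat": (0.0, 1.0)},
]
detections = [
    {"box": (21, 21, 31, 31), "feat": (0.0, 1.0)},
    {"box": (1, 1, 11, 11), "feat": (1.0, 0.0)},
]
matches = associate(tracks, detections)
```

Greedy matching keeps the sketch short; trackers commonly use an optimal assignment solver (for example, the Hungarian algorithm) in its place.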

Recent years have witnessed significant advancements in text-guided style transfer, primarily attributed to innovations in diffusion models. These models excel in conditional guidance, utilizing text or images to direct the sampling process. However, despite their capabilities, these guidance approaches often face challenges in balancing the expressiveness of textual semantics with the diversity of output results while capturing stylistic features. To address these challenges, we introduce ArtCrafter, a novel framework...

10.48550/arxiv.2501.02064 preprint EN arXiv (Cornell University) 2025-01-03

Unsupervised semantic segmentation algorithms aim to identify meaningful semantic groups without annotations. Recent approaches leveraging self-supervised transformers as pre-training backbones have successfully obtained high-level dense features that effectively express semantic coherence. However, these methods often overlook local coherence and low-level features such as color and texture. We propose integrating low-level visual cues to complement the high-level cues derived from self-supervised branches. Our findings indicate that low-level visual cues provide a more coherent recognition of...

10.1609/aaai.v39i6.32708 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11

The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenge categories are considered: (i) UAV-based Object Tracking with Re-identification, (ii) USV-based Obstacle Segmentation and Detection, (iii) USV-based Boat Tracking. Obstacle Detection features three sub-challenges, including a new embedded challenge...

10.1109/wacvw60836.2024.00099 article EN 2024-01-01

With the continuous improvement of embedded microprocessors' main frequency, signal transmission and processing are becoming faster and faster, and traditional circuit design methods and software will not be able to meet the requirements of high-speed circuit design. An increasing number of VLSI chips now operate above 100 MHz, and CPUs running at 450 MHz are also widely used. Signal edges are steeper (reaching the picosecond range), so the system must face a variety of signal integrity issues. This paper presents common signal integrity issues, corresponding...

10.1109/wicom.2010.5600741 article EN 2010-09-01

Recently, lane detection has made great progress in autonomous driving. RESA (REcurrent Feature-Shift Aggregator) is based on image segmentation. It presents a novel module to enrich lane features after preliminary feature extraction with an ordinary CNN. In the Tusimple dataset, the scenes are not too complicated and lanes have more prominent spatial features. On the basis of RESA, we introduce a position embedding method to enhance the spatial features. The experimental results show that this method achieved the best accuracy of 96.93% on the Tusimple dataset.

10.1117/12.2644351 article EN Fourteenth International Conference on Digital Image Processing (ICDIP 2022) 2022-10-12
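
The accuracy figure quoted in the abstract above is a point-wise lane accuracy. As a rough illustration only, the sketch below counts a predicted lane point as correct when its x-coordinate lies within a pixel threshold of the ground truth at the same row. The function name, the fixed 20-pixel threshold, and the use of negative values to mark missing points are assumptions; the official benchmark script may differ in detail.

```python
def lane_point_accuracy(pred_xs, gt_xs, threshold=20.0):
    """Fraction of ground-truth lane points matched by a prediction.

    pred_xs, gt_xs: x-coordinates sampled at the same image rows.
    A negative value marks a missing point (assumed convention).
    threshold: maximum allowed horizontal error in pixels (assumed value).
    """
    correct = 0
    total = 0
    for p, g in zip(pred_xs, gt_xs):
        if g < 0:
            continue  # no ground-truth point at this row
        total += 1
        if p >= 0 and abs(p - g) <= threshold:
            correct += 1
    return correct / total if total else 0.0


# Two of the four ground-truth points are matched within 20 pixels.
acc = lane_point_accuracy([100, 205, 330, -2], [110, 200, 300, 400])
```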

The Visual Object Tracking Segmentation (VOTS2023) challenge is the eleventh annual tracker benchmarking activity of the VOT initiative. This challenge is the first to merge short-term and long-term as well as single-target and multiple-target tracking, with segmentation masks as the only target location specification. A new dataset was created; the ground truth has been withheld to prevent overfitting. New performance measures and evaluation protocols have been created along with a toolkit and an evaluation server. Results of the presented 47 trackers indicate that...

10.1109/iccvw60793.2023.00195 article EN 2023-10-02

Currently, Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories that contains only a few dozen categories, lacking the ability to handle diverse objects in real-world videos. With the release of the TAO and BURST datasets, we have the opportunity to research VIS in long-tailed and open-world scenarios. Traditional VIS methods are evaluated on benchmarks limited to a small number of common classes, but practical applications require trackers that go beyond these, detecting...

10.48550/arxiv.2308.04598 preprint EN cc-by arXiv (Cornell University) 2023-01-01

In recent years, dominant multi-object tracking (MOT) and segmentation (MOTS) methods mainly follow the tracking-by-detection paradigm. Transformer-based end-to-end (E2E) solutions bring some ideas to MOT and MOTS, but they cannot achieve a new state-of-the-art (SOTA) performance in major MOT and MOTS benchmarks. Detection and association are the two main modules of the tracking-by-detection paradigm. Association techniques depend on the combination of motion and appearance information. As deep learning has been recently developed, detection model is...

10.48550/arxiv.2308.01622 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Multi-Object Tracking is one of the most important technologies in maritime computer vision. Our solution tries to explore Multi-Object Tracking in Unmanned Aerial Vehicle (UAV) and Unmanned Surface Vehicle (USV) usage scenarios. Most current algorithms require complex association strategies and association information (2D location and motion, 3D depth, 2D appearance) to achieve better performance, which makes the entire tracking system extremely heavy. At the same time, they still require video annotation data, which is costly to obtain for training. Our solution explores a completely unsupervised...

10.48550/arxiv.2311.07616 preprint EN cc-by arXiv (Cornell University) 2023-01-01

The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenge categories are considered: (i) UAV-based Object Tracking with Re-identification, (ii) USV-based Obstacle Segmentation and Detection, (iii) USV-based Boat Tracking. Obstacle Detection features three sub-challenges, including a new embedded challenge addressing efficient inference on real-world devices. This report offers a comprehensive overview of the...

10.48550/arxiv.2311.14762 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Efficiently modeling spatio-temporal relations of objects is a key challenge in visual object tracking (VOT). Existing methods track by appearance-based similarity or long-term relation modeling, resulting in rich temporal contexts between consecutive frames being easily overlooked. Moreover, training trackers from scratch or fine-tuning large pre-trained models requires more time and memory. In this paper, we present ACTrack, a new framework with additive conditions. It preserves the...

10.48550/arxiv.2403.07914 preprint EN arXiv (Cornell University) 2024-02-27

In recent years, we've witnessed the remarkable growth of computer vision applications in both maritime and fresh water domains. These technologies have played pivotal roles in search and rescue (SaR), detection of illegal fishing, airborne and surface reconnaissance, offshore wind farm and oil rig inspections, animal population monitoring, and beyond. Multi-Object Tracking is one of the most important technologies in maritime computer vision. Our paper tries to explore Multi-Object Tracking for Unmanned Aerial Vehicles (UAVs) and Unmanned Surface Vehicles (USVs). Most current algorithms...

10.1109/wacvw60836.2024.00130 article EN 2024-01-01

This report presents our team's 'PCIE_EgoHandPose' solution for the EgoExo4D Hand Pose Challenge at CVPR2024. The main goal of the challenge is to accurately estimate hand poses, which involve 21 3D joints, using an RGB egocentric video image provided for the task. This task is particularly challenging due to subtle movements and occlusions. To handle the complexity of the task, we propose the Hand Pose Vision Transformer (HP-ViT). HP-ViT comprises a ViT backbone and a transformer head to estimate joint positions in 3D, utilizing MPJPE and RLE loss function....

10.48550/arxiv.2406.12219 preprint EN arXiv (Cornell University) 2024-06-17
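
The abstract above names MPJPE (mean per-joint position error), which is the average Euclidean distance between predicted and ground-truth joint positions. A minimal sketch of the metric follows, assuming each hand pose is a list of 21 (x, y, z) tuples; the function name and input layout are illustrative and not taken from the report.

```python
import math


def mpjpe(pred, target):
    """Mean per-joint position error between two poses.

    pred, target: equal-length sequences of (x, y, z) joint coordinates.
    Returns the mean Euclidean distance, in the units of the inputs.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("poses must be non-empty and the same length")
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)


# 21 joints, each predicted 5 units away from the ground truth.
pred_pose = [(0.0, 0.0, 0.0)] * 21
true_pose = [(3.0, 4.0, 0.0)] * 21
error = mpjpe(pred_pose, true_pose)
```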

This report presents our Le3DE2E solution for unified sensor-based detection, tracking, and forecasting in the Argoverse Challenges at the CVPR 2023 Workshop on Autonomous Driving (WAD). We propose a network that incorporates three tasks: detection, tracking, and forecasting. It adopts a strong Bird's Eye View (BEV) encoder with spatial and temporal fusion and generates representations for multiple tasks. The solution was tested on the Argoverse 2 sensor dataset to evaluate detection, tracking, and forecasting of 26 object categories. We achieved 1st place in Detection, Tracking, and Forecasting on the E2E track at the CVPR 2023 WAD.

10.48550/arxiv.2311.15615 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Recently, lane detection has made great progress in autonomous driving. RESA (REcurrent Feature-Shift Aggregator) is based on image segmentation. It presents a novel module to enrich lane features after preliminary feature extraction with an ordinary CNN. In the Tusimple dataset, the scenes are not too complicated and lanes have more prominent spatial features. On the basis of RESA, we introduce a position embedding method to enhance the spatial features. The experimental results show that this method achieved the best accuracy of 96.93% on the Tusimple dataset.

10.48550/arxiv.2203.12301 preprint EN public-domain arXiv (Cornell University) 2022-01-01