- Human Pose and Action Recognition
- Video Surveillance and Tracking Methods
- Multimodal Machine Learning Applications
- Hand Gesture Recognition Systems
- Generative Adversarial Networks and Image Synthesis
- Image Enhancement Techniques
- Domain Adaptation and Few-Shot Learning
- Advanced Vision and Imaging
- Anomaly Detection Techniques and Applications
- Human Motion and Animation
- Video Analysis and Summarization
- Advanced Image Processing Techniques
- Advanced Image and Video Retrieval Techniques
- Gait Recognition and Analysis
- Image and Video Quality Assessment
- Computer Graphics and Visualization Techniques
- Advanced Image Fusion Techniques
- Robotics and Sensor-Based Localization
- 3D Shape Modeling and Analysis
- Advanced Neural Network Applications
- Robotic Path Planning Algorithms
- Image Retrieval and Classification Techniques
- Remote Sensing and LiDAR Applications
- Diabetic Foot Ulcer Assessment and Management
- Infrared Target Detection Methodologies
- University of Washington (2023-2025)
- Seattle University (2024-2025)
- University of Illinois Urbana-Champaign (2022-2023)
- Zhejiang University (2022-2023)
- Zhejiang University-University of Edinburgh Institute (2022)
Diffusion-based methods can generate realistic images and videos, but they struggle to edit existing objects in a video while preserving their appearance over time. This prevents diffusion models from being applied to natural editing in practical scenarios. In this paper, we tackle this problem by introducing temporal dependency to text-driven diffusion models, which allows them to generate consistent appearance for the edited objects. Specifically, we develop a novel inter-frame propagation mechanism for editing, which leverages the concept of layered...
Learning-based methods have dominated the 3D human pose estimation (HPE) task, with significantly better performance on most benchmarks than traditional optimization-based methods. Nonetheless, HPE in the wild is still the biggest challenge for learning-based models, whether with 2D-3D lifting, image-to-3D, or diffusion-based methods, since the trained networks implicitly learn camera intrinsic parameters and domain-based distributions and estimate poses by statistical average. On the other hand, the results are case-by-case,...
A lightweight underwater image enhancement network is of great significance for resource-constrained platforms, but balancing model size, computational efficiency, and performance has proven difficult for previous approaches. In this work, we propose the Five A$^{+}$ Network (FA$^{+}$Net), a highly efficient and lightweight real-time underwater image enhancement network with only $\sim$9k parameters and $\sim$0.01s processing time. The FA$^{+}$Net employs a two-stage structure. The strong prior stage aims to decompose challenging underwater degradations into sub-problems,...
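To make the parameter-budget idea concrete, here is a minimal PyTorch sketch of a two-stage enhancement network in the same spirit; the channel widths, layer choices, and residual formulation are illustrative assumptions, not the published FA$^{+}$Net design.

```python
# Minimal sketch of a two-stage lightweight enhancement network (illustrative only).
import torch
import torch.nn as nn

class TinyStage(nn.Module):
    """A few 3x3 convs with a small channel width to keep the parameter count tiny."""
    def __init__(self, ch=8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, x):
        return self.body(x)

class TwoStageEnhancer(nn.Module):
    """Stage 1 predicts a coarse correction; stage 2 refines the intermediate result."""
    def __init__(self):
        super().__init__()
        self.stage1 = TinyStage(ch=8)
        self.stage2 = TinyStage(ch=8)

    def forward(self, x):
        coarse = torch.clamp(x + self.stage1(x), 0.0, 1.0)      # residual coarse enhancement
        refined = torch.clamp(coarse + self.stage2(coarse), 0.0, 1.0)
        return refined

model = TwoStageEnhancer()
print(sum(p.numel() for p in model.parameters()))               # a few thousand parameters
out = model(torch.rand(1, 3, 128, 128))                         # enhanced image in [0, 1]
```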
Image-based fashion design with AI techniques has attracted increasing attention in recent years. We focus on a new task, where we aim to transfer a reference appearance image onto a clothing image while preserving the structure of the clothing image. It is a challenging task since there are no reference images available for the newly designed output images. Although diffusion-based translation or neural style transfer (NST) has enabled flexible style transfer, it is often difficult to maintain the original structure of the image realistically during the reverse diffusion, especially...
Image-based fashion design with AI techniques has attracted increasing attention in recent years. We focus on the reference-based task, where we aim to combine a reference appearance image and a clothing image to generate a new image. Although existing diffusion-based translation methods have enabled flexible style transfer, it is often difficult to transfer the reference appearance realistically during the reverse diffusion. When the referenced appearance domain greatly differs from the source domain, this leads to the collapse of the translation. To tackle this issue,...
Animal visual perception is an important technique for automatically monitoring animal health, understanding behaviors, and assisting animal-related research. However, it is challenging to design a deep learning-based model that can freely adapt to different animals across various tasks, due to the varying poses of a large diversity of animals, lacking data on rare species, and the semantic inconsistency of different tasks. We introduce UniAP, a novel Universal Animal Perception model that leverages few-shot learning to enable cross-species perception among...
The current 3D human pose estimators face challenges in adapting to new datasets due to the scarcity of 2D-3D pose pairs in target-domain training sets. We present the Multi-Hypothesis Pose Synthesis Domain Adaptation (PoSynDA) framework to overcome this issue without extensive target-domain annotation. Utilizing a diffusion-centric structure, PoSynDA simulates the 3D pose distribution in the target domain, filling the data diversity gap. By incorporating a multi-hypothesis network, it creates diverse pose hypotheses and aligns them with the target domain....
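As a rough sketch of how multi-hypothesis synthesis can supply target-domain pseudo-labels, the snippet below draws several candidate 3D poses per observed 2D pose and keeps the one with the lowest reprojection error; the placeholder sampler, camera numbers, and selection rule are assumptions for illustration, not the actual PoSynDA components.

```python
# Hedged sketch of multi-hypothesis pseudo-labelling for target-domain adaptation.
import torch

def project(pose_3d, f=1000.0, c=500.0):
    """Pinhole projection of (J, 3) camera-space joints to (J, 2) pixels."""
    return f * pose_3d[:, :2] / pose_3d[:, 2:3] + c

def sample_pose_hypotheses(pose_2d, k=10, f=1000.0, c=500.0, depth=4000.0):
    """Placeholder for a generative (e.g. diffusion) sampler: noisy back-projections."""
    xy = (pose_2d - c) / f * depth                                # rough metric lift
    base = torch.cat([xy, torch.full((pose_2d.shape[0], 1), depth)], dim=1)
    return base.unsqueeze(0) + 50.0 * torch.randn(k, *base.shape)

def select_pseudo_label(pose_2d, k=10):
    """Keep the hypothesis whose reprojection best matches the observed 2D pose."""
    hyps = sample_pose_hypotheses(pose_2d, k)                     # (k, J, 3)
    errors = torch.stack([((project(h) - pose_2d) ** 2).mean() for h in hyps])
    return hyps[errors.argmin()]                                  # (J, 3) pseudo-label

pose_2d = torch.rand(17, 2) * 1000.0          # a target-domain 2D detection (17 joints)
pseudo_3d = select_pseudo_label(pose_2d)
```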
When applying a pre-trained 2D-to-3D human pose lifting model to a target unseen dataset, large performance degradation is commonly encountered due to domain shift issues. We observe that the degradation is caused by two factors: 1) the distribution gap over global positions of poses between the source and target datasets due to variant camera parameters and settings, and 2) the deficient diversity of local structures in training. To this end, we combine global adaptation and local generalization in PoseDA, a simple yet effective framework of unsupervised domain adaptation for 3D...
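A minimal sketch of the global-position side of this idea is shown below: a root-relative source pose is re-placed at a randomly sampled camera-space position and reprojected to form a new 2D-3D training pair. The camera intrinsics and sampling ranges are invented for illustration rather than taken from PoseDA.

```python
# Minimal sketch of global-position augmentation for 2D-to-3D pose lifting.
import torch

def project(pose_3d, f=1145.0, c=512.0):
    """Pinhole projection of (J, 3) camera-space joints (mm) to (J, 2) pixels."""
    return f * pose_3d[:, :2] / pose_3d[:, 2:3] + c

def augment_global_position(pose_3d, depth_range=(2500.0, 6000.0), xy_range=1000.0):
    """Re-place a root-relative source pose at a random global position, then reproject."""
    depth = torch.empty(1).uniform_(*depth_range)                 # random depth (mm)
    xy = (torch.rand(1, 2) - 0.5) * 2 * xy_range                  # random x, y offset (mm)
    root = torch.cat([xy, depth.unsqueeze(0)], dim=1)             # (1, 3) global position
    pose_cam = pose_3d - pose_3d[:1] + root                       # root-relative -> camera space
    return project(pose_cam), pose_cam                            # new (2D, 3D) training pair

pose_src = torch.randn(17, 3) * 300.0          # a source-domain 3D pose (17 joints, mm)
pose_2d, pose_cam = augment_global_position(pose_src)
```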
Recent advances in computer vision algorithms have transformed the bridge visual inspection process. Those algorithms typically require large amounts of annotated data, which is lacking for generic inspection scenarios. To address this challenge efficiently, this research designs, develops, and demonstrates a platform that can provide synthetic datasets and testing environments, termed the Random Bridge Generator (RBG). The RBG produces photo-realistic 3D environments of six types of bridges randomly, automatically, and procedurally....
Human motion generation has advanced markedly with the advent of diffusion models. Most recent studies have concentrated on generating motion sequences based on text prompts, commonly referred to as text-to-motion generation. However, the bidirectional generation of motion and text, enabling tasks such as motion-to-text alongside text-to-motion, has been largely unexplored. This capability is essential for aligning diverse modalities and supports unconditional generation. In this paper, we introduce PackDiT, the first diffusion-based generative model...
Deep learning has the potential to revolutionize sports performance, with applications ranging from perception and comprehension to decision. This paper presents a comprehensive survey of deep learning in sports performance, focusing on three main aspects: algorithms, datasets and virtual environments, and challenges. Firstly, we discuss the hierarchical structure of deep learning algorithms in sports performance, which includes perception, comprehension, and decision, while comparing their strengths and weaknesses. Secondly, we list the widely used existing datasets and highlight their characteristics...
Real-world image dehazing remains a challenging task due to the diverse nature of haze degradation and the lack of large-scale paired datasets. Existing methods based on hand-crafted priors or generative priors struggle to recover accurate backgrounds and fine details from dense haze regions. In this work, we propose a novel paradigm, PromptHaze, for real-world dehazing via depth prompts from the Depth Anything model. By employing a prompt-by-prompt strategy, our method iteratively updates the prompts and progressively restores the background through the network...
Existing low-light image enhancement (LIE) methods have achieved noteworthy success in solving synthetic distortions, yet they often fall short in practical applications. The limitations arise from two inherent challenges in real-world LIE: 1) the collection of distorted/clean image pairs is often impractical and sometimes even unavailable, and 2) accurately modeling complex degradations presents a non-trivial problem. To overcome them, we propose the Attribute Guidance Diffusion framework (AGLLDiff), a training-free...
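The training-free flavor of attribute guidance can be sketched in a few lines: at each sampling step, the gradient of a simple attribute loss on the predicted clean image (here only a target-exposure term) nudges the estimate, with no fine-tuning of the diffusion model. The denoiser, loss, schedule, and guidance scale below are placeholders, not the published AGLLDiff components.

```python
# Illustrative sketch of training-free attribute guidance during diffusion sampling.
import torch

def denoiser(x_t, t):
    """Placeholder for a pretrained diffusion denoiser predicting x0 from x_t."""
    return torch.tanh(x_t)

def exposure_loss(x0, target_mean=0.5):
    """Penalise deviation of the mean intensity from a desired exposure level."""
    return (x0.mean() - target_mean) ** 2

def guided_step(x_t, t, guidance_scale=50.0):
    x_t = x_t.detach().requires_grad_(True)
    x0 = denoiser(x_t, t)
    grad = torch.autograd.grad(exposure_loss(x0), x_t)[0]
    x0_guided = x0 - guidance_scale * grad        # steer the estimate toward the attribute
    # a real sampler would now form x_{t-1} from x0_guided and the noise schedule
    return x0_guided.detach()

x = torch.randn(1, 3, 64, 64)
for t in reversed(range(10)):
    x = guided_step(x, t)
```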
Deep vision multimodal learning aims at combining deep visual representation learning with other modalities, such as text, sound, and data collected from other sensors. With the fast development of deep learning, vision multimodal learning has gained much interest from the community. This paper reviews the types of architectures used in vision multimodal learning, including feature extraction, modality aggregation, and multimodal loss functions. Then, we discuss several learning paradigms such as supervised, semi-supervised, self-supervised, and transfer learning. We also introduce several practical challenges such as missing...
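For a concrete (and deliberately simple) instance of the modality-aggregation pattern such surveys cover, the snippet below encodes two modalities separately and fuses them by concatenation plus a projection; the dimensions and layer choices are arbitrary placeholders.

```python
# Toy example of concatenation-based modality aggregation.
import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    def __init__(self, dim_img=512, dim_txt=256, dim_out=128):
        super().__init__()
        self.img_enc = nn.Sequential(nn.Linear(dim_img, 256), nn.ReLU())   # image branch
        self.txt_enc = nn.Sequential(nn.Linear(dim_txt, 256), nn.ReLU())   # text branch
        self.fuse = nn.Linear(256 + 256, dim_out)                          # aggregation

    def forward(self, img_feat, txt_feat):
        z = torch.cat([self.img_enc(img_feat), self.txt_enc(txt_feat)], dim=-1)
        return self.fuse(z)

fusion = ConcatFusion()
joint = fusion(torch.randn(4, 512), torch.randn(4, 256))   # (4, 128) joint embedding
```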
In the field of multi-object tracking (MOT), traditional methods often rely on the Kalman Filter for motion prediction, leveraging its strengths in linear scenarios. However, the inherent limitations of these methods become evident when confronted with complex, nonlinear motions and occlusions prevalent in dynamic environments like sports and dance. This paper explores the possibilities of replacing the Kalman Filter with various learning-based motion models that effectively enhance tracking accuracy and adaptability beyond the constraints of Kalman Filter-based systems. In this...
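The interface change being explored can be sketched as follows: the Kalman Filter's predict step in a tracking-by-detection loop is swapped for a learned motion model (an untrained toy LSTM here), and association then proceeds by IoU as usual. The architecture and matching rule are generic illustrations, not the specific models compared in the paper.

```python
# Hedged sketch of replacing the Kalman-Filter predict step with a learned motion model.
import torch
import torch.nn as nn

class MotionLSTM(nn.Module):
    """Predicts the next box (cx, cy, w, h) from a short history of past boxes."""
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.LSTM(input_size=4, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 4)

    def forward(self, history):                          # history: (B, T, 4)
        out, _ = self.rnn(history)
        return history[:, -1] + self.head(out[:, -1])    # residual prediction

def iou(a, b):
    """IoU of two (cx, cy, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0]-a[2]/2, a[1]-a[3]/2, a[0]+a[2]/2, a[1]+a[3]/2
    bx1, by1, bx2, by2 = b[0]-b[2]/2, b[1]-b[3]/2, b[0]+b[2]/2, b[1]+b[3]/2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    return inter / (a[2]*a[3] + b[2]*b[3] - inter + 1e-9)

model = MotionLSTM()                                     # untrained here; trained on tracks in practice
track_history = torch.tensor([[[100., 100., 40., 80.],
                               [104., 101., 40., 80.],
                               [108., 102., 40., 80.]]]) # one track, 3 past frames
with torch.no_grad():
    predicted = model(track_history)[0]                  # replaces kf.predict()
detections = [torch.tensor([111., 103., 41., 79.]), torch.tensor([300., 50., 30., 60.])]
scores = [iou(predicted, d).item() for d in detections]  # greedy IoU association
best = max(range(len(detections)), key=lambda i: scores[i])
```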
Deep learning-based (DL) visual recognition algorithms are widely investigated to enhance the accuracy, efficiency, and objectivity of the bridge inspection process, which is largely manual today. These algorithms typically require a large amount of training data, which consists of images and corresponding annotations. The preparation of such data sets is time-consuming, and more automated generation approaches that are aided by synthetic environments suffer from domain gaps, which result in poor performance on real-world tasks. This study...
Although 3D human pose estimation has gained impressive development in recent years, only a few works focus on infants, who have different bone lengths and also limited data. Directly applying adult pose estimation models typically achieves low performance in the infant domain and suffers from out-of-distribution issues. Moreover, the limitation of data collection heavily constrains the efficiency of learning-based models to lift 2D poses to 3D. To deal with the issues of small datasets, domain adaptation and data augmentation are...
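One simple augmentation in this spirit can be sketched as rescaling the bone lengths of adult 3D poses toward infant-like proportions before training the 2D-to-3D lifter; the toy skeleton and scale factors below are invented for illustration and are not the paper's exact procedure.

```python
# Hedged sketch of bone-length rescaling as a pose augmentation.
import torch

# parent index for each joint in a toy 5-joint chain skeleton (root has parent -1)
PARENTS = [-1, 0, 1, 2, 3]

def rescale_bones(pose, scales):
    """Shrink each bone (joint minus its parent) by a per-bone factor, keeping the root fixed."""
    new_pose = pose.clone()
    for j, p in enumerate(PARENTS):
        if p < 0:
            continue
        bone = pose[j] - pose[p]
        new_pose[j] = new_pose[p] + scales[j] * bone
    return new_pose

adult_pose = torch.tensor([[0., 0., 0.], [0., 30., 0.], [0., 60., 0.],
                           [0., 85., 0.], [0., 105., 0.]])      # cm, a straight chain
infant_like = rescale_bones(adult_pose, scales=[1.0, 0.55, 0.55, 0.5, 0.5])
```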
Recently, integrating video foundation models and large language models to build a video understanding system can overcome the limitations of specific pre-defined vision tasks. Yet, existing systems can only handle videos with very few frames. For long videos, the computation complexity, memory cost, and long-term temporal connection impose additional challenges. Taking advantage of the Atkinson-Shiffrin memory model, with tokens in Transformers being employed as the carriers of memory in combination with our specially designed memory mechanism, we propose...
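A highly simplified sketch of such a memory mechanism is shown below: incoming frame tokens fill a bounded short-term buffer, and on overflow the two most similar adjacent tokens are merged into a compact long-term store that is later handed to the language model together with the recent frames. The buffer size and merge rule are assumptions for illustration, not the system's exact design.

```python
# Toy short-term / long-term token memory for streaming long videos.
import torch
import torch.nn.functional as F

class VideoMemory:
    def __init__(self, short_capacity=8):
        self.short_capacity = short_capacity
        self.short_term = []          # list of (D,) frame tokens, most recent last
        self.long_term = []           # consolidated tokens

    def _consolidate(self):
        """Merge the most similar adjacent pair of short-term tokens into long-term memory."""
        sims = [F.cosine_similarity(self.short_term[i], self.short_term[i + 1], dim=0)
                for i in range(len(self.short_term) - 1)]
        i = int(torch.stack(sims).argmax())
        merged = 0.5 * (self.short_term[i] + self.short_term[i + 1])
        self.long_term.append(merged)
        del self.short_term[i:i + 2]

    def add_frame(self, token):
        self.short_term.append(token)
        while len(self.short_term) > self.short_capacity:
            self._consolidate()

    def context(self):
        """Tokens handed to the language model: long-term memory followed by recent frames."""
        return torch.stack(self.long_term + self.short_term)

memory = VideoMemory(short_capacity=8)
for _ in range(100):                           # stream of 100 frame tokens (D = 256)
    memory.add_frame(torch.randn(256))
print(memory.context().shape)                  # 54 tokens kept for 100 frames in this toy run
```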
Video restoration networks aim to restore high-quality frame sequences from degraded ones. However, traditional video restoration methods heavily rely on temporal modeling operators or optical flow estimation, which limits their versatility. The goal of this work is to present a novel approach for video restoration that eliminates inefficient temporal modeling operators and pixel-level feature alignment in the network architecture. The proposed method, Sequential Affinity Learning Network (SALN), is designed based on an affinity mechanism that establishes direct...