- Advanced Image and Video Retrieval Techniques
- Video Surveillance and Tracking Methods
- Advanced Neural Network Applications
- Image Enhancement Techniques
- Domain Adaptation and Few-Shot Learning
- Face and Expression Recognition
- Multimodal Machine Learning Applications
- Advanced Image Processing Techniques
- Image and Signal Denoising Methods
- Human Pose and Action Recognition
- Face Recognition and Analysis
- Robotics and Sensor-Based Localization
- Image Retrieval and Classification Techniques
- Gaze Tracking and Assistive Technology
- Advanced Vision and Imaging
- Neural Networks and Applications
- Handwritten Text Recognition Techniques
- Gait Recognition and Analysis
- Color Science and Applications
- Radiomics and Machine Learning in Medical Imaging
- Generative Adversarial Networks and Image Synthesis
- Biometric Identification and Security
- Visual Attention and Saliency Detection
- Image Processing Techniques and Applications
- Remote-Sensing Image Classification
Wuhan University of Science and Technology
2020-2025
China Ocean Shipping (China)
2025
University of Chicago
2020-2024
Tsinghua University
2017-2024
China Geological Survey
2024
China Electronics Technology Group Corporation
2024
Hefei University of Technology
2022-2023
University of Electronic Science and Technology of China
2019-2022
Waseda University
2017-2021
Tencent (China)
2019-2021
Video streaming is crucial for AI applications that gather videos from sources to servers for inference by deep neural nets (DNNs). Unlike traditional video streaming, which optimizes visual quality, this new type of video streaming permits aggressive compression/pruning of pixels not relevant to achieving high DNN inference accuracy. However, much of this potential is left unrealized, because current video streaming protocols are driven by the video source (camera), where compute is rather limited. We advocate that the video streaming protocol should be driven by real-time feedback from the server-side DNN. Our insight...
We develop an approach to learning visual representations that embraces multimodal data, driven by a combination of intra- and inter-modal similarity preservation objectives. Unlike existing visual pre-training methods, which solve a proxy prediction task in a single domain, our method exploits intrinsic data properties within each modality and semantic information from cross-modal correlation simultaneously, hence improving the quality of the learned visual representations. By including multimodal training in a unified framework with...
Cross-modality face recognition is an emerging topic due to the widespread usage of different sensors in day-to-day life applications. The development of face recognition systems relies greatly on existing databases for evaluation and for obtaining training examples for data-hungry machine learning algorithms. However, there is currently no publicly available face database that includes more than two modalities for the same subject. In this work, we introduce the Tufts Face Database, which includes images acquired in various modalities: photograph images,...
Current visible-infrared cross-modality person re-identification research has only focused on exploring the bi-modality mutual retrieval paradigm, and we propose a new and more practical mix-modality retrieval paradigm. Existing Visible-Infrared person re-identification (VI-ReID) methods have achieved some results in the bi-modality mutual retrieval paradigm by learning the correspondence between the visible and infrared modalities. However, significant performance degradation occurs due to the modality confusion problem when these methods are applied to the new mix-modality paradigm. Therefore, this paper proposes...
This paper presents a discrepancy minimizing model to address the discrete optimization problem in hashing learning. The discrete optimization introduced by the binary constraint is an NP-hard mixed integer programming problem. It is usually addressed by relaxing the binary variables into continuous variables to adapt to the gradient-based learning of hashing functions, especially the training of deep neural networks. To deal with the objective discrepancy caused by relaxation, we transform the original binary optimization into a differentiable optimization problem over hash functions through series expansion. This transformation decouples...
Designing an effective loss function plays an important role in visual analysis. Most existing loss function designs rely on hand-crafted heuristics that require domain experts to explore the large design space, which is usually sub-optimal and time-consuming. In this paper, we propose AutoML for Loss Function Search (AM-LFS), which leverages REINFORCE to search loss functions during the training process. The key contribution of this work is the design of a search space that can guarantee generalization and transferability across different vision tasks by including a bunch...
In this paper, we address the challenging unconstrained set-based face recognition problem, where each subject face is instantiated by a set of media (images and videos) instead of a single image. Naively aggregating information from all the media within a set would suffer from the large intra-set variance caused by heterogeneous factors (e.g., varying media modalities, poses and illumination) and fail to learn discriminative face representations. A novel Multi-Prototype Network (MPNet) model is thus proposed to learn multiple prototype face representations...
Person re-identification (Re-ID) aims to retrieve all images of a specific person captured by non-overlapping cameras and scenarios. Regardless of the significant success achieved by daytime person Re-ID methods, they perform poorly due to degraded imaging quality under low-light conditions. Therefore, some works attempt to synthesize low-light images to explore the challenges in the nighttime, which omits the fact that synthetic images may not realistically reflect the challenges of person Re-ID at night. Moreover, other works follow the "enhancement-then-match" manner, but it is...
Diffusion models (DMs) have recently been introduced in image deblurring and have exhibited promising performance, particularly in terms of detail reconstruction. However, the diffusion model requires a large number of inference iterations to recover the clean image from pure Gaussian noise, which consumes massive computational resources. Moreover, the distribution synthesized by the diffusion model is often misaligned with the target results, leading to restrictions in distortion-based metrics. To address the above issues, we propose the Hierarchical...
We propose an efficient diffusion-based text-to-video super-resolution (SR) tuning approach that leverages the readily learned capacity of a pixel-level image diffusion model to capture spatial information for video generation. To accomplish this goal, we design an efficient architecture by inflating the weightings of the text-to-image SR model into our video generation framework. Additionally, we incorporate a temporal adapter to ensure temporal coherence across video frames. We investigate different tuning approaches based on our inflated architecture and report trade-offs...
Person re-identification (re-ID) is commonly investigated as a ranking problem. However, the performance of existing re-ID models drops dramatically when they encounter extreme positive-negative class imbalance (e.g., a very small ratio of positive to negative samples) during training. To alleviate this problem, this article designs a rank-in-rank loss to optimize the distribution of feature embeddings. Specifically, we propose a Differentiable Retrieval-Sort Loss (DRSL) to optimize the re-ID model by ranking each positive sample ahead of the negative samples...
The goal of re-identification (re-ID) is to find an object (e.g., a person or vehicle) of interest across cameras. In re-ID, designing suitable and effective loss functions plays an essential role in learning identifiable features. Regardless of the significant success achieved by using retrieval- and verification-based loss functions, which follows from the fact that re-ID can be formulated as a retrieval or verification task, the model performance might be degraded owing to the inconsistency between loss functions and evaluation metrics. Moreover, current hand-designed...
We develop an approach to growing deep network architectures over the course of training, driven by a principled combination of accuracy and sparsity objectives. Unlike existing pruning or architecture search techniques that operate on full-sized models or supernet architectures, our method can start from a small, simple seed architecture and dynamically grow and prune both layers and filters. By combining a continuous relaxation of discrete network structure optimization with a scheme for sampling sparse subnetworks, we produce compact,...
The Journal of Biomedical Optics (JBO) is a Gold Open Access journal that publishes peer-reviewed papers on the use of novel optical systems and techniques for improved health care and biomedical research.
This paper studied the spatial distribution and influencing factors of heavy metals (HMs) such as Cu, Pb, Zn, Cr, Ni, Cd and As in the soil of Linzhou County in the Lhasa River basin. By collecting 504 surface soil samples and using descriptive statistics, Kriging interpolation and geoaccumulation index methods, combined with the geographic detector model, the spatial distribution characteristics of HMs content and its interaction with 19 environmental factors were systematically analyzed. The results showed that the HMs content in this area was generally higher than the background value...
In this paper, we propose an Enhanced Bayesian Compression (EBC) method to flexibly compress deep networks via reinforcement learning. Unlike the existing Bayesian compression method, which cannot explicitly enforce quantization of weights during training, our method learns flexible codebooks in each layer for optimal network quantization. To dynamically adjust the state of the codebooks, we employ an Actor-Critic network to collaborate with the original deep network. Different from most existing network quantization methods, our EBC does not require re-training procedures after...
Image super-resolution (SR) methods typically model degradation to improve reconstruction accuracy in complex and unknown degradation scenarios. However, extracting degradation information from low-resolution images is challenging, which limits the model performance. To boost image SR performance, one feasible approach is to introduce additional priors. Inspired by advancements in multi-modal methods and text prompt image processing, we introduce text prompts to image SR to provide degradation priors. Specifically, we first design a text-image generation pipeline to integrate text into the SR dataset through...
Previous studies recognize pain expressions based on the entire face, for example, using the Prkachin and Solomon Pain Intensity (PSPI). However, a patient's face is often masked by instruments in an intensive care unit (ICU), such as a respirator and gauzes, just to name a few, so an agent cannot measure pain intensity using PSPI directly. To tackle this problem, we explore pain recognition from the masked face. First, we conducted four levels of pain measurement experiments with Swin-Transformer variants. Experiment results show that the accuracy...
In existing remote sensing image retrieval (RSIR) datasets, the number of images among different classes varies dramatically, which leads to a severe class imbalance problem. Some studies propose to train the model with a ranking‐based metric (e.g., average precision [AP]), because AP is robust to class imbalance. However, current AP‐based methods overlook an important issue: they only optimise the ranking of the samples before each positive sample, which is limited by the definition of AP and is prone to a local optimum. To achieve global...