- Multimodal Machine Learning Applications
- Anomaly Detection Techniques and Applications
- Advanced Image Processing Techniques
- Advanced Image Fusion Techniques
- Advanced Image and Video Retrieval Techniques
- Human Pose and Action Recognition
- Gaze Tracking and Assistive Technology
- Image Enhancement Techniques
- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Video Surveillance and Tracking Methods
- Visual Attention and Saliency Detection
- Image and Signal Denoising Methods
- Target Tracking and Data Fusion in Sensor Networks
- Advanced Computing and Algorithms
- Topic Modeling
- Scientometrics and Bibliometrics Research
- Remote-Sensing Image Classification
- Advanced Vision and Imaging
- Infrared Target Detection Methodologies
- Hand Gesture Recognition Systems
- Image Retrieval and Classification Techniques
- Gaussian Processes and Bayesian Inference
- COVID-19 Diagnosis Using AI
- Tactile and Sensory Interactions
- Central South University, 2023-2025
- Northwestern Polytechnical University, 2020-2025
- Beijing Institute of Technology, 2023-2024
- Xi'an University of Architecture and Technology, 2022-2024
- Wuhan Polytechnic University, 2024
- Third Xiangya Hospital, 2024
- Ocean University of China, 2023
- China Railway Group (China), 2022
- Peking Union Medical College Hospital, 2021
- Chinese Academy of Medical Sciences & Peking Union Medical College, 2021
The combination of infrared and visible videos aims to gather more comprehensive feature information from multiple sources and reach superior results on various practical tasks, such as detection and segmentation, over that of a single modality. However, most existing dual-modality object detection algorithms ignore the modal differences and fail to consider the correlation between feature extraction and fusion, which leads to incomplete extraction and inadequate fusion of features. Hence, the question arises of how to preserve each modality's unique features and fully utilize...
Object detection for remote sensing is a fundamental task in remote sensing image processing; as one of its core components, small or tiny object detection plays an important role. Despite considerable advancements achieved with the integration of CNN and transformer networks, there remains untapped potential for enhancing the extraction and utilization of information associated with small objects. Particularly within transformer structures, this arises from the disregard of the complex and intertwined interplay between spatial context and channel information during global modeling...
This article presents U²PNet, a novel unsupervised underwater image restoration network using polarization for improving the signal-to-noise ratio and image quality in underwater imaging environments. Traditional underwater image restoration methods require specific cues or pairs of datasets, which limit their practical applications. Our proposed method requires only one mosaicked polarized image of the scene and does not require datasets for pretraining or specific cues. We design two subnetworks (T-net and B∞-net) to accurately estimate the transmission map...
Super-resolution neural networks have recently achieved great progress in restoring high-quality remote sensing images at low zoom-in magnitude. However, these networks often struggle with challenges like shape distortion and blurring effects due to the severe absence of structure and texture details in large-factor remote sensing image super-resolution. Addressing these challenges, we propose a novel Two-Stage Spatial-Frequency Joint Learning Network (TSFNet). TSFNet innovatively merges insights from both the spatial and frequency...
We propose a novel graph Laplacian-guided coupled tensor decomposition (gLGCTD) model for the fusion of a hyperspectral image (HSI) and a multispectral image (MSI) for spatial and spectral resolution enhancements. The coupled Tucker decomposition is employed to capture the global interdependencies across the different modes to fully exploit the intrinsic spatial-spectral information. To preserve local characteristics, the complementary submanifold structures embedded in the high-resolution (HR)-HSI are encoded by graph Laplacian regularizations. The information...
This paper presents a mosaic convolution-attention network (MCAN) for demosaicing spectral mosaic images captured using multispectral filter array (MSFA) imaging sensors. MSFA-based imaging systems acquire the spectral information of a scene in a single snap-shot operation. A complete spectral image is reconstructed by demosaicing a mosaic image. To avoid aliasing and artifacts in demosaicing, we utilize the joint spatial-spectral correlation in the raw mosaic image. The proposed MCAN includes a mosaic convolution module (MCM) and a mosaic attention module (MAM). The MCM extracts features via learning...
Weakly supervised object detection (WSOD) has recently attracted much attention in the field of remote sensing, where only image-level labels that distinguish the existence of an object in images are required. However, existing methods frequently treat the most discriminative area as the optimal solution and, meanwhile, ignore the fact that more than one instance may exist in a certain class in remote sensing images (RSIs). To address the issue, we propose a unique multiple instance graph (MIG) learning framework for WSOD in RSIs. The motivation of this work is...
Gaze object prediction is a newly proposed task that aims to discover the objects being stared at by humans. It is of great application significance but still lacks a unified solution framework. An intuitive solution is to incorporate an object detection branch into an existing gaze prediction method. However, previous gaze prediction methods usually use two different networks to extract features from the scene image and the head image, which would lead to a heavy network architecture and prevent each branch from joint optimization. In this paper, we build a novel framework named...
Expert feedback lays the foundation of rigorous research. However, the rapid growth of scholarly production and intricate knowledge specialization challenge conventional scientific feedback mechanisms. High-quality peer reviews are increasingly difficult to obtain. Researchers who are more junior or from under-resourced settings have especially hard times getting timely feedback. With the breakthrough of large language models (LLMs) such as GPT-4, there is growing interest in using LLMs to generate feedback on research...
Temporal action localization research aims to discover action instances from untrimmed videos, representing a fundamental step in the field of intelligent video understanding. With the advent of deep learning, backbone networks have been instrumental in providing representative spatiotemporal features, while the end-to-end learning paradigm has enabled the development of high-quality models through data-driven training. Both supervised and weakly supervised learning approaches have contributed to the rapid progress of temporal action localization, resulting...
This paper presents Core, a conceptually simple, effective and communication-efficient model for multi-agent cooperative perception. It addresses the task from a novel perspective of cooperative reconstruction, based on two key insights: 1) cooperating agents together provide a more holistic observation of the environment, and 2) the holistic observation can serve as valuable supervision to explicitly guide the model in learning how to reconstruct the ideal observation from collaboration. Core instantiates the idea with three major components: a compressor for each agent to create a compact...
Recently, remote sensing image super-resolution (RSISR) has drawn considerable attention and made great breakthroughs based on convolutional neural networks (CNNs). Due to the scale and richness of texture and structural information frequently recurring inside the same remote sensing images (RSIs) but varying greatly with different RSIs, state-of-the-art CNN-based methods have begun to explore the multiscale global features in RSIs by using attention mechanisms. However, they are still insufficient to explore significant content clues in RSIs. In...
The task of long-term action anticipation demands solutions that can effectively model temporal dynamics over extended periods while deeply understanding the inherent semantics of actions. Traditional approaches, which primarily rely on recurrent units or Transformer layers to capture dependencies, often fall short in addressing these challenges. Large Language Models (LLMs), with their robust sequential modeling capabilities and extensive commonsense knowledge, present new opportunities for...
Modeling cross-video relationships is an important issue for the weakly supervised temporal action localization task. To this end, traditional methods operate at the video level and rely on complicated strategies to prepare triplet samples, which only mine relationships among three videos from two categories. In this work, we observe that action instances from different categories could exhibit similar motion patterns, i.e., sub-actions, and propose the sub-action granularity to elaborately explore cross-video relationships. However, given...
Temporal action localization aims at discovering action instances in untrimmed videos, where RGB and flow are two widely used feature modalities. Specifically, RGB chiefly reveals appearance and flow mainly depicts motion. Given RGB and flow features, previous methods employ the early fusion or late fusion paradigm to mine the complementarity between them. By concatenating raw RGB and flow features, early fusion achieves complementarity implicitly through the network, but it partly discards the particularity of each modality. Late fusion independently maintains two branches to explore the particularity of each modality, but it only fuses...
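The two fusion paradigms named in the abstract above can be illustrated with a minimal NumPy sketch. This is not the paper's model: the linear scoring heads, the feature sizes, and the equal-weight averaging in `late_fusion` are assumptions chosen only to make the contrast concrete.

```python
import numpy as np

def early_fusion(rgb, flow, w):
    """Concatenate raw RGB and flow features, then score them with one shared head."""
    fused = np.concatenate([rgb, flow], axis=-1)  # (T, D_rgb + D_flow)
    return fused @ w                              # (T, num_classes)

def late_fusion(rgb, flow, w_rgb, w_flow):
    """Score each modality in its own branch, then average the two predictions."""
    return 0.5 * (rgb @ w_rgb + flow @ w_flow)    # (T, num_classes)

# Hypothetical sizes: T snippets, D feature channels per modality, C classes.
rng = np.random.default_rng(0)
T, D, C = 8, 4, 3
rgb = rng.normal(size=(T, D))     # stand-in for appearance features
flow = rng.normal(size=(T, D))    # stand-in for motion features
w_rgb = rng.normal(size=(D, C))
w_flow = rng.normal(size=(D, C))
w = rng.normal(size=(2 * D, C))

early_scores = early_fusion(rgb, flow, w)
late_scores = late_fusion(rgb, flow, w_rgb, w_flow)
```

With purely linear heads, as assumed here, late fusion coincides with early fusion under stacked weights; the trade-off the abstract describes only appears once each branch is a nonlinear network.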
Intent perception is a novel task that aims to understand the intention of images. Regular classification methods usually perform unsatisfactorily on intent perception due to the semantic ambiguity problem, i.e., the intra-class variety problem, in which images of the same intent class may contain objects of different categories, and the inter-class confusion problem, in which images of different intent classes may contain objects of similar categories. To address this problem, this paper introduces prototype learning into intent perception and proposes a unified framework named PIP-Net to reduce the influence of semantic ambiguity. Specifically, for...
Denoising and demosaicking long-wave infrared (LWIR) division-of-focal-plane (DoFP) polarization images are crucial for various vision applications. However, existing methods rely on the sequential application of individual denoising and demosaicking processes, which may result in the accumulation of errors produced by each process. To address this issue, we propose a joint denoising and demosaicking method for LWIR DoFP polarization images based on a three-stage progressive deep convolutional neural network. To ensure the generalization ability of the network, it is essential to...
Many multi-view camera-based 3D object detection models transform the image features into Bird's-Eye-View (BEV) via the Lift-Splat-Shoot (LSS) mechanism, which "lifts" 2D camera-view features to the 3D voxel space based on the predicted depth distribution and then "splats" the 3D features to a BEV plane for subsequent 3D detection. However, the BEV feature in such a one-stage view transformation scheme heavily relies on the quality of the features, which further determines the final detection performance. In this paper, we propose the BEVRefiner model, which performs dual refinement on both...
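The "lift" and "splat" steps described above can be sketched in a few lines of NumPy. This is a toy illustration, not the BEVRefiner or LSS implementation: the tensor sizes are arbitrary, and `bev_index` is a hypothetical precomputed mapping from each frustum point to a flat BEV cell, which a real system would derive from camera intrinsics and extrinsics. The "shoot" (planning) step is not covered.

```python
import numpy as np

def lift(feat, depth_prob):
    """Lift 2D features into a frustum by weighting them with a per-pixel depth distribution.

    feat: (C, H, W) image features; depth_prob: (D, H, W), summing to 1 over D.
    Returns a (D, C, H, W) frustum of features.
    """
    return depth_prob[:, None, :, :] * feat[None, :, :, :]

def splat(frustum, bev_index, num_cells):
    """Sum-pool frustum features into BEV cells.

    frustum: (D, C, H, W); bev_index: (D, H, W) integer BEV cell id per frustum point.
    Returns (num_cells, C) BEV features.
    """
    D, C, H, W = frustum.shape
    points = frustum.transpose(0, 2, 3, 1).reshape(-1, C)  # (D*H*W, C)
    bev = np.zeros((num_cells, C))
    np.add.at(bev, bev_index.reshape(-1), points)          # accumulate duplicates
    return bev

# Toy sizes (assumed): C channels, D depth bins, an H x W feature map, 4 BEV cells.
rng = np.random.default_rng(0)
C, D, H, W = 2, 3, 2, 2
feat = rng.normal(size=(C, H, W))
logits = rng.normal(size=(D, H, W))
depth_prob = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)  # softmax over depth
bev_index = rng.integers(0, 4, size=(D, H, W))  # hypothetical geometry lookup
bev = splat(lift(feat, depth_prob), bev_index, num_cells=4)
```

Because each pixel's depth distribution sums to one, sum-pooling redistributes the image features across BEV cells without changing their per-channel total, which is why errors in the predicted depth translate directly into misplaced BEV features.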