- Advanced Neural Network Applications
- Advanced Image and Video Retrieval Techniques
- Anomaly Detection Techniques and Applications
- Video Surveillance and Tracking Methods
- Human Pose and Action Recognition
- Evacuation and Crowd Dynamics
- Domain Adaptation and Few-Shot Learning
- Traffic Prediction and Management Techniques
- Image Enhancement Techniques
- Data Visualization and Analytics
- Multimodal Machine Learning Applications
- Industrial Vision Systems and Defect Detection
- Visual Attention and Saliency Detection
- Advanced Image Processing Techniques
- Advanced Vision and Imaging
- Traffic Control and Management
- Autonomous Vehicle Technology and Safety
- Advanced Image Fusion Techniques
- Face and Expression Recognition
- Video Analysis and Summarization
- Image and Signal Denoising Methods
- Gait Recognition and Analysis
- 3D Shape Modeling and Analysis
- Remote-Sensing Image Classification
- Robotics and Sensor-Based Localization
Zhengzhou University
2016-2025
University of Chinese Academy of Sciences
2025
Shanghai Institute of Optics and Fine Mechanics
2024
Chinese Academy of Sciences
2024
University of Science and Technology of China
2021-2024
Zhengzhou Business University
2024
Ministry of Education of the People's Republic of China
2022-2024
Yangzhou University
2024
Zhejiang University
2010-2024
Jiangsu Normal University
2024
Convolutional Neural Network (CNN) based methods generally treat crowd counting as a regression task that outputs density maps, learning the mapping between image content and density distributions. Although these data-driven networks have achieved promising results, they are prone to overestimating or underestimating the people count in regions with different patterns, which degrades overall count accuracy. To overcome this problem, we propose an approach to alleviate the performance differences across regions...
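The snippet treats counting as density regression, where the count is the integral (sum) of the density map and per-region counts come from integrating sub-windows. The sketch below follows a common convention: a normalized Gaussian is placed at each annotated head, so every person contributes a mass of exactly 1. This is an assumption and not necessarily this paper's ground-truth construction. The names `gaussian_density_map`, `count`, and `region_counts`, and the `sigma` value, are hypothetical.

```python
import math

def gaussian_density_map(points, height, width, sigma=2.0):
    """Hypothetical helper: density map whose total mass equals the number of heads."""
    density = [[0.0] * width for _ in range(height)]
    for (py, px) in points:
        # Unnormalized Gaussian centered on this head, evaluated over the whole grid.
        weights = [[math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
                    for x in range(width)] for y in range(height)]
        total = sum(sum(row) for row in weights)
        # Normalize within the grid so each head adds exactly 1 to the map's sum.
        for y in range(height):
            for x in range(width):
                density[y][x] += weights[y][x] / total
    return density

def count(density):
    """The predicted crowd count is the sum (integral) of the density map."""
    return sum(sum(row) for row in density)

def region_counts(density, rows, cols):
    """Split the map into a rows x cols grid and integrate each cell separately."""
    h, w = len(density), len(density[0])
    out = []
    for r in range(rows):
        row_counts = []
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            row_counts.append(sum(density[y][x]
                                  for y in range(y0, y1) for x in range(x0, x1)))
        out.append(row_counts)
    return out
```

Comparing `region_counts` of a prediction against the ground truth, cell by cell, exposes the region-level over- and underestimation the snippet describes. A correct global total can hide such errors.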
Interfacial electron transfer between cocatalyst and photosensitizer is key in heterogeneous photocatalysis, yet the underlying mechanism remains subtle and unclear. The surfactant coated on metal cocatalysts, which greatly modulates the microenvironment of catalytic sites, is largely ignored. Herein, a series of Pt co-catalysts with modulated microenvironments, including polyvinylpyrrolidone (PVP) capped Pt nanoparticles (denoted as Pt-PVP), Pt nanoparticles with PVP partially removed (Pt-rPVP), and clean Pt nanoparticles without PVP (Pt), were encapsulated...
Single Image Deraining (SID) is a relatively new and still challenging topic in emerging vision applications, and most recently emerged deraining methods are supervised, depending on ground truth (i.e., using paired data). However, in practice it is rather common to encounter unpaired images in real deraining tasks. In such cases, removing rain streaks in an unsupervised way is a challenging task due to the lack of constraints between images, which leads to low-quality restoration results. In this paper, we therefore explore...
Existing deep-learning-based matting algorithms primarily resort to high-level semantic features to improve the overall structure of alpha mattes. However, we argue that advanced semantics extracted from CNNs contribute unequally to alpha perception and should be reconciled with low-level appearance cues to refine foreground details. In this paper, we propose an end-to-end Hierarchical Attention Matting Network (HAttMatting), which can predict better alpha mattes from single RGB images without...
Low-light image enhancement (LLIE) explores how to refine the illumination and obtain natural normal-light images. Current LLIE methods mainly focus on improving the illumination, but they do not consider color consistency by reasonably incorporating color information into the LLIE process. As a result, a color difference usually exists between the enhanced image and the ground truth. To address this issue, we propose a new deep color consistent network, termed DCC-Net, to retain color consistency for LLIE. A "divide and conquer" collaborative strategy is presented, which...
This paper addresses the problem of handling spatial misalignments due to camera-view changes or human-pose variations in person re-identification. We first introduce a boosting-based approach to learn a correspondence structure, which indicates the patch-wise matching probabilities between images from a target camera pair. The learned correspondence structure can not only capture the spatial correspondence pattern between cameras but also handle viewpoint or human-pose variation in individual images. We further introduce a global-based matching process. It integrates a global matching constraint over the learned correspondence structure to exclude...
Matching images and sentences demands a fine understanding of both modalities. In this paper, we propose a new system to discriminatively embed images and text into a shared visual-textual space. In this field, most existing works apply a ranking loss to pull positive image/text pairs close and push negative pairs apart from each other. However, directly deploying the ranking loss is hard for network learning, since it starts from two heterogeneous features to build the inter-modal relationship. To address this problem, we propose an instance loss, which explicitly considers...
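The ranking loss the snippet refers to is commonly written as a bidirectional hinge (triplet) loss over cosine similarities. A positive pair must score higher than every negative pair by a margin, in both the image-to-text and text-to-image directions. The sketch below is a minimal, assumed form of that generic ranking loss. It is not the paper's instance loss, and `bidirectional_ranking_loss` and `margin=0.2` are hypothetical choices.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def bidirectional_ranking_loss(images, texts, margin=0.2):
    """images[i] and texts[i] form the positive pair; every j != i is a negative."""
    n = len(images)
    loss = 0.0
    for i in range(n):
        pos = cosine(images[i], texts[i])
        for j in range(n):
            if j == i:
                continue
            # Image-to-text: the matching text must beat non-matching text j by the margin.
            loss += max(0.0, margin - pos + cosine(images[i], texts[j]))
            # Text-to-image: the matching image must beat non-matching image j by the margin.
            loss += max(0.0, margin - pos + cosine(images[j], texts[i]))
    return loss / n
```

When positives are already well separated from negatives, every hinge term is zero and the loss gives no gradient. The loss only compares cross-modal pairs and never looks at how samples are distributed within one modality, which is the limitation the snippet's instance loss is meant to address.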
Numerous research efforts have been conducted to simulate crowd movements, while relatively few of them focus specifically on multihazard situations. In this paper, we propose a novel crowd simulation method that models the generation and contagion of panic emotion under multihazard circumstances. To depict the effects of hazards and other agents on crowd movement, we first classify hazards into different types (transient and persistent, concurrent and nonconcurrent, static and dynamic) based on their inherent characteristics. Second, we introduce...
With the popularity of multimedia technology, information is typically represented or transmitted from multiple views. Most existing algorithms are graph-based methods that learn the complex structures within multiview data but overlook the data representations. Furthermore, many works treat views discriminatively by introducing hyperparameters, which is undesirable in practice. To this end, numerous methods have been proposed for dimension reduction. However, there is still no research that leverages this work into...
Person Re-Identification (ReID) has achieved remarkable performance in the deep learning era. However, most approaches carry out ReID based only on holistic pedestrian regions. In contrast, real-world scenarios involve occluded pedestrians, which provide only partial visual appearance and degrade ReID accuracy. A common strategy is to locate visible body parts with an auxiliary model, which, however, suffers from significant domain gaps and data bias issues. To avoid such problematic models in person ReID,...
Hexagonal boron nitride (h-BN) catalyst has recently been reported to be highly selective in the oxidative dehydrogenation of propane (ODHP) for olefin production. In addition to propene, ethylene also forms, with much higher overall selectivity to C2-products than to C1-products. In this work, we report that the reaction pathways over h-BN are different from those over V-based catalysts in ODHP. Oxidative coupling of methyl, an intermediate from the cleavage of the C–C bond in propane, contributes to the high selectivity to C2-products, leading to more C2-products than C1-products...
Crowd stampedes and terrorist attacks in public areas have become more serious and dangerous threats due to the rapid growth in the population of cities. Therefore, the analysis of crowd aggregation behavior has become a new research focus in the field of intelligent video surveillance. However, such public-area scenes contain not only moving crowds but also other types of objects. The sizes of these objects are usually small, which makes their appearances quite similar. Moreover, individuals move randomly and often occlude each other. All...
This paper addresses the problem of recognizing and removing shadows from monochromatic natural images from a learning-based perspective. Without chromatic information, shadow recognition and removal are extremely challenging, mainly due to the missing invariant color cues. Natural scenes make the problem even harder because of complex illumination conditions and the ambiguity of many near-black objects. In this paper, a learning-based scheme is proposed to tackle the above-mentioned challenges. First, we propose to use both shadow-variant and invariant cues from illumination,...
Most person re-identification (re-ID) approaches are based on supervised learning, which requires manually annotated data. However, acquiring identity annotations is not only resource-intensive but also impractical for large-scale data. To relieve this problem, we propose a cross-camera unsupervised approach that makes use of style-transferred images to jointly optimize a convolutional neural network (CNN) and the relationship among individual samples for person re-ID. Our algorithm considers two fundamental...
Due to their superior ability to model global dependencies, Transformer and its variants have become the primary choice for many vision-and-language tasks. However, in tasks like Visual Question Answering (VQA) and Referring Expression Comprehension (REC), multimodal prediction often requires visual information from macro- to micro-views. Therefore, how to dynamically schedule global and local dependency modeling has become an emerging issue. In this paper, we propose an example-dependent routing scheme called TRAnsformer Routing (TRAR)...
This article proposes a fast and accurate network for surface defect detection, termed SDDNet. SDDNet mainly addresses two challenging issues (large texture variation and the small size of defects) by introducing two modules: a feature retaining block (FRB) and a skip densely connected module (SDCM). FRB fuses multiple pyramidal feature maps with different resolutions and is plugged on top of the pooling layers, aiming to preserve texture information that may be lost because of downsampling. SDCM is designed to propagate fine-grained details...
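As a rough illustration of "fusing pyramidal feature maps with different resolutions", the sketch below resizes single-channel maps to a common target resolution and averages them element-wise. It uses average pooling when shrinking by an integer factor and nearest-neighbor when enlarging. This is an assumed, simplified stand-in, not SDDNet's actual FRB, which may use learned convolutions and a different fusion rule. The names `resize` and `fuse_pyramid` are hypothetical.

```python
def resize(fmap, out_h, out_w):
    """Resize a 2D map: average pooling when shrinking, nearest neighbor otherwise."""
    in_h, in_w = len(fmap), len(fmap[0])
    if in_h >= out_h and in_w >= out_w:
        # Downsample by average pooling; assumes the size ratios are integers.
        fy, fx = in_h // out_h, in_w // out_w
        return [[sum(fmap[y * fy + dy][x * fx + dx]
                     for dy in range(fy) for dx in range(fx)) / (fy * fx)
                 for x in range(out_w)] for y in range(out_h)]
    # Upsample by repeating the nearest source pixel.
    return [[fmap[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]

def fuse_pyramid(maps, out_h, out_w):
    """Bring every pyramid level to (out_h, out_w) and average them element-wise."""
    resized = [resize(m, out_h, out_w) for m in maps]
    return [[sum(r[y][x] for r in resized) / len(resized) for x in range(out_w)]
            for y in range(out_h)]
```

Fusing a pooled map with its higher-resolution predecessor at the pooled resolution keeps a summary of texture that plain max-pooling would discard. Fusing at the finer resolution instead keeps spatial detail at higher cost.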
Neighboring frames are more correlated than frames at greater temporal distances. In this paper, we aim to explore the correlations among neighboring frames and exploit cross-layer multi-scale features for action recognition. First, we present a Temporal Cross-Layer Correlation (TCLC) framework for correlation learning. The unified framework uncovers both local and global structures of video data, enabling better exploration of context and assisting spatio-temporal feature learning. Second, we propose a novel attention center-guided...
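The opening claim, that correlation decays with temporal distance, can be checked directly on per-frame feature vectors by averaging the similarity of all frame pairs that are k steps apart. The sketch below measures this with cosine similarity. That choice of measure is an assumption, not necessarily the correlation the TCLC framework learns. `mean_similarity_at_offset` and `correlation_profile` are hypothetical helpers.

```python
import math

def cosine(a, b):
    """Cosine similarity between two frame feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean_similarity_at_offset(frames, k):
    """Average cosine similarity over all frame pairs that are k steps apart."""
    pairs = [(frames[t], frames[t + k]) for t in range(len(frames) - k)]
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

def correlation_profile(frames, max_offset):
    """Similarity as a function of temporal distance 1..max_offset."""
    return [mean_similarity_at_offset(frames, k) for k in range(1, max_offset + 1)]
```

For features that drift smoothly over time, the profile decreases with the offset. This is the pattern that motivates focusing correlation learning on neighboring frames.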
Due to their high maneuverability and flexibility, unmanned aerial vehicles (UAVs) have been considered a promising paradigm to assist mobile edge computing (MEC) in many scenarios, including disaster rescue and field operations. Most existing research focuses on trajectory and computation-offloading scheduling for UAV-assisted MEC in stationary environments, and could face challenges in dynamic environments where the locations of UAVs and mobile devices (MDs) vary significantly. Some recent attempts develop policies by...
In this article, we propose a transformer-based RGB-D egocentric action recognition framework called Trear. It consists of two modules: 1) an interframe attention encoder and 2) a mutual-attentional fusion block. Instead of using optical flow or recurrent units, we adopt the self-attention mechanism to model the temporal structure of data from different modalities. Input frames are cropped randomly to mitigate the effect of data redundancy. Features from each modality interact through the proposed fusion block and are combined through a simple yet...
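The core operation behind an interframe attention encoder is scaled dot-product self-attention over the sequence of per-frame features. Each frame's output is a softmax-weighted mix of all frames, so temporal structure is modeled without optical flow or recurrence. The sketch below omits learned query/key/value projections and multiple heads. Its cross-modal variant, where RGB features query depth features, is only an assumed stand-in for Trear's mutual-attentional fusion block, not its exact design. All function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention; each argument is a list of per-frame vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def interframe_self_attention(frames):
    # Every frame attends to every frame of the same clip (no learned projections).
    return attention(frames, frames, frames)

def cross_modal_attention(rgb_frames, depth_frames):
    # RGB features query depth features: an assumed stand-in for mutual attention.
    return attention(rgb_frames, depth_frames, depth_frames)
```

Because the attention weights for each query sum to 1, identical frames pass through unchanged. Distinct frames receive more weight from the frames they resemble.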
Text-video retrieval is one of the basic tasks in multimodal research and has been widely used in many real-world systems. Most existing approaches directly compare global representations of videos and text descriptions and use a contrastive loss to train the model. These designs overlook local alignment and word-level supervision signals. In this paper, we propose a new framework for text-video retrieval called Align and Tell. Compared to previous work, our framework contains additional modules...