Mingliang Xu

ORCID: 0000-0002-6885-3451
Research Areas
  • Advanced Neural Network Applications
  • Advanced Image and Video Retrieval Techniques
  • Anomaly Detection Techniques and Applications
  • Video Surveillance and Tracking Methods
  • Human Pose and Action Recognition
  • Evacuation and Crowd Dynamics
  • Domain Adaptation and Few-Shot Learning
  • Traffic Prediction and Management Techniques
  • Image Enhancement Techniques
  • Data Visualization and Analytics
  • Multimodal Machine Learning Applications
  • Industrial Vision Systems and Defect Detection
  • Visual Attention and Saliency Detection
  • Advanced Image Processing Techniques
  • Advanced Vision and Imaging
  • Traffic Control and Management
  • Autonomous Vehicle Technology and Safety
  • Advanced Image Fusion Techniques
  • Face and Expression Recognition
  • Video Analysis and Summarization
  • Image and Signal Denoising Methods
  • Gait Recognition and Analysis
  • 3D Shape Modeling and Analysis
  • Remote-Sensing Image Classification
  • Robotics and Sensor-Based Localization

Zhengzhou University
2016-2025

University of Chinese Academy of Sciences
2025

Shanghai Institute of Optics and Fine Mechanics
2024

Chinese Academy of Sciences
2024

University of Science and Technology of China
2021-2024

Zhengzhou Business University
2024

Ministry of Education of the People's Republic of China
2022-2024

Yangzhou University
2024

Zhejiang University
2010-2024

Jiangsu Normal University
2024

Convolutional Neural Network (CNN) based methods generally treat crowd counting as a regression task by outputting densities. They learn the mapping between image contents and density distributions. Though they have achieved promising results, these data-driven networks are prone to overestimate or underestimate people counts in regions with different patterns, which degrades the whole count accuracy. To overcome this problem, we propose an approach to alleviate performance differences in regions...

10.1109/cvpr42600.2020.00476 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Interfacial electron transfer between cocatalyst and photosensitizer is key in heterogeneous photocatalysis, yet the underlying mechanism remains subtle and unclear. Surfactant coated on metal cocatalysts, greatly modulating the microenvironment of catalytic sites, is largely ignored. Herein, a series of Pt co‐catalysts with modulated microenvironments, including polyvinylpyrrolidone (PVP) capped Pt nanoparticles (denoted as PtPVP), Pt with partially removed PVP (PtrPVP), and clean Pt without PVP (Pt), were encapsulated...

10.1002/anie.202104219 article EN Angewandte Chemie International Edition 2021-05-14

Single Image Deraining (SID) is a relatively new and still challenging topic in emerging vision applications, and most of the recently emerged deraining methods use a supervised manner that depends on the ground truth (i.e., using paired data). However, in practice it is rather common to encounter unpaired images in a real task. In such cases, how to remove rain streaks in an unsupervised way will be a challenging task due to the lack of constraints between images, hence suffering from low-quality restoration results. In this paper, we therefore explore...

10.1109/tip.2021.3074804 article EN IEEE Transactions on Image Processing 2021-01-01

Existing deep learning based matting algorithms primarily resort to high-level semantic features to improve the overall structure of alpha mattes. However, we argue that advanced semantics extracted from CNNs contribute unequally to perception and that we are supposed to reconcile this information with low-level appearance cues to refine foreground details. In this paper, we propose an end-to-end Hierarchical Attention Matting Network (HAttMatting), which can predict better mattes from single RGB images without...

10.1109/cvpr42600.2020.01369 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Low-light image enhancement (LLIE) explores how to refine the illumination and obtain natural normal-light images. Current LLIE methods mainly focus on improving illumination, but do not consider color consistency by reasonably incorporating color information into the process. As a result, a color difference usually exists between the enhanced image and the ground truth. To address this issue, we propose a new deep color consistent network, termed DCC-Net, to retain color consistency for LLIE. A "divide and conquer" collaborative strategy is presented, which...

10.1109/cvpr52688.2022.00194 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

This paper addresses the problem of handling spatial misalignments due to camera-view changes or human-pose variations in person re-identification. We first introduce a boosting-based approach to learn a correspondence structure that indicates patch-wise matching probabilities between images from a target camera pair. The learned structure can not only capture the correspondence pattern between cameras but also handle viewpoint variation in individual images. We further introduce a global-based process. It integrates a global constraint over the learned structure to exclude...

10.1109/iccv.2015.366 preprint EN 2015-12-01

Matching images and sentences demands a fine understanding of both modalities. In this paper, we propose a new system to discriminatively embed the image and text into a shared visual-textual space. In this field, most existing works apply a ranking loss to pull positive image/text pairs close and push negative pairs apart from each other. However, directly deploying the ranking loss is hard for network learning, since it starts from two heterogeneous features to build the inter-modal relationship. To address this problem, we propose the instance loss, which explicitly considers...

10.1145/3383184 article EN ACM Transactions on Multimedia Computing Communications and Applications 2020-05-22

Numerous research efforts have been conducted to simulate crowd movements, while relatively few of them are specifically focused on multihazard situations. In this paper, we propose a novel simulation method by modeling the generation and contagion of panic emotion under multihazard circumstances. In order to depict the effect from hazards and other agents on crowd movement, we first classify hazards into different types (transient and persistent, concurrent and nonconcurrent, static and dynamic) based on their inherent characteristics. Second, we introduce...

10.1109/tsmc.2019.2899047 article EN IEEE Transactions on Systems Man and Cybernetics Systems 2019-01-01

With the popularity of multimedia technology, information is always represented or transmitted from multiple views. Most existing algorithms are graph-based ones that learn complex structures within multiview data but overlook representations. Furthermore, many works treat views discriminatively by introducing some hyperparameters, which is undesirable in practice. To this end, abundant multiview-based methods have been proposed for dimension reduction. However, there is still no research to leverage work into...

10.1109/tmm.2020.3032023 article EN IEEE Transactions on Multimedia 2020-10-21

Person Re-Identification (ReID) has achieved remarkable performance along with the deep learning era. However, most approaches carry out ReID based only upon holistic pedestrian regions. In contrast, real-world scenarios involve occluded pedestrians, which provide partial visual appearances and destroy accuracy. A common strategy is to locate visible body parts with an auxiliary model, which however suffers from significant domain gaps and data bias issues. To avoid such problematic models in person ReID,...

10.1109/iccv48922.2021.01162 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Hexagonal boron nitride (h-BN) catalyst has recently been reported to be highly selective in oxidative dehydrogenation of propane (ODHP) for olefin production. In addition to propene, ethylene also forms with much higher overall selectivities to C2-products than to C1-products. In this work, we report that the reaction pathways over h-BN are different from those over V-based catalysts in ODHP. Oxidative coupling of methyl, an intermediate from the cleavage of the C-C bond of propane, contributes to the high selectivities to C2-products, leading to more C2-products than C1-products...

10.1126/sciadv.aav8063 article EN cc-by-nc Science Advances 2019-03-01

Crowd stampedes and terrorist attacks in public areas have now become more serious and dangerous threats due to the rapid increase in the population scale of cities. Therefore, the analysis of crowd aggregation behavior has become a new research focus in the field of intelligent video surveillance. However, such public-area scenes not only contain moving people but also other types of objects. The sizes of these objects are usually small, which makes their appearances quite similar. Moreover, individuals move randomly and often occlude each other. All...

10.1109/tcsvt.2017.2731866 article EN IEEE Transactions on Circuits and Systems for Video Technology 2017-07-25

This paper addresses the problem of recognizing and removing shadows from monochromatic natural images from a learning-based perspective. Without chromatic information, shadow recognition and removal are extremely challenging, mainly due to the missing invariant color cues. Natural scenes make this problem even harder due to complex illumination conditions and the ambiguity of many near-black objects. In this paper, a learning-based scheme is proposed to tackle the above-mentioned challenges. First, we propose to use both shadow-variant and shadow-invariant cues from illumination,...

10.1109/tip.2017.2737321 article EN IEEE Transactions on Image Processing 2017-08-07

Most person re-identification (re-ID) approaches are based on supervised learning, which requires manually annotated data. However, it is not only resource-intensive to acquire identity annotation but also impractical for large-scale data. To relieve this problem, we propose a cross-camera unsupervised approach that makes use of style-transferred images to jointly optimize a convolutional neural network (CNN) and the relationship among individual samples for re-ID. Our algorithm considers two fundamental...

10.1109/tip.2020.2982826 article EN IEEE Transactions on Image Processing 2020-01-01

Due to the superior ability of global dependency modeling, Transformer and its variants have become the primary choice for many vision-and-language tasks. However, in tasks like Visual Question Answering (VQA) and Referring Expression Comprehension (REC), multimodal prediction often requires visual information from macro- to micro-views. Therefore, how to dynamically schedule global and local modeling has become an emerging issue. In this paper, we propose an example-dependent routing scheme called TRAnsformer Routing (TRAR)...

10.1109/iccv48922.2021.00208 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

This article proposes a fast and accurate network for surface defect detection, termed SDDNet. SDDNet mainly addresses two challenging issues (large texture variation and the small size of defects) by introducing two modules: a feature retaining block (FRB) and a skip densely connected module (SDCM). FRB fuses multiple pyramidal maps with different resolutions and is plugged on top of the pooling layers, aiming to preserve information that may be lost because of downsampling. SDCM is designed to propagate fine-grained details...

10.1109/tim.2021.3056744 article EN IEEE Transactions on Instrumentation and Measurement 2021-01-01

Neighboring frames are more correlated compared to frames from further temporal distances. In this paper, we aim to explore the correlations among neighboring frames and exploit cross-layer multi-scale features for action recognition. First, we present a Temporal Cross-Layer Correlation (TCLC) framework for correlation learning. The unified framework uncovers both local and global structures of video data, enabling better exploration of context and assisting spatio-temporal feature learning. Second, we propose novel attention center-guided...

10.1109/tmm.2021.3057503 article EN IEEE Transactions on Multimedia 2021-02-08

Due to their high maneuverability and flexibility, unmanned aerial vehicles (UAVs) have been considered a promising paradigm to assist mobile edge computing (MEC) in many scenarios, including disaster rescue and field operation. Most existing research focuses on the study of trajectory and computation-offloading scheduling for UAV-assisted MEC in stationary environments, and could face challenges in dynamic environments where the locations of UAVs and mobile devices (MDs) vary significantly. Some latest attempts develop policies by...

10.1109/jiot.2021.3071531 article EN IEEE Internet of Things Journal 2021-04-07

In this article, we propose a transformer-based RGB-D egocentric action recognition framework, called Trear. It consists of two modules: 1) an interframe attention encoder and 2) a mutual-attentional fusion block. Instead of using optical flow or recurrent units, we adopt a self-attention mechanism to model the temporal structure of the data from different modalities. Input frames are cropped randomly to mitigate the effect of redundancy. Features from each modality are interacted through the proposed fusion block and combined through a simple yet...

10.1109/tcds.2020.3048883 article EN IEEE Transactions on Cognitive and Developmental Systems 2021-01-01

Text-video retrieval is one of the basic tasks for multimodal research and has been widely harnessed in many real-world systems. Most existing approaches directly compare the global representation between videos and text descriptions and utilize a contrastive loss to train the model. These designs overlook local alignment and the word-level supervision signal. In this paper, we propose a new framework, called Align and Tell, for text-video retrieval. Compared to previous work, our framework contains additional modules,...

10.1109/tmm.2022.3204444 article EN IEEE Transactions on Multimedia 2022-09-05