- Video Surveillance and Tracking Methods
- Human Pose and Action Recognition
- Image Enhancement Techniques
- Advanced Image Processing Techniques
- Speech Recognition and Synthesis
- Music and Audio Processing
- Speech and Audio Processing
- Advanced Vision and Imaging
- Rough Sets and Fuzzy Logic
- Graph Theory and Algorithms
- Advanced Image and Video Retrieval Techniques
- Hand Gesture Recognition Systems
- Human Motion and Animation
- Advanced Computational Techniques and Applications
- Generative Adversarial Networks and Image Synthesis
- Gait Recognition and Analysis
- Face and Expression Recognition
- Face Recognition and Analysis
- Semantic Web and Ontologies
- Advanced Graph Neural Networks
- Anomaly Detection Techniques and Applications
- Service-Oriented Architecture and Web Services
- Music Technology and Sound Studies
- Advanced Image Fusion Techniques
- Multi-Criteria Decision Making
Jiangxi Normal University
2006-2024
Guangzhou University
2016-2024
Sichuan University
2024
West China Second University Hospital of Sichuan University
2024
Shandong Provincial QianFoShan Hospital
2024
Shandong First Medical University
2024
Hubei University
2023
China Tobacco
2022
University of Electronic Science and Technology of China
2020-2021
Guangdong University of Technology
2020
Person Re-IDentification (ReID) aims at re-identifying persons from different viewpoints across multiple cameras. Capturing the fine-grained appearance differences is often key to accurate person ReID, because many identities can be differentiated only when looking into these differences. However, most state-of-the-art ReID approaches, typically driven by a triplet loss, fail to effectively learn fine-grained features, as they are focused more on differentiating large appearance differences. To address this issue, we introduce...
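For orientation, the triplet loss mentioned in this abstract can be sketched as follows. This is a generic hinge-style formulation, not the loss the paper goes on to introduce; the function names, the margin of 0.3, and the toy 2-D embeddings are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Generic triplet loss: the positive should be closer than the negative by `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy embeddings (hypothetical values, not taken from the paper).
anchor, positive = [0.0, 0.0], [0.0, 1.0]
easy_negative = [3.0, 4.0]  # far from the anchor: a large appearance difference
hard_negative = [0.0, 1.2]  # almost as close as the positive: a fine-grained difference

easy = triplet_loss(anchor, positive, easy_negative)
hard = triplet_loss(anchor, positive, hard_negative)
```

In this toy setting the easy triplet already satisfies the margin and yields zero loss, while the hard triplet still yields a small positive loss.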
Underwater object detection is crucial in marine exploration, presenting a challenging problem in computer vision due to factors like light attenuation, scattering, and background interference. Existing underwater object detection models face challenges such as low robustness, extensive computation of model parameters, and a high false detection rate. To address these challenges, this paper proposes a lightweight underwater object detection method integrating deep learning and image enhancement. Firstly, FUnIE-GAN is employed to perform data enhancement to restore the...
Sensor-based human activity recognition (HAR) plays a fundamental role in various mobile application scenarios, but the model performance of HAR heavily relies on the richness of the dataset and the completeness of data annotation. To address the shortage of comprehensive activity types in collected datasets, we adopt a domain adaptation technique with a graph neural network-based approach, incorporating an adaptive learning mechanism to enhance the action recognition model’s generalization ability, especially when faced with limited sample sizes....
Music-to-dance generation must consider both the kinematics in dance, which are highly complex and non-linear, and the connection between music and dance movement, which is far from deterministic. Existing approaches attempt to address the limited creativity problem, but it is still a very challenging task. First, it is a long-term sequence-to-sequence task. Second, the extracted motion keypoints are noisy. Last, there exist local and global dependencies in the music sequence and the dance sequence. To address these issues, we propose a novel autoregressive generative framework...
In this paper, we focus on automatically colorizing a single grayscale image without manual interventions. Most existing methods try to accurately restore the unknown ground-truth colors and require paired training data for model optimization. However, the ideal restoration objective and strict training constraints have limited their performance. Inspired by CycleGAN, we formulate the process of colorization as image-to-image translation and propose an effective color-CycleGAN solution. High-level semantic identity loss...
Automatic image colorization without manual interventions is an ill-conditioned and inherently ambiguous problem. Most existing methods focus on formulating colorization as a regression problem and learn parametric mappings from grayscale to color through deep neural networks. Due to the multimodalities of the color-grayscale space, in many applications it is not required to recover the exact ground-truth color. Pair-wise pixel-to-pixel learning-based algorithms lack rationality. Techniques such as color space conversion techniques...
Deep learning models are widely used for speaker recognition and spoofing speech detection. We propose the GMM-ResNet2 for synthetic speech detection. Compared with the previous GMM-ResNet model, GMM-ResNet2 has four improvements. Firstly, GMMs of different orders have different capabilities to form smooth approximations of the feature distribution, and multiple GMMs are used to extract multi-scale Log Gaussian Probability features. Secondly, the grouping technique is used to improve the classification accuracy by exposing the group cardinality while reducing both the number of parameters...
With the development of deep learning, many different network architectures have been explored in speaker verification. However, most network architectures rely on a single deep learning architecture, and hybrid networks combining different architectures have been little studied in the ASV task. In this paper, we propose the GMM-ResNext model for speaker verification. The conventional GMM does not consider the score distribution of each frame feature over all Gaussian components and ignores the relationship between neighboring speech frames. So, we extract the log Gaussian probability features based on the raw acoustic features and use...
In this paper, we propose an iterative soft channel estimation and data detection algorithm based on a factor graph. Channel coefficients as well as data symbols are treated as variable nodes, and all are estimated in a low-complexity element-wise manner. Applying asymmetric LDPC codes, the algorithm is able to deliver ambiguity-free outputs for MIMO systems with or without training symbols. Training symbols are inherently utilized as a type of a priori information. This thoroughly relaxes the troublesome constraints on training design in the sense that arbitrary...
According to the all set theory, a fuzzy-random crack structural model is presented. To deal with function, following steps are taken. Firstly, when geometry sizes considered in random numbers, stress intensity factor (K1) equation of mean value, variance and interval fracture function transformed. Secondly, length fuzzy variable, K1 structure Finally, analysis given, example shows application effective structures.
Visual tracking integrates the technologies of image processing and pattern recognition, among others, and has many potential applications, such as automatic driving and safety monitoring. This paper analyzes the advantages and disadvantages of the Kernelized Correlation Filter (KCF) and Tracking-Learning-Detection (TLD), which are two kinds of trackers. The TLD tracker has correcting capability, whereas its performance highly depends on its tracker, which is not robust in some cases, such as non-rigid objects. Inversely, KCF achieves good performance in these cases. However,...
The automatic speaker verification system is sometimes vulnerable to various spoofing attacks. The 2-class Gaussian Mixture Model classifier for genuine and spoofed speech is usually used as the baseline for spoofing detection. However, the GMM classifier does not separately consider the scores of feature frames on each Gaussian component. In addition, the GMM accumulates the scores on all frames independently and does not consider their correlations. We propose the two-path GMM-ResNet and GMM-SENet models for spoofing detection, whose input is the Gaussian probability features based on two GMMs trained on genuine and spoofed speech respectively. The models not only...
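The 2-class GMM baseline described in this abstract can be sketched as a frame-averaged log-likelihood ratio. This is a minimal illustration under simplifying assumptions: features are scalars rather than acoustic feature vectors, the GMM parameters are made up rather than trained, and the function names are hypothetical. It does not implement the proposed GMM-ResNet or GMM-SENet models.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """log sum_k w_k * N(x; mean_k, var_k) for a scalar x, using log-sum-exp for stability."""
    logs = [
        math.log(w) - 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(value - top) for value in logs))


def llr_score(frames, genuine_gmm, spoofed_gmm):
    """Average per-frame log-likelihood ratio; each frame is scored independently."""
    total = sum(
        gmm_log_likelihood(x, *genuine_gmm) - gmm_log_likelihood(x, *spoofed_gmm)
        for x in frames
    )
    return total / len(frames)


# Hypothetical one-component models given as (weights, means, variances).
genuine_gmm = ([1.0], [0.0], [1.0])
spoofed_gmm = ([1.0], [2.0], [1.0])

score = llr_score([0.0, 0.1], genuine_gmm, spoofed_gmm)
decision = "genuine" if score > 0.0 else "spoofed"
```

Because the per-frame ratios are simply averaged, any correlation between neighboring frames is ignored, which is the limitation the abstract points out.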
Cross-modal image-text retrieval is a fundamental task in information retrieval. The key to this task is to address both the heterogeneity and the cross-modal semantic correlation between data of different modalities. Fine-grained matching methods can nicely model local semantic correlations between image and text but face two challenges. First, images may contain redundant information while sentences often contain words without semantic meaning. Such redundancy interferes with the matching between textual words and image regions. Furthermore, retrieval shall consider not only low-level semantic correspondence...
In hyperspectral image classification, both spectral and spatial data distributions are important in describing and identifying different materials and objects in the image. Furthermore, consistent spatial structures across bands can be useful in capturing the inherent structural information of objects. These imply that three properties should be considered when reconstructing an image using sparse coding methods. First, the distribution of different ground objects leads to different coefficients at different locations. Second, local spatial structures change slightly due to the different reflectance of various...
Much of recent action recognition research is based on a bag of words (BOW) representation obtained by quantizing the 3D interest points extracted from videos. The k-means algorithm is commonly used to construct the visual vocabulary. However, it has two major drawbacks. Firstly, the vocabulary is sensitive to the vocabulary size and initialization. Secondly, it is unable to capture the salient properties of videos, and this vocabulary may contain a large amount of information redundancy. In this paper, we propose a novel approach which constructs the vocabulary and represents the video by sparse...
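The k-means BOW baseline that this abstract criticizes can be sketched as below. It is a plain Lloyd's-iteration illustration with hypothetical function names and toy 2-D descriptors standing in for 3D interest point features; it is not the sparse approach the paper proposes.

```python
def nearest(point, centers):
    """Index of the center closest to `point` in squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def kmeans(points, initial_centers, iterations=10):
    """Plain Lloyd's k-means; the result depends on `initial_centers` and their count."""
    centers = [list(c) for c in initial_centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old center if no point was assigned to it
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def bow_histogram(descriptors, vocabulary):
    """Quantize each descriptor to its nearest visual word and return normalized counts."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest(d, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts]


# Toy descriptors (hypothetical values).
training = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
vocabulary = kmeans(training, initial_centers=[[0.0, 0.0], [10.0, 10.0]])
histogram = bow_histogram([[0.0, 0.2], [9.0, 10.0], [11.0, 11.0]], vocabulary)
```

The vocabulary size and the starting centers are passed in explicitly here, which makes visible the two sensitivities the abstract names as drawbacks.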
The simulation of battery systems needs to consider structural heterogeneity in electrodes, as they are highly irregular in shape and size while containing pores or even cracks at different length scales, which may result in non-uniform transport kinetics throughout the electrodes. Developing such a model with detailed 3D microstructure can be computationally expensive for direct simulations. Here, we propose to reduce the computational cost by developing a more realistic model via the variational multiscale method...
Dehazing is a challenging ill‐posed image restoration task. Various prior‐based and learning‐based methods have been proposed. Among them, end‐to‐end deep models achieve great success in performance improvement. However, most of them concentrate on feature learning within the same block or scale in isolation, and cannot perform associated analysis well on the characteristics of different scales. Inter‐scale information reuse, which is especially beneficial to dehazing, is often neglected. Therefore, in this paper,...
In the modern era of big data, large-scale graph computing has become challenging because of the dramatic rise in data size. Graph edge partitioning (GEP) is a crucial preprocessing step for distributed graph computing platforms, yet it is challenging to partition large-scale graphs. GEP has shown better partitioning quality than vertex partitioning for graphs with a skewed degree distribution. Existing GEP approaches are classified into two categories: stream and offline. The former category assigns edges to partitions based on previously received information. It is less affected by order compared...
The study explored a deep learning image super-resolution approach, which is commonly used in face recognition, video perception, and other fields. These generative adversarial networks usually have high-frequency texture details. The relevant textures of high-resolution images could be transferred as reference to low-resolution images. The latest existing methods use transformer ideas to transfer related textures to low-resolution images, but there are still some problems with channel detailed textures. Therefore, the study proposed an...
Introduction: Various approaches are employed to expedite the passage of meconium in preterm infants within the neonatal intensive care unit (NICU), with glycerine enemas being the most frequently used. Due to the potential risk of high osmolality-induced harm to the intestinal mucosa, diluted glycerine enema solutions are commonly used in clinical practice. The challenge lies in the current lack of knowledge regarding the safest and most effective concentration of glycerine enema. This research aims to ascertain the safety of different concentrations of glycerine enema solution in preterm infants....