Chunlei Zhang

ORCID: 0000-0002-6253-2446
Research Areas
  • Speech Recognition and Synthesis
  • Speech and Audio Processing
  • Remote-Sensing Image Classification
  • Music and Audio Processing
  • Remote Sensing and Land Use
  • Advanced Image Fusion Techniques
  • Hydrocarbon exploration and reservoir analysis
  • Image Retrieval and Classification Techniques
  • Seismic Imaging and Inversion Techniques
  • Drilling and Well Engineering
  • Natural Language Processing Techniques
  • Geological and Geophysical Studies
  • Advanced Image and Video Retrieval Techniques
  • Face and Expression Recognition
  • Sparse and Compressive Sensing Techniques
  • Speech and dialogue systems
  • Image and Signal Denoising Methods
  • Quantum Computing Algorithms and Architecture
  • Constructed Wetlands for Wastewater Treatment
  • Gut microbiota and health
  • Control and Dynamics of Mobile Robots
  • Environmental Chemistry and Analysis
  • Melamine detection and toxicity
  • Marine and Coastal Research
  • Nanocomposite Films for Food Packaging

Jiangsu Normal University
2017-2024

North China University of Science and Technology
2016-2024

Tencent (China)
2024

Xinjiang Petroleum Society
2024

Zhejiang University
2023

Bellevue Hospital Center
2019-2022

Beijing Haidian Hospital
2022

AviChina Industry & Technology (China)
2021

The University of Texas at Dallas
2017

Yangzhou University
2017

10.1016/j.engappai.2023.106234 article EN Engineering Applications of Artificial Intelligence 2023-04-10

Traditional studies on voice conversion (VC) have made progress with parallel training data and known speakers. Good quality is obtained by exploring better alignment modules or expressive mapping functions. In this study, we investigate zero-shot VC from a novel perspective of self-supervised disentangled speech representation learning. Specifically, we achieve the disentanglement by balancing the information flow between the global speaker representation and the time-varying content representation in a sequential variational autoencoder...

10.1109/icassp43922.2022.9747272 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27
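The decomposition described in the abstract above, one global (speaker) latent per utterance plus one time-varying (content) latent per frame inside a sequential VAE, can be sketched in a few lines. The snippet below is a minimal, self-contained illustration; every module name and dimension is invented for the example, and it is not the authors' model.

```python
# Minimal sketch (not the authors' code): a sequential VAE that splits its
# latent space into one global "speaker" vector per utterance and one
# time-varying "content" vector per frame.
import torch
import torch.nn as nn

class ToySequentialVAE(nn.Module):
    def __init__(self, n_mels=80, d_spk=64, d_cnt=32):
        super().__init__()
        self.rnn = nn.GRU(n_mels, 128, batch_first=True)
        self.spk_head = nn.Linear(128, 2 * d_spk)   # global latent (mean, logvar)
        self.cnt_head = nn.Linear(128, 2 * d_cnt)   # per-frame latent (mean, logvar)
        self.dec = nn.GRU(d_spk + d_cnt, 128, batch_first=True)
        self.out = nn.Linear(128, n_mels)

    @staticmethod
    def reparam(mu, logvar):
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, mel):                                  # mel: (B, T, n_mels)
        h, _ = self.rnn(mel)                                 # (B, T, 128)
        spk_mu, spk_lv = self.spk_head(h.mean(1)).chunk(2, dim=-1)  # pooled -> global
        cnt_mu, cnt_lv = self.cnt_head(h).chunk(2, dim=-1)          # per frame
        z_spk = self.reparam(spk_mu, spk_lv)                        # (B, d_spk)
        z_cnt = self.reparam(cnt_mu, cnt_lv)                        # (B, T, d_cnt)
        z = torch.cat([z_spk.unsqueeze(1).expand(-1, mel.size(1), -1), z_cnt], dim=-1)
        recon, _ = self.dec(z)
        return self.out(recon), (spk_mu, spk_lv), (cnt_mu, cnt_lv)

mel = torch.randn(2, 100, 80)
recon, spk_stats, cnt_stats = ToySequentialVAE()(mel)
print(recon.shape)  # torch.Size([2, 100, 80])
```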

The vision transformer (ViT) has become a hot topic in image processing due to its global feature extraction capabilities. However, the ViT suffers from over-smoothing and over-fitting during the training procedure, so it is hard to achieve satisfactory performance in hyperspectral image (HSI) classification. To address these issues, we propose a vision transformer with contrastive learning (CViT). The network architecture includes a patch embedding module, transformer blocks, and a classifier. The training of CViT can be considered as an optimization problem with supervised...

10.1109/lgrs.2023.3255867 article EN IEEE Geoscience and Remote Sensing Letters 2023-01-01
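The contrastive component mentioned above is easiest to see in isolation. Below is a standard supervised contrastive loss over embedding vectors, the generic ingredient such a classifier could combine with cross-entropy; the exact loss, sampling scheme, and architecture used in the paper are not reproduced here.

```python
# Minimal sketch (an assumption, not the paper's code): supervised contrastive
# loss that pulls together embeddings sharing a class label and pushes apart
# the rest.
import torch
import torch.nn.functional as F

def supervised_contrastive(z, labels, tau=0.1):
    """z: (N, D) embeddings, labels: (N,) integer class ids."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                              # (N, N) scaled cosine similarities
    mask_pos = (labels[:, None] == labels[None, :]).float()
    mask_pos.fill_diagonal_(0)                         # self-pairs are not positives
    logits = sim - torch.eye(len(z)) * 1e9             # remove self from the denominator
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    denom = mask_pos.sum(1).clamp(min=1)
    return -(mask_pos * log_prob).sum(1).div(denom).mean()

z = torch.randn(8, 64)
y = torch.randint(0, 3, (8,))
print(supervised_contrastive(z, y).item())
```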

Deep learning has made significant progress in hyperspectral image (HSI) classification, and its powerful ability to automatically learn abstract features is well recognized. Recently, the simple architecture of the multi-layer perceptron (MLP) has been extensively employed to extract long-range dependencies in HSI and has achieved impressive results. However, existing MLP-based models exhibit insufficient representation of spectral–spatial information and generally aggregate features with fixed weights, which limits their...

10.1016/j.jag.2024.103754 article EN cc-by-nc-nd International Journal of Applied Earth Observation and Geoinformation 2024-03-11

Pixel-wise classification of hyperspectral images (HSI) is a hot spot in the field of remote sensing. HSI classification requires the model to be more sensitive to dense features, which is quite different from the modelling requirements of traditional tasks. The Cycle-Multilayer Perceptron (MLP) has achieved satisfactory results in dense feature prediction tasks because it is an expert at extracting high-resolution features. In order to obtain a stable receptive field and enhance the effect of feature extraction in multiple directions, we propose an MLP-like...

10.1049/cvi2.12104 article EN cc-by-nc IET Computer Vision 2022-04-12

Despite the rapid progress in automatic speech recognition (ASR) research, recognizing multilingual speech using a unified ASR system remains highly challenging. Previous works on multilingual ASR mainly focus on two directions: multiple monolingual ASR systems, or code-switched ASR that uses different languages interchangeably within a single utterance. However, a pragmatic multilingual recognizer is expected to be compatible with both directions. In this work, a novel language-aware encoder (LAE) architecture is proposed to handle both situations by...

10.21437/interspeech.2022-923 article EN Interspeech 2022 2022-09-16

Expressive speech introduces variations in the acoustic features, affecting the performance of speech technology such as speaker verification systems. It is important to identify the range of emotions for which we can reliably estimate these tasks. This paper studies the performance of a speaker verification system as a function of emotions. Instead of categorical classes such as happiness or anger, which have intra-class variability, we use the continuous attributes of arousal, valence, and dominance to facilitate the analysis. We evaluate a system trained with the i-vector framework and probabilistic linear...

10.1109/icassp.2017.7953216 article EN 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2017-03-01
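As a back-of-the-envelope illustration of the kind of analysis described above, the sketch below scores synthetic same-speaker trials with a plain cosine back-end and bins the scores by a continuous "arousal" value. The data, the drift model, and the 1-7 attribute scale are all invented for the example; the actual study evaluates an i-vector system with a probabilistic linear back-end, as the abstract notes.

```python
# Minimal sketch with synthetic data (not the paper's protocol): how target
# scores can degrade as a continuous emotion attribute grows.
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

dim, n_trials = 100, 2000
arousal = rng.uniform(1, 7, n_trials)                  # hypothetical 1..7 attribute scale
scores = []
for a in arousal:
    spk = rng.normal(size=dim)                          # "speaker" component
    shift = rng.normal(size=dim) * 0.2 * (a - 1)        # more emotional -> more drift
    enroll = spk + rng.normal(size=dim) * 0.3           # neutral enrollment
    test = spk + shift + rng.normal(size=dim) * 0.3     # expressive test segment
    scores.append(cosine(enroll, test))

scores = np.array(scores)
for lo in range(1, 7):
    m = (arousal >= lo) & (arousal < lo + 1)
    print(f"arousal [{lo},{lo+1}): mean target score = {scores[m].mean():.3f}")
```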

Disentangling content and speaking style information is essential for zero-shot non-parallel voice conversion (VC). Our previous study investigated a novel framework with a disentangled sequential variational autoencoder (DSVAE) as the backbone for the decomposition. We have demonstrated that simultaneously disentangling the content embedding and the speaker embedding from one utterance is feasible for zero-shot VC. In this study, we continue this direction by raising a concern about the prior distribution of the content branch in the DSVAE baseline. We find that the randomly initialized...

10.21437/interspeech.2022-11225 article EN Interspeech 2022 2022-09-16

We study traveling pulses on a lattice and in a continuum where all pairs of particles interact, contributing to the potential energy. The interaction may be positive or negative, depending on the particular pair, but overall it is positive in a certain sense. For such an interaction kernel $J$ with unit integral (or sum), the operator $\frac{1}{\varepsilon^2}[J*u-u]$, with $*$ a continuous or discrete convolution, shares some common features with the spatial second derivative operator, especially when $\varepsilon$ is small. Therefore, the equation $u_{tt} - \frac{1}{\varepsilon^2}[J*u-u] + f(u)=0$...

10.3934/dcds.2006.16.235 article EN cc-by Discrete and Continuous Dynamical Systems 2006-01-01
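For readers wondering why the nonlocal operator above mimics a second derivative, the standard Taylor-expansion heuristic is shown below, assuming an even kernel with unit integral whose width is of order $\varepsilon$. This is a generic calculation, not quoted from the paper.

```latex
% Standard heuristic: for an even kernel J with unit integral,
% Taylor expansion of u(x - y) about x gives
\[
(J * u)(x) - u(x) = \int J(y)\,\bigl[u(x-y) - u(x)\bigr]\,dy
\;\approx\; \frac{u''(x)}{2} \int y^{2} J(y)\,dy ,
\]
% so when the kernel width is of order \varepsilon,
% i.e. \int y^2 J(y)\,dy = O(\varepsilon^2),
\[
\frac{1}{\varepsilon^{2}}\,\bigl[(J * u)(x) - u(x)\bigr] \;\approx\; c\, u''(x),
\qquad c = \frac{1}{2\varepsilon^{2}} \int y^{2} J(y)\,dy = O(1),
\]
% and u_{tt} - \frac{1}{\varepsilon^2}[J*u - u] + f(u) = 0 formally approaches
% the local nonlinear wave equation u_{tt} - c\,u_{xx} + f(u) = 0.
```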

Deep learning has dominated hyperspectral image (HSI) classification due to its modular design and powerful feature extraction capabilities. Recently, a modern macro-architecture-based framework with high-order interactions has been proposed, inspiring the design of HSI classification models. As the spatial mixer in the macro-architecture, the high-order interaction facilitates the aggregation of discriminative information by gated mechanisms and standard convolutions. However, homogeneous operators such as convolution struggle to consider different...

10.1016/j.jag.2023.103482 article EN cc-by-nc-nd International Journal of Applied Earth Observation and Geoinformation 2023-09-01
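To make "gated mechanisms and standard convolutions" concrete, here is a generic gated spatial-interaction block: a depthwise convolution mixes spatial context and an element-wise product injects the multiplicative, higher-order coupling that a plain convolution lacks. It is an illustrative pattern only, not the model proposed in the paper.

```python
# Minimal sketch (generic gated interaction, not the paper's architecture).
import torch
import torch.nn as nn

class GatedInteraction(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.proj_in = nn.Conv2d(channels, 2 * channels, kernel_size=1)
        self.dwconv = nn.Conv2d(channels, channels, kernel_size=3,
                                padding=1, groups=channels)   # depthwise spatial mixing
        self.proj_out = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):                  # x: (B, C, H, W) spectral-spatial features
        gate, value = self.proj_in(x).chunk(2, dim=1)
        y = gate * self.dwconv(value)      # multiplicative (second-order) interaction
        return self.proj_out(y) + x        # residual connection

x = torch.randn(2, 32, 11, 11)             # e.g. a small HSI patch after a 1x1 reduction
print(GatedInteraction(32)(x).shape)       # torch.Size([2, 32, 11, 11])
```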

This document briefly describes the systems submitted by the Center for Robust Speech Systems (CRSS) from The University of Texas at Dallas (UTD) to the 2016 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE). We developed several UBM and DNN i-Vector based speaker recognition systems with different data sets and feature representations. Given that the emphasis of the NIST SRE is on the language mismatch between training and enrollment/test data, the so-called domain mismatch, in our system...

10.21437/interspeech.2017-555 preprint EN Interspeech 2017 2017-08-16

Deep learning methods have shown great promise in automatically extracting features from hyperspectral images (HSIs) for classification purposes. Recently, researchers have recognized the importance of high-order feature interactions, which capture relationships between different image regions, in learning discriminative features. Despite their effectiveness, existing deep models for HSI classification often overlook such interactions, resulting in suboptimal performance. To address this issue, we propose a novel spectral–spatial...

10.1109/jstars.2023.3298477 article EN cc-by-nc-nd IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2023-01-01

A novel Spectral-Spatial Difference Convolution Network (S²DCN) is proposed for hyperspectral image (HSI) classification, which integrates the difference principle into a deep learning framework. S²DCN employs a learnable gradient encoding pattern to extract important detail features in the spectral and spatial domains, alleviating the information loss caused by the over-smoothing effect of feature...

10.1109/jstars.2023.3349175 article EN cc-by-nc-nd IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2024-01-01
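The "learnable gradient encoding" idea can be illustrated by a difference-convolution layer that aggregates differences against the centre pixel instead of raw responses. The sketch below is a common formulation of that idea, with the blend factor theta and all shapes chosen arbitrarily; it is not the released S²DCN code.

```python
# Minimal sketch (an assumption about the general idea): a difference
# convolution computing sum_p w(p) * (x(p0 + p) - x(p0)), implemented as a
# vanilla convolution minus a centre-pixel term.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffConv2d(nn.Module):
    def __init__(self, c_in, c_out, k=3, theta=0.7):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False)
        self.theta = theta                                        # blends the two terms

    def forward(self, x):
        out = self.conv(x)                                        # sum_p w(p) x(p0 + p)
        w_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)    # (c_out, c_in, 1, 1)
        center = F.conv2d(x, w_sum)                               # x(p0) * sum_p w(p)
        return out - self.theta * center

x = torch.randn(2, 30, 15, 15)                  # e.g. PCA-reduced HSI bands
print(DiffConv2d(30, 64)(x).shape)              # torch.Size([2, 64, 15, 15])
```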

Architectures based on the Multi-Layer Perceptron (MLP) have attracted great attention in hyperspectral image (HSI) classification recently, due to their simplified and efficient designs. However, such architectures are limited by the rigid positional relationships between weights and feature elements, inhibiting their capacity to effectively extract diversified features. To address these challenges, an adaptive spatial-shift MLP (AS2MLP) is presented to dynamically modify spatial features by parameterizing...

10.1080/01431161.2024.2311790 article EN International Journal of Remote Sensing 2024-02-13
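A minimal way to see how a pure MLP layer can still mix spatial context is the spatial-shift trick: roll channel groups in different directions before applying a channel MLP. The sketch below implements only that generic trick; the adaptive, parameterized shifting of AS2MLP itself is not reproduced here.

```python
# Minimal sketch (generic spatial-shift idea, not AS2MLP).
import torch
import torch.nn as nn

class SpatialShiftMLP(nn.Module):
    def __init__(self, channels, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels, hidden), nn.GELU(),
                                 nn.Linear(hidden, channels))

    @staticmethod
    def shift(x):                                # x: (B, C, H, W)
        g = x.size(1) // 4
        x = x.clone()
        x[:, 0*g:1*g] = torch.roll(x[:, 0*g:1*g], 1, dims=2)    # down
        x[:, 1*g:2*g] = torch.roll(x[:, 1*g:2*g], -1, dims=2)   # up
        x[:, 2*g:3*g] = torch.roll(x[:, 2*g:3*g], 1, dims=3)    # right
        x[:, 3*g:4*g] = torch.roll(x[:, 3*g:4*g], -1, dims=3)   # left
        return x

    def forward(self, x):                        # x: (B, C, H, W)
        y = self.shift(x).permute(0, 2, 3, 1)    # channels-last for the MLP
        return x + self.mlp(y).permute(0, 3, 1, 2)

x = torch.randn(2, 64, 9, 9)
print(SpatialShiftMLP(64)(x).shape)              # torch.Size([2, 64, 9, 9])
```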

This paper describes an end-to-end adversarial singing voice conversion (EA-SVC) approach. It can directly generate an arbitrary waveform from a given phonetic posteriorgram (PPG) representing the content, an F0 representing the pitch, and a speaker embedding representing the timbre, respectively. The proposed system is composed of three modules: the generator $G$, the audio generation discriminator $D_{A}$, and the feature disentanglement discriminator $D_F$. The generator $G$ encodes the features in parallel and inversely transforms them into the target waveform. In order to make the timbre...

10.48550/arxiv.2012.01837 preprint EN cc-by arXiv (Cornell University) 2020-01-01
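To make the three-module setup concrete, the sketch below wires a toy generator against an audio discriminator and a feature discriminator in one adversarial training step. Every module, shape, and loss weight is invented for illustration; the actual EA-SVC system is considerably more elaborate.

```python
# Minimal sketch (not the EA-SVC implementation): generator conditioned on
# PPG + F0 + speaker embedding, D_A judging real vs. generated frames, and D_F
# trying to recover the speaker from intermediate features while the generator
# learns to hide it.
import torch
import torch.nn as nn

n_spk  = 10
enc    = nn.GRU(256 + 1, 128, batch_first=True)         # encodes PPG + F0 (content/pitch)
dec    = nn.GRU(128 + 64, 256, batch_first=True)        # adds the speaker embedding
to_wav = nn.Linear(256, 160)                            # 160 waveform samples per frame (toy)
D_A    = nn.Sequential(nn.Linear(160, 64), nn.LeakyReLU(), nn.Linear(64, 1))
D_F    = nn.Sequential(nn.Linear(128, 64), nn.LeakyReLU(), nn.Linear(64, n_spk))
opt_g  = torch.optim.Adam([*enc.parameters(), *dec.parameters(), *to_wav.parameters()], 2e-4)
opt_d  = torch.optim.Adam([*D_A.parameters(), *D_F.parameters()], 2e-4)
ce     = nn.CrossEntropyLoss()

ppg, f0 = torch.randn(4, 50, 256), torch.randn(4, 50, 1)
spk_emb, spk_id = torch.randn(4, 50, 64), torch.randint(0, n_spk, (4,))
real = torch.randn(4, 50, 160)

# Discriminator step: D_A separates real/fake frames, D_F predicts the speaker.
h, _ = enc(torch.cat([ppg, f0], -1))
fake = to_wav(dec(torch.cat([h, spk_emb], -1))[0])
d_loss = (1 - D_A(real)).relu().mean() + (1 + D_A(fake.detach())).relu().mean() \
         + ce(D_F(h.detach()).mean(1), spk_id)
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: fool D_A and make the content features uninformative for D_F.
h, _ = enc(torch.cat([ppg, f0], -1))
fake = to_wav(dec(torch.cat([h, spk_emb], -1))[0])
g_loss = -D_A(fake).mean() - ce(D_F(h).mean(1), spk_id)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
print(round(float(d_loss), 3), round(float(g_loss), 3))
```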

Speaker diarization consists of many components, e.g., front-end processing, speech activity detection (SAD), overlapped speech detection (OSD), and speaker segmentation/clustering. Conventionally, most of the involved components are separately developed and optimized. The resulting systems are complicated and sometimes lack satisfying generalization capabilities. In this study, we present a novel diarization system, with a generalized neural clustering module as the backbone. The whole system can be simplified to contain only two major parts,...

10.1109/icassp43922.2022.9747301 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27

Various applications of voice synthesis have been developed independently, despite the fact that they generate "voice" as output in common. In addition, the majority of voice synthesis models currently rely on annotated audio data, but it is crucial to scale them to self-supervised datasets in order to effectively capture the wide range of acoustic variations present in the human voice, including speaker identity, emotion, and prosody. In this work, we propose Make-A-Voice, a unified framework for synthesizing and manipulating voice signals from...

10.48550/arxiv.2305.19269 preprint EN other-oa arXiv (Cornell University) 2023-01-01