Xunquan Chen

ORCID: 0000-0002-5336-6386
Research Areas
  • Speech and Audio Processing
  • Music and Audio Processing
  • Speech Recognition and Synthesis
  • Neural Networks and Applications
  • Diverse Musicological Studies
  • Advanced Memory and Neural Computing
  • Indoor and Outdoor Localization Technologies
  • Music Technology and Sound Studies
  • Topic Modeling
  • Brain Tumor Detection and Classification
  • Neural Networks and Reservoir Computing
  • Advanced Vision and Imaging
  • Advanced Image Processing Techniques
  • Advanced Adaptive Filtering Techniques
  • Image Processing Techniques and Applications
  • Advanced Neural Network Applications
  • Advanced Text Analysis Techniques
  • Video Analysis and Summarization
  • Voice and Speech Disorders
  • Natural Language Processing Techniques

Kobe University
2021-2024

Emotional Voice Conversion (EVC) technology aims to transfer the emotional state in speech while keeping the linguistic information and speaker identity unchanged. Prior studies on EVC have been limited to performing conversion for a specific speaker or a predefined set of multiple speakers seen in the training stage. When encountering arbitrary speakers that may be unseen during training (outside the set used for training), existing methods lose their conversion capabilities. However, converting the emotion of arbitrary speakers, even those unseen during the training procedure, with one model is much more...

10.1109/tmm.2022.3222646 article EN IEEE Transactions on Multimedia 2022-11-16
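
The general idea behind such conversion is to disentangle what is said, who says it, and how it is said. The PyTorch sketch below illustrates that decomposition with separate content, speaker, and emotion encoders feeding one decoder; every module choice and dimension here is an invented placeholder, not the architecture from the paper above.

```python
# Illustrative sketch only: the general encoder-decoder disentanglement idea
# behind speaker-independent EVC, NOT the architecture from the paper above.
import torch
import torch.nn as nn

class ToyEVC(nn.Module):
    def __init__(self, n_mels=80, content_dim=128, spk_dim=64, n_emotions=8):
        super().__init__()
        # Separate encoders so content, speaker, and emotion can be disentangled.
        self.content_enc = nn.GRU(n_mels, content_dim, batch_first=True)
        self.speaker_enc = nn.GRU(n_mels, spk_dim, batch_first=True)
        self.emotion_emb = nn.Embedding(n_emotions, n_emotions)  # target emotion label
        self.decoder = nn.GRU(content_dim + spk_dim + n_emotions, n_mels,
                              batch_first=True)

    def forward(self, mel, target_emotion):
        content, _ = self.content_enc(mel)      # frame-level content features
        _, spk = self.speaker_enc(mel)          # utterance-level speaker summary
        spk = spk[-1].unsqueeze(1).expand(-1, mel.size(1), -1)
        emo = self.emotion_emb(target_emotion)  # (B, n_emotions)
        emo = emo.unsqueeze(1).expand(-1, mel.size(1), -1)
        out, _ = self.decoder(torch.cat([content, spk, emo], dim=-1))
        return out                              # converted mel frames

mel = torch.randn(2, 100, 80)                   # 2 utterances, 100 frames each
converted = ToyEVC()(mel, torch.tensor([3, 1]))
print(converted.shape)                          # torch.Size([2, 100, 80])
```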

This paper introduces a zero-shot sound event classification (ZS-SEC) method to identify sound events that have never occurred in the training data. In our previous work, we proposed a ZS-SEC method using sound attribute vectors (SAVs), where a deep neural network model infers attribute information that describes the sound of an event class instead of inferring its class label directly. Our method showed that it could classify unseen events to some extent; however, the accuracy for unseen events was far inferior to that for seen events. In this paper, we propose a new method that can learn discriminative global features and...

10.1109/icassp49357.2023.10096367 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05
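
A minimal sketch of the sound-attribute-vector (SAV) idea the abstract describes: a model predicts attribute values from audio, and classification picks the class whose attribute vector best matches the prediction. The classes, attributes, and values below are invented for illustration.

```python
# Illustrative sketch of SAV-based zero-shot sound event classification:
# a trained regressor outputs an attribute vector for a clip, and the class
# with the nearest attribute vector is chosen. All values here are invented.
import numpy as np

# Hypothetical per-class attribute vectors, e.g. (pitched?, long?, metallic?).
class_savs = {
    "bell":     np.array([1.0, 0.2, 1.0]),
    "dog_bark": np.array([0.3, 0.1, 0.0]),
    "engine":   np.array([0.1, 1.0, 0.6]),   # an unseen class at training time
}

def classify_zero_shot(predicted_sav: np.ndarray) -> str:
    """Pick the class whose attribute vector is closest to the prediction."""
    return min(class_savs,
               key=lambda c: np.linalg.norm(class_savs[c] - predicted_sav))

# Suppose the trained regressor outputs this SAV for an input clip:
print(classify_zero_shot(np.array([0.2, 0.9, 0.5])))   # -> "engine"
```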

In financial markets, the sentiment expressed in news articles plays a pivotal role in interpreting and forecasting market trends, which also holds true for the task of financial news summarization (FNS). Leveraging AI models to analyze social science data, this paper aims to improve FNS effectiveness by introducing a novel method that combines sentiment polarity extracted from news articles with prompt augmentation techniques to ensure the generated summaries are emotionally consistent with the source articles. Specifically, the detected sentiments...

10.1007/s42001-024-00352-w article EN cc-by Journal of Computational Social Science 2024-12-26
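
A minimal sketch of the sentiment-aware prompt augmentation described above: detect the article's polarity, then state it in the summarization prompt so the summary stays emotionally consistent. The lexicon-based detector and the prompt wording are toy stand-ins, not the models used in the paper.

```python
# Minimal sketch of sentiment-aware prompt augmentation for financial news
# summarization (FNS). The lexicon "detector" is a toy stand-in; the paper's
# actual sentiment model and prompt wording are not shown in this preview.
POSITIVE = {"surge", "rally", "beat", "growth", "record"}
NEGATIVE = {"plunge", "loss", "default", "downgrade", "slump"}

def detect_polarity(article: str) -> str:
    words = article.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def build_prompt(article: str) -> str:
    polarity = detect_polarity(article)
    # Augment the prompt with the detected polarity so the generated summary
    # stays emotionally consistent with the source article.
    return (f"The following financial news article has a {polarity} sentiment. "
            f"Summarize it, preserving that sentiment:\n\n{article}")

print(build_prompt("Shares plunge after the company reports a quarterly loss."))
```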

Currently, deep learning plays an indispensable role in many fields, including computer vision, natural language processing, and speech recognition. Convolutional Neural Networks (CNNs) have demonstrated excellent performance in vision tasks thanks to their powerful feature-extraction capability. However, as larger models have shown higher accuracy, recent developments have led to state-of-the-art CNN models with increasing resource consumption. This paper investigates a conceptual approach to reduce the memory...

10.1561/116.00000015 article EN cc-by-nc APSIPA Transactions on Signal and Information Processing 2023-01-01
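
The preview cuts off before naming the paper's approach, so the sketch below only illustrates one standard way to cut CNN memory: post-training int8 weight quantization, a 4x reduction over float32. It should not be read as the method proposed in the paper.

```python
# Generic illustration of CNN memory reduction via int8 weight quantization;
# NOT the approach proposed in the paper above, which this preview truncates.
import numpy as np

def quantize_int8(w: np.ndarray):
    """Uniform symmetric quantization of a weight tensor to int8."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(64, 3, 3, 3).astype(np.float32)   # a conv layer's weights
q, scale = quantize_int8(w)
print(w.nbytes, "->", q.nbytes, "bytes")              # 6912 -> 1728
err = np.abs(w - dequantize(q, scale)).max()
print(f"max reconstruction error: {err:.4f}")
```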

Currently, deep learning plays an indispensable role in many fields, including computer vision, natural language processing, and speech recognition. Convolutional Neural Networks (CNNs) have demonstrated excellent performance in vision tasks thanks to their powerful feature-extraction capability. However, as larger models have shown higher accuracy, recent developments have led to state-of-the-art CNN models with increasing resource consumption. This paper investigates a conceptual approach to reduce...

10.21203/rs.3.rs-743636/v1 preprint EN Research Square (Research Square) 2021-08-10

In this paper, we introduce a zero-shot learning method for sound event classification. The proposed method uses a semantic embedding of each class and measures the compatibility between the input audio feature and the semantic embedding. For the semantic embedding, we newly define a sound attribute vector that explains several kinds of information about the class, such as pitch, length, material of the sound source, etc. In experiments, the proposed method showed higher accuracy than a conventional method using word embeddings...

10.1109/gcce56475.2022.10014127 article EN 2022 IEEE 11th Global Conference on Consumer Electronics (GCCE) 2022-10-18
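
A sketch of the compatibility measure the abstract mentions, here written as a bilinear form between the audio embedding and each class's attribute vector; the dimensions and the bilinear choice are assumptions for illustration, not the paper's exact model.

```python
# Sketch of compatibility scoring for zero-shot classification: score an audio
# embedding against each class's attribute vector via a learned bilinear form.
# Dimensions, attribute values, and the random "learned" matrix are invented.
import numpy as np

rng = np.random.default_rng(0)
audio_dim, attr_dim = 16, 3
W = rng.normal(size=(audio_dim, attr_dim))     # learned compatibility matrix

def compatibility(audio_emb, sav, W=W):
    """Bilinear compatibility score F(x, y) = x^T W y."""
    return float(audio_emb @ W @ sav)

savs = {"bell": np.array([1.0, 0.2, 1.0]), "engine": np.array([0.1, 1.0, 0.6])}
x = rng.normal(size=audio_dim)                 # embedding of an input clip
best = max(savs, key=lambda c: compatibility(x, savs[c]))
print(best)                                    # class with the highest score
```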

Arbitrary voice conversion, also referred to as zero-shot voice conversion, has recently attracted increased attention in the literature. Although disentangling linguistic and style representations from acoustic features is an effective way to achieve it, the problem of how to convert speech so that it sounds like a natural target speaker remains challenging because of the intrinsic variabilities of speech and the difficulties of completely decoupling them. For this reason, in this paper, we propose a Two-Pathway Style Embedding Voice Conversion framework (TPSE-VC) for realistic voice conversion. The novel...

10.21437/interspeech.2021-506 article EN Interspeech 2021 2021-08-27
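
The preview does not explain what the two pathways of TPSE-VC are, so the sketch below shows only a generic two-branch style encoder (a recurrent utterance-level summary plus a statistics-based one). It is a plausible illustration of the term, not the paper's design.

```python
# Generic two-branch style encoder sketch; the actual TPSE-VC pathways are not
# described in the preview above, so this is an invented illustration only.
import torch
import torch.nn as nn

class TwoPathwayStyleEncoder(nn.Module):
    def __init__(self, n_mels=80, style_dim=64):
        super().__init__()
        self.global_path = nn.GRU(n_mels, style_dim, batch_first=True)
        self.local_path = nn.Linear(2 * n_mels, style_dim)  # mean + std stats

    def forward(self, mel):                      # mel: (B, T, n_mels)
        _, h = self.global_path(mel)             # recurrent utterance summary
        g = h[-1]                                # (B, style_dim)
        stats = torch.cat([mel.mean(1), mel.std(1)], dim=-1)
        l = self.local_path(stats)               # statistics-based summary
        return torch.cat([g, l], dim=-1)         # fused style embedding

style = TwoPathwayStyleEncoder()(torch.randn(2, 100, 80))
print(style.shape)                               # torch.Size([2, 128])
```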

This paper introduces a zero-shot sound event classification (ZS-SEC) method to identify sound events that have never occurred in the training data. In our previous work, we proposed a ZS-SEC method using sound attribute vectors (SAVs), where a deep neural network model infers attribute information that describes the sound of an event class instead of inferring its class label directly. Our method showed that it could classify unseen events to some extent; however, the accuracy for unseen events was far inferior to that for seen events. In this paper, we propose a new method that can learn discriminative global features and...

10.48550/arxiv.2303.10316 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Any-to-any voice conversion can be performed among arbitrary speakers, even with a single reference utterance. Many related studies have demonstrated that it can be effectively implemented by speech representation disentanglement. However, most existing solutions fuse the speaker representations into content features globally without considering their distribution difference. Additionally, in the any-to-any scenario, there is no effective method for ensuring the consistency of the linguistic content with the text transcription or...

10.1109/taslp.2023.3306716 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2023-01-01
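
The "global fusion" this abstract critiques is commonly implemented with adaptive instance normalization (AdaIN), where content features are normalized and then re-scaled with statistics predicted from the speaker embedding. The sketch below shows that baseline idea, not the method the paper proposes.

```python
# Sketch of the common global speaker-content fusion the abstract critiques:
# AdaIN strips the source statistics from content features, then applies a
# per-channel scale and bias predicted from the target speaker embedding.
# This illustrates the baseline idea, not the paper's proposed method.
import torch
import torch.nn as nn

class AdaINFusion(nn.Module):
    def __init__(self, content_dim=128, spk_dim=64):
        super().__init__()
        # Predict per-channel scale and bias from the speaker embedding.
        self.affine = nn.Linear(spk_dim, 2 * content_dim)

    def forward(self, content, spk):            # content: (B, T, C), spk: (B, S)
        mu = content.mean(dim=1, keepdim=True)
        sigma = content.std(dim=1, keepdim=True) + 1e-5
        normalized = (content - mu) / sigma     # strip source statistics
        gamma, beta = self.affine(spk).chunk(2, dim=-1)
        return normalized * gamma.unsqueeze(1) + beta.unsqueeze(1)

fused = AdaINFusion()(torch.randn(2, 100, 128), torch.randn(2, 64))
print(fused.shape)                              # torch.Size([2, 100, 128])
```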