NFDI4DS | UHH-SEMS - Publication Details

Speaker-Independent Emotional Voice Conversion via Disentangled Representations

OPENALEX - Publications

Xunquan Chen, Xuexin Xu, Jinhui Chen, Zhizhong Zhang, Tetsuya Takiguchi, and 1 more

Emotional Voice Conversion (EVC) technology aims to transfer the emotional state in speech while keeping the linguistic information and speaker identity unchanged. Prior studies on EVC have been limited to performing emotion conversion for a specific speaker or a predefined set of multiple speakers seen in the training stage. When encountering arbitrary speakers that may be unseen during training (outside the set of speakers used in training), existing methods have limited conversion capabilities. However, converting the emotion of arbitrary speakers, even those unseen during the training procedure, in one model is much more...

10.1109/tmm.2022.3222646 article EN IEEE Transactions on Multimedia 2022-11-16

Zero-Shot Sound Event Classification Using a Sound Attribute Vector with Global and Local Feature Learning

Yi-Han Lin, Xunquan Chen, Ryoichi Takashima, Tetsuya Takiguchi

This paper introduces a zero-shot sound event classification (ZS-SEC) method to identify sound events that have never occurred in the training data. In our previous work, we proposed a ZS-SEC method using sound attribute vectors (SAVs), where a deep neural network model infers attribute information that describes the sound of an event class instead of inferring its class label directly. Our previous work showed that it could classify unseen events to some extent; however, the accuracy for unseen events was far inferior to that for seen events. In this paper, we propose a new ZS-SEC method that can learn discriminative global features and...

10.1109/icassp49357.2023.10096367 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

Prefix tuning with prompt augmentation for efficient financial news summarization

Shangyang Mou, Qiang Xue, Xunquan Chen, Jinhui Chen, Ryoichi Takashima, and 2 more

In financial markets, the sentiment expressed in news articles plays a pivotal role in interpreting and forecasting market trends, which also holds true for the task of financial news summarization (FNS). Leveraging AI models to analyze social science data, this paper employs AI to improve FNS effectiveness by introducing a novel method that combines sentiment polarity extracted from news articles with prompt augmentation techniques to ensure that the generated summaries are emotionally consistent with the source articles. Specifically, detected sentiments...

10.1007/s42001-024-00352-w article EN cc-by Journal of Computational Social Science 2024-12-26

Convolutional Neural Networks Inference Memory Optimization with Receptive Field-Based Input Tiling

Weihao Zhuang, Tristan Hascoet, Xunquan Chen, Ryoichi Takashima, Tetsuya Takiguchi, and 1 more

Currently, deep learning plays an indispensable role in many fields, including computer vision, natural language processing, and speech recognition. Convolutional Neural Networks (CNNs) have demonstrated excellent performance in computer vision tasks thanks to their powerful feature-extraction capability. However, as larger models have shown higher accuracy, recent developments have led to state-of-the-art CNN models with increasing resource consumption. This paper investigates a conceptual approach to reduce memory...

10.1561/116.00000015 article EN cc-by-nc APSIPA Transactions on Signal and Information Processing 2023-01-01

Convolutional Neural Networks Inference Memory Optimization with Receptive Field-Based Input Tiling

Weihao Zhuang, Tristan Hascoet, Xunquan Chen, Ryoichi Takashima, Tetsuya Takiguchi, and 1 more

Currently, deep learning plays an indispensable role in many fields, including computer vision, natural language processing, and speech recognition. Convolutional Neural Networks (CNNs) have demonstrated excellent performance in computer vision tasks thanks to their powerful feature extraction capability. However, as larger models have shown higher accuracy, recent developments have led to state-of-the-art CNN models with increasing resource consumption. This paper investigates a conceptual approach to reduce...

10.21203/rs.3.rs-743636/v1 preprint EN Research Square (Research Square) 2021-08-10

Phoneme-guided Dysarthric speech conversion With non-parallel data by joint training

Xunquan Chen, Atsuki Oshiro, Jinhui Chen, Ryoichi Takashima, Tetsuya Takiguchi

10.1007/s11760-021-02119-6 article EN Signal Image and Video Processing 2022-01-30

Binary Attribute Embeddings for Zero-Shot Sound Event Classification

Yi-Han Lin, Xunquan Chen, Ryoichi Takashima, Tetsuya Takiguchi

In this paper, we introduce a zero-shot learning method for sound event classification. The proposed method uses a semantic embedding of each sound event class and measures the compatibility between the semantic embedding and the input audio feature embedding. For the semantic embedding, we newly define an attribute vector that describes several kinds of attribute information for each sound event class, such as pitch, length, and the material of the sound source. In experiments, the proposed method showed higher accuracy than the conventional method using word embedding.

10.1109/gcce56475.2022.10014127 article EN 2022 IEEE 11th Global Conference on Consumer Electronics (GCCE) 2022-10-18

Two-Pathway Style Embedding for Arbitrary Voice Conversion

Xuexin Xu, Liang Shi, Jinhui Chen, Xunquan Chen, Jie Lian, and 3 more

Arbitrary voice conversion, also referred to as zero-shot voice conversion, has recently attracted increased attention in the literature. Although disentangling the linguistic and style representations for acoustic features is an effective way to achieve zero-shot voice conversion, the problem of how to convert to a natural speaker style is challenging because of the intrinsic variabilities of speech and the difficulties of completely decoupling them. For this reason, in this paper, we propose a Two-Pathway Style Embedding Voice Conversion framework (TPSE-VC) for realistic conversion. The novel...

10.21437/interspeech.2021-506 article EN Interspeech 2021 2021-08-27

Zero-shot Sound Event Classification Using a Sound Attribute Vector with Global and Local Feature Learning

Yihan Lin, Xunquan Chen, Ryoichi Takashima, Tetsuya Takiguchi

This paper introduces a zero-shot sound event classification (ZS-SEC) method to identify sound events that have never occurred in the training data. In our previous work, we proposed a ZS-SEC method using sound attribute vectors (SAVs), where a deep neural network model infers attribute information that describes the sound of an event class instead of inferring its class label directly. Our previous work showed that it could classify unseen events to some extent; however, the accuracy for unseen events was far inferior to that for seen events. In this paper, we propose a new ZS-SEC method that can learn discriminative global features and...

10.48550/arxiv.2303.10316 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Any-to-Any Voice Conversion With Multi-Layer Speaker Adaptation and Content Supervision

Xuexin Xu, Liang Shi, Xunquan Chen, Pingyuan Lin, Jie Lian, and 3 more

Any-to-any voice conversion can be performed among arbitrary speakers, even with a single reference utterance. Many related studies have demonstrated that it can be effectively implemented by speech representation disentanglement. However, most existing solutions fuse the speaker representations into the content features globally without considering their distribution difference. Additionally, in the any-to-any scenario, there is no effective method ensuring the consistency of linguistic content without text transcription or...

10.1109/taslp.2023.3306716 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2023-01-01

Optical Flow Regularization of Implicit Neural Representations for Video Frame Interpolation

W. Zhuang, Tristan Hascoet, Xunquan Chen, Ryoichi Takashima, Tetsuya Takiguchi

10.1561/116.00000218 article EN cc-by-nc APSIPA Transactions on Signal and Information Processing 2023-01-01

Direction of arrival estimation for indoor environments based on acoustic composition model with a single microphone

Xingchen Guo, Xuexin Xu, Xunquan Chen, Jinhui Chen, Rong Jia, and 3 more

10.1016/j.patcog.2022.108715 article EN publisher-specific-oa Pattern Recognition 2022-04-18