- Speech and Audio Processing
- Music and Audio Processing
- Speech Recognition and Synthesis
- Neural Networks and Applications
- Diverse Musicological Studies
- Advanced Memory and Neural Computing
- Indoor and Outdoor Localization Technologies
- Music Technology and Sound Studies
- Topic Modeling
- Brain Tumor Detection and Classification
- Neural Networks and Reservoir Computing
- Advanced Vision and Imaging
- Advanced Image Processing Techniques
- Advanced Adaptive Filtering Techniques
- Image Processing Techniques and Applications
- Advanced Neural Network Applications
- Advanced Text Analysis Techniques
- Video Analysis and Summarization
- Voice and Speech Disorders
- Natural Language Processing Techniques
Kobe University
2021-2024
Emotional Voice Conversion (EVC) technology aims to transfer the emotional state in speech while keeping the linguistic information and speaker identity unchanged. Prior studies on EVC have been limited to performing conversion for a specific speaker or a predefined set of multiple speakers seen in the training stage. When encountering arbitrary speakers that may be unseen during training (outside the set of speakers used in training), existing methods have limited capabilities. However, converting the emotion of arbitrary speakers, even those unseen during the training procedure, in one model is much more...
This paper introduces a zero-shot sound event classification (ZS-SEC) method to identify events that have never occurred in the training data. In our previous work, we proposed ZS-SEC using sound attribute vectors (SAVs), where a deep neural network model infers information that describes the attributes of an event class instead of inferring its label directly. Our previous results showed that it could classify unseen events to some extent; however, the accuracy for unseen events was far inferior to that for seen events. In this paper, we propose a new method that can learn discriminative global features and...
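To make the attribute-based idea concrete, here is a minimal sketch of the final matching step, assuming that a network has already produced an attribute vector for the input sound and that classification picks the class whose SAV is most similar under cosine similarity. The attribute order, class names, and vector values are hypothetical, and cosine matching is an assumed choice, not necessarily the authors' exact procedure.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def classify_by_attributes(predicted, class_attributes):
    """Return the class whose attribute vector is most similar to the predicted one."""
    return max(class_attributes, key=lambda c: cosine(predicted, class_attributes[c]))

# Hypothetical attribute order: (high pitch, long duration, metallic source).
class_attributes = {
    "dog_bark": [0.3, 0.1, 0.0],      # seen during training
    "church_bell": [0.6, 0.9, 1.0],   # unseen: only its attribute vector is known
    "glass_break": [0.9, 0.2, 0.4],   # unseen
}
predicted = [0.55, 0.8, 0.9]  # stand-in for a DNN's inferred attribute vector
print(classify_by_attributes(predicted, class_attributes))
```

Because the label comes from comparing attribute vectors rather than from a fixed softmax over training labels, a class such as `church_bell` can be selected even though no example of it was seen.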
In financial markets, the sentiment expressed in news articles plays a pivotal role in interpreting and forecasting market trends, which also holds true for the task of financial news summarization (FNS). Leveraging AI models to analyze social science data, this paper aims to improve FNS effectiveness by introducing a novel method that combines the sentiment polarity extracted from news articles with prompt augmentation techniques to ensure that the generated summaries are emotionally consistent with the source articles. Specifically, detected sentiments...
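The combination of extracted polarity and prompt augmentation can be pictured with a small sketch. It assumes a toy keyword lexicon as the polarity detector (a real system would use a trained sentiment model) and a hypothetical prompt template; neither the lexicon, the function names, nor the template wording comes from the paper.

```python
# Hypothetical toy lexicon standing in for a trained sentiment classifier.
POSITIVE = {"gain", "growth", "beat", "rise", "strong"}
NEGATIVE = {"loss", "decline", "miss", "fall", "weak"}

def sentiment_polarity(article):
    """Label an article positive/negative/neutral by counting lexicon hits."""
    words = [w.strip(".,;:!?").lower() for w in article.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def augment_prompt(article):
    """Prepend the detected polarity to a summarization prompt (hypothetical template)."""
    polarity = sentiment_polarity(article)
    return (
        f"The following financial news article has {polarity} sentiment.\n"
        f"Summarize it, keeping the summary's sentiment {polarity}.\n\n"
        f"Article: {article}"
    )

article = "Shares rise after strong quarterly growth, despite a slight decline in margins."
print(augment_prompt(article))
```

The augmented prompt would then be passed to whatever summarization model is in use, so that the polarity cue conditions the summary toward the source article's sentiment.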
Currently, deep learning plays an indispensable role in many fields, including computer vision, natural language processing, and speech recognition. Convolutional Neural Networks (CNNs) have demonstrated excellent performance in vision tasks thanks to their powerful feature-extraction capability. However, as larger models have shown higher accuracy, recent developments have led to state-of-the-art CNNs with increasing resource consumption. This paper investigates a conceptual approach to reduce memory...
In this paper, we introduce a zero-shot learning method for sound event classification. The proposed method uses a semantic embedding of each class and measures the compatibility between the input audio feature and the class embedding. For the class embedding, we newly define a sound attribute vector that describes several properties of the class, such as pitch, length, and the material of the sound source. In experiments, the proposed method showed higher accuracy than the conventional method using word embeddings.
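One common way to score compatibility in zero-shot learning is a bilinear form, F(x, y) = x^T W φ(y), where x is the audio feature, φ(y) is the class embedding (here an attribute vector), and W is a learned matrix; the predicted label is the class with the highest score. The sketch below assumes this bilinear form and uses hand-picked values for W, the features, and the class embeddings purely for illustration. It is not claimed to be the paper's exact scoring function, and training of W is omitted.

```python
def compatibility(audio_feature, weight, class_embedding):
    """Bilinear compatibility F(x, y) = x^T W phi(y)."""
    # W phi(y): one dot product per row of W.
    projected = [sum(w * e for w, e in zip(row, class_embedding)) for row in weight]
    # x^T (W phi(y))
    return sum(x * p for x, p in zip(audio_feature, projected))

def zero_shot_predict(audio_feature, weight, class_embeddings):
    """Pick the class whose embedding is most compatible with the audio feature."""
    return max(
        class_embeddings,
        key=lambda c: compatibility(audio_feature, weight, class_embeddings[c]),
    )

# Hypothetical values: 2-D audio feature, 3-D attribute embeddings, W is 2 x 3.
weight = [[1.0, 0.0, 0.5],
          [0.0, 1.0, 0.5]]
audio_feature = [0.2, 0.9]
class_embeddings = {
    "siren": [1.0, 0.9, 0.0],
    "footsteps": [0.1, 0.3, 0.2],
    "whistle": [0.9, 0.1, 0.3],
}
print(zero_shot_predict(audio_feature, weight, class_embeddings))
```

Since unseen classes enter only through their embeddings φ(y), adding a new class requires defining its attribute vector, not retraining W.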
Arbitrary voice conversion, also referred to as zero-shot voice conversion, has recently attracted increased attention in the literature. Although disentangling linguistic and style representations of acoustic features is an effective way to achieve it, the problem of how to convert to a natural speaker style remains challenging because of the intrinsic variabilities of speech and the difficulty of completely decoupling them. For this reason, in this paper, we propose a Two-Pathway Style Embedding Voice Conversion framework (TPSE-VC) for realistic voice conversion. The novel...
Any-to-any voice conversion can be performed among arbitrary speakers, even with a single reference utterance. Many related studies have demonstrated that it can be effectively implemented by speech representation disentanglement. However, most existing solutions fuse the speaker representations into content features globally without considering their distribution difference. Additionally, in the any-to-any scenario, there is no effective method for ensuring the consistency of linguistic text transcription or...