- Speech and Audio Processing
- Speech Recognition and Synthesis
- Music and Audio Processing
- Voice and Speech Disorders
- Blind Source Separation Techniques
- Advanced Image and Video Retrieval Techniques
- Face and Expression Recognition
- Advanced Adaptive Filtering Techniques
- Image Retrieval and Classification Techniques
- Video Analysis and Summarization
- Indoor and Outdoor Localization Technologies
- Speech and dialogue systems
- Phonetics and Phonology Research
- Face recognition and analysis
- Natural Language Processing Techniques
- Hand Gesture Recognition Systems
- Neural Networks and Applications
- Topic Modeling
- Advanced Data Compression Techniques
- Video Surveillance and Tracking Methods
- Image and Signal Denoising Methods
- Advanced Vision and Imaging
- Hearing Loss and Rehabilitation
- Image Processing Techniques and Applications
- Emotion and Mood Recognition
Kobe University: 2016-2025
Kanazawa Medical Center: 2017-2024
National Hospital Organization: 2017-2024
Nagoya University: 2024
Kumamoto Health Science University: 2022
Nara Institute of Science and Technology: 1996-2020
Multidisciplinary Digital Publishing Institute (Switzerland): 2020
Duke University: 2020
The University of Tokyo: 2009-2019
Hitotsubashi University: 2019
This paper presents a voice conversion technique using Deep Belief Nets (DBNs) to build high-order eigen spaces of the source/target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space. DBNs have a deep architecture that automatically discovers abstractions to maximally express the original input features. If we train the DBNs using only the speech of an individual speaker, it can be considered that there is less phonological information and relatively more speaker individuality in the output features at...
This paper presents a voice conversion (VC) technique for noisy environments, where parallel exemplars are introduced to encode the source speech signal and synthesize the target speech signal. The parallel exemplars (dictionary) consist of the source exemplars and target exemplars, having the same texts uttered by the source and target speakers. The input source signal is decomposed into the source exemplars, noise exemplars obtained from the input signal, and their weights (activities). Then, using the weights of the source exemplars, the converted signal is constructed from the target exemplars. We carried out speaker conversion tasks using clean speech data and noise-added speech data. The effectiveness of this method was confirmed by comparing...
Upregulation of N-type Ca²⁺ channel dependent subunits increases functional connections and synchronization for pain formation.
We propose Gaussian Mixture Model (GMM)-based emotional voice conversion using spectrum and prosody features. In recent years, speech recognition and synthesis techniques have been developed, and an emotional voice conversion technique is required for synthesizing more expressive voices. The common emotional conversion technique was based on the transformation of neutral prosody to emotional prosody by using a huge speech corpus. In this paper, we convert a neutral voice to an emotional voice using GMMs. GMM-based conversion is widely used to modify non-linguistic information such as voice characteristics while keeping linguistic information unchanged. Because the conventional method...
This paper presents a voice conversion (VC) method that utilizes the recently proposed probabilistic models called recurrent temporal restricted Boltzmann machines (RTRBMs). One RTRBM is used for each speaker, with the goal of capturing high-order temporal dependencies in an acoustic sequence. Our algorithm starts from the separate training of one RTRBM for a source speaker and another for a target speaker using speaker-dependent training data. Because each RTRBM attempts to discover abstractions to maximally express the training data at each time step, as well as the temporal dependencies in the training data, we expect...
Objective: The aim of this study was to profile and compare the middle ear microbiomes of human subjects with and without chronic otitis media. Study Design: Prospective multicenter cohort study. Methods: All consecutive patients undergoing tympanoplasty surgery for chronic otitis media or for conditions other than otitis media were recruited. Sterile swab samples were collected from the middle ear mucosa during surgery. The variable region 4 of the 16S rRNA gene in each sample was amplified using region-specific primers adapted for the Illumina MiSeq sequencer (Illumina,...
In this paper, we present a voice conversion (VC) method that does not use any parallel data while training the model. VC is a technique where only the speaker-specific information in the source speech is converted while keeping the phonological information unchanged. Most of the existing VC methods rely on parallel data-pairs from the source and target speakers uttering the same sentences. However, the use of parallel data in training causes several problems: 1) the data used for training are limited to the predefined sentences, 2) the trained model is only applied to the speaker pair used in the training, and 3) mismatches in alignment may occur....
This paper presents a voice conversion (VC) method that utilizes recently proposed recurrent temporal restricted Boltzmann machines (RTRBMs) for each speaker, with the goal of capturing high-order temporal dependencies in an acoustic sequence. Our algorithm starts from the separate training of two RTRBMs for a source and a target speaker using speaker-dependent training data. Since each RTRBM attempts to discover abstractions at each time step, as well as the temporal dependencies in the training data, we expect that the models represent speaker-specific latent features in high-order spaces. In our...
This study investigates the time–frequency dynamics of return and volatility spillovers between stock market three commodity markets: natural gas, crude oil, gold via a comparative analysis United States China is conducted with help new empirical methods. Our findings are as follows. First, in terms time, oil strongest two markets. Crude emits net negative spillover to US market, positive Chinese market. By contrast, effect transmitted markets both countries through gold. However, has on In...
This paper introduces a novel multimodal framework for economic time series forecasting, integrating textual information with historical price data to enhance predictive accuracy. The proposed method employs a multi-head attention mechanism to dynamically align textual embeddings with temporal price data, capturing previously unrecognized cross-modal dependencies and enhancing the model's ability to interpret event-driven market dynamics. This enables the model to model complex market behaviors in a unified and effective manner. Experimental...
Previous studies have provided the biological basis for the therapeutic use of enamel matrix derivative (EMD) at sites of periodontal regeneration. The purpose of this study is to determine the effects of EMD on cell growth, osteoblastic differentiation, and insulin-like growth factor-I (IGF-I) and transforming growth factor-beta 1 (TGF-beta 1) production in human periodontal ligament cells (HPLC). We also examined the participation of endogenous IGF-I and TGF-beta 1 with EMD-stimulated cell growth in these cells. The HPLCs used were treated with EMD alone or in combination...
An artificial neural network is one of the most important models for training features in a voice conversion task. Typically, Neural Networks (NNs) are not effective in processing low-dimensional F0 features, and as a result the performance of methods based on neural networks trained on Mel Cepstral Coefficients (MCC) is not outstanding. However, F0 can robustly represent various prosody signals (e.g., emotional prosody). In this study, we propose an effective method based on NNs to train normalized-segment-F0 (NSF0) features for emotional prosody conversion. Meanwhile,...
In this paper, a lip-reading method using a novel dynamic feature of lip images is proposed. The dynamic feature is calculated as the first-order regression coefficients using a few neighboring frames (images). It constitutes a better representation of the time derivatives relative to the basic static image. The dynamic feature is processed by convolutional neural networks (CNNs), which are able to reduce the negative influence caused by shaking of the subject and face alignment blurring at the feature-extraction level. Its effectiveness has been confirmed by word-recognition experiments...
We present in this paper an end-to-end automatic speech recognition (ASR) system for a person with an articulation disorder resulting from athetoid cerebral palsy. In the case of a person with this type of articulation disorder, the speech style is quite different from that of a physically unimpaired person, and the amount of their speech data available to train the model is limited because the burden on the person is large due to the strain on the speech muscles. Therefore, the performance of ASR systems for people with an articulation disorder degrades significantly. In this paper, we propose a framework trained by not only Japanese but also non-Japanese...
This paper introduces a model adaptation approach for a speaker-dependent dysarthric speech recognition system. The dysarthria we focus on in this paper is caused by athetoid cerebral palsy, which causes involuntary muscle movements in those with the disease. For this reason, dysarthric people's speech is often unstable and difficult for conventional automatic speech recognition (ASR) systems to recognize. A model-adaptation approach, which adapts an ASR model to dysarthric speech, is one possible solution. However, because of the difference in speaking styles between dysarthric and non-dysarthric...
Periodontal ligament cells may play an important role in the successful regeneration of the periodontium. We investigated the effects of recombinant human bone morphogenetic protein-2 (rhBMP-2), one of the most potent growth factors that stimulates osteoblast differentiation and bone formation, on the cell growth and osteoblastic differentiation of human periodontal ligament cells (HPLC) isolated from four adult patients. rhBMP-2 induced no significant changes in cell growth in any of the HPLCs. rhBMP-2 at concentrations over 50 ng/mL significantly stimulated alkaline phosphatase (ALPase) activity...
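This paper presents a voice conversion (VC) technique for noisy environments, where parallel exemplars are introduced to encode the source speech signal and synthesize the target speech signal. The parallel exemplars (dictionary) consist of the source exemplars and target exemplars, having the same texts uttered by the source and target speakers. The input source signal is decomposed into the source exemplars, noise exemplars, and their weights (activities). Then, using the weights of the source exemplars, the converted signal is constructed from the target exemplars. We carried out speaker conversion tasks using clean speech data and noise-added speech data. The effectiveness of this method was confirmed by comparing its effectiveness with that...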
This paper presents a voice conversion technique using speaker-dependent Restricted Boltzmann Machines (RBM) to build high-order eigen spaces of source/target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space. We build a deep architecture that concatenates the two speaker-dependent RBMs with neural networks, expecting that they automatically discover abstractions to express the original input features. Under this concept, if we train the RBMs using only the speech of an individual speaker that includes various phonemes...
We present in this paper an exemplar-based voice conversion (VC) method using a phoneme-categorized dictionary. Sparse representation-based VC using non-negative matrix factorization (NMF) is employed for spectral conversion between different speakers. In our previous NMF-based VC method, source exemplars and target exemplars are extracted from parallel training data, having the same texts uttered by the source and target speakers. The input source signal is represented using the source exemplars and their weights. Then, the converted speech is constructed from the target exemplars and the weights related to the source exemplars. However,...
Abnormal prosody is often evident in the voice intonations of individuals with autism spectrum disorders. We compared a machine-learning-based voice analysis with human hearing judgments made by 10 speech therapists for classifying children with autism spectrum disorders (n = 30) and typical development (n = 51). Using stimuli limited to single-word utterances, the machine-learning-based voice analysis was superior to the speech therapist judgments. There was a significantly higher true-positive than false-negative rate for the machine-learning-based voice analysis but not for the speech therapists. Results are discussed in terms of some artificiality...
An artificial neural network is an important model for training features of voice conversion (VC) tasks. Typically, neural networks (NNs) are very effective in processing nonlinear features, such as Mel Cepstral Coefficients (MCC), which represent the spectrum features. However, a simple representation of the fundamental frequency (F0) is not enough for NNs to deal with emotional VC. This is because the time sequence of F0 changes drastically. Therefore, in our previous method, we used the continuous wavelet transform (CWT)...