Tetsuya Takiguchi

ORCID: 0000-0001-5005-7679
Research Areas
  • Speech and Audio Processing
  • Speech Recognition and Synthesis
  • Music and Audio Processing
  • Voice and Speech Disorders
  • Blind Source Separation Techniques
  • Advanced Image and Video Retrieval Techniques
  • Face and Expression Recognition
  • Advanced Adaptive Filtering Techniques
  • Image Retrieval and Classification Techniques
  • Video Analysis and Summarization
  • Indoor and Outdoor Localization Technologies
  • Speech and dialogue systems
  • Phonetics and Phonology Research
  • Face recognition and analysis
  • Natural Language Processing Techniques
  • Hand Gesture Recognition Systems
  • Neural Networks and Applications
  • Topic Modeling
  • Advanced Data Compression Techniques
  • Video Surveillance and Tracking Methods
  • Image and Signal Denoising Methods
  • Advanced Vision and Imaging
  • Hearing Loss and Rehabilitation
  • Image Processing Techniques and Applications
  • Emotion and Mood Recognition

Kobe University
2016-2025

Kanazawa Medical Center
2017-2024

National Hospital Organization
2017-2024

Nagoya University
2024

Kumamoto Health Science University
2022

Nara Institute of Science and Technology
1996-2020

Multidisciplinary Digital Publishing Institute (Switzerland)
2020

Duke University
2020

The University of Tokyo
2009-2019

Hitotsubashi University
2019

This paper presents a voice conversion technique using Deep Belief Nets (DBNs) to build high-order eigen spaces of the source/target speakers, where it is easier to convert the source speech to the target than in the traditional cepstrum space. DBNs have a deep architecture that automatically discovers abstractions that maximally express the original input features. If we train the DBNs using only the data of an individual speaker, it can be considered that there is less phonological information and relatively more speaker individuality in the output features at...

10.21437/interspeech.2013-102 article EN Interspeech 2013 2013-08-25
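A minimal sketch of the idea behind this DBN-based conversion: each speaker's cepstral frames are projected into a high-order latent space by a stack of RBMs trained on that speaker alone, and a small network maps between the two latent spaces. The layer sizes, the use of scikit-learn's BernoulliRBM (as a stand-in for the Gaussian-Bernoulli RBMs typically used for real-valued cepstra), and the helper names are illustrative assumptions, not the paper's implementation; decoding back to cepstra is omitted.

```python
# Illustrative per-speaker RBM stacks as high-order feature spaces for VC,
# with an NN mapping the source latent space to the target latent space.
import numpy as np
from sklearn.neural_network import BernoulliRBM, MLPRegressor
from sklearn.preprocessing import MinMaxScaler

def train_speaker_stack(cepstra, layer_sizes=(64, 32)):
    """Train a stack of RBMs on one speaker's cepstral frames (frames x dims)."""
    scaler = MinMaxScaler()
    h = scaler.fit_transform(cepstra)
    stack = []
    for n_hidden in layer_sizes:
        rbm = BernoulliRBM(n_components=n_hidden, learning_rate=0.05, n_iter=20)
        h = rbm.fit_transform(h)          # propagate activations to the next layer
        stack.append(rbm)
    return scaler, stack

def encode(cepstra, scaler, stack):
    h = scaler.transform(cepstra)
    for rbm in stack:
        h = rbm.transform(h)
    return h

# Toy data standing in for time-aligned source/target cepstra (same number of frames).
rng = np.random.default_rng(0)
src_cep = rng.normal(size=(500, 24))
tgt_cep = src_cep @ rng.normal(scale=0.2, size=(24, 24)) + 0.1 * rng.normal(size=(500, 24))

src_scaler, src_stack = train_speaker_stack(src_cep)
tgt_scaler, tgt_stack = train_speaker_stack(tgt_cep)

# Map source latent features to target latent features with a small NN.
mapper = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
mapper.fit(encode(src_cep, src_scaler, src_stack), encode(tgt_cep, tgt_scaler, tgt_stack))
```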

This paper presents a voice conversion (VC) technique for noisy environments, where parallel exemplars are introduced to encode the source speech signal and synthesize the target signal. The parallel exemplars (dictionary) consist of source and target exemplars, having the same texts uttered by the source and target speakers. The input source signal is decomposed into the source exemplars, noise exemplars obtained from the input signal, and their weights (activities). Then, the converted speech is constructed using the target exemplars and the weights related to the source exemplars. We carried out speaker conversion tasks using clean data and noise-added data. The effectiveness of this method was confirmed by comparing...

10.1109/slt.2012.6424242 article EN 2022 IEEE Spoken Language Technology Workshop (SLT) 2012-12-01
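As a rough sketch of this exemplar-based decomposition (dimensions, random data, and variable names are assumptions, not the paper's setup): the noisy source spectrogram is expressed as non-negative activities over a fixed dictionary of source-speech and noise exemplars, and the speech activities are then applied to the parallel target exemplars.

```python
# Exemplar-based VC sketch: decompose noisy input over a fixed parallel dictionary,
# then rebuild the converted (and denoised) spectrogram from target exemplars.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n_bins, n_src_ex, n_noise_ex, n_frames = 128, 200, 50, 40

# Parallel exemplars: magnitude-spectrum frames of the same texts from both speakers.
A_src   = rng.random((n_bins, n_src_ex))     # source-speaker exemplars
A_tgt   = rng.random((n_bins, n_src_ex))     # time-aligned target-speaker exemplars
A_noise = rng.random((n_bins, n_noise_ex))   # noise exemplars estimated from the input

D = np.hstack([A_src, A_noise])              # combined dictionary for the noisy input
X = rng.random((n_bins, n_frames))           # noisy source magnitude spectrogram

# Non-negative activities per frame: X[:, t] ~ D @ h_t
H = np.column_stack([nnls(D, X[:, t])[0] for t in range(n_frames)])

# Keep only the speech activities and rebuild with the target exemplars.
H_speech = H[:n_src_ex, :]
Y = A_tgt @ H_speech                         # converted spectrogram
print(Y.shape)                               # (128, 40)
```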

We propose Gaussian Mixture Model (GMM)-based emotional voice conversion using spectrum and prosody features. In recent years, speech recognition and speech synthesis techniques have been developed, and an emotional voice conversion technique is required for synthesizing more expressive voices. A common approach was based on the transformation of neutral speech to emotional speech using a huge corpus. In this paper, we convert neutral speech to emotional speech using GMMs. GMM-based voice conversion is widely used to modify non-linguistic information such as speaker characteristics while keeping the linguistic information unchanged. Because the conventional method...

10.5923/j.ajsp.20120205.06 article EN American Journal of Signal Processing 2012-12-01
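For reference, below is a minimal sketch of the classic joint-density GMM mapping that underlies GMM-based conversion: a GMM is fitted over concatenated, time-aligned source/target frames, and conversion uses the conditional expectation E[y | x]. This is the textbook formulation; the paper's prosody handling and exact variant are not reproduced, and the toy data are assumptions.

```python
# Joint-density GMM mapping sketch for spectral conversion.
import numpy as np
from sklearn.mixture import GaussianMixture
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
d = 24
X = rng.normal(size=(2000, d))                                            # neutral (source) frames
Y = X @ rng.normal(scale=0.3, size=(d, d)) + 0.1 * rng.normal(size=(2000, d))  # aligned emotional (target) frames

gmm = GaussianMixture(n_components=8, covariance_type='full', random_state=0)
gmm.fit(np.hstack([X, Y]))                                                # joint density p(x, y)

def convert(x):
    """Minimum-mean-square-error mapping: E[y | x] under the joint GMM."""
    mu_x, mu_y = gmm.means_[:, :d], gmm.means_[:, d:]
    S_xx = gmm.covariances_[:, :d, :d]
    S_yx = gmm.covariances_[:, d:, :d]
    # Posterior component responsibilities given x only (computed in the log domain).
    log_r = np.array([np.log(w) + multivariate_normal.logpdf(x, m, S)
                      for w, m, S in zip(gmm.weights_, mu_x, S_xx)])
    resp = np.exp(log_r - log_r.max())
    resp /= resp.sum()
    return sum(r * (my + Syx @ np.linalg.solve(Sxx, x - mx))
               for r, mx, my, Sxx, Syx in zip(resp, mu_x, mu_y, S_xx, S_yx))

print(convert(X[0]).shape)                                                # (24,)
```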

This paper presents a voice conversion (VC) method that utilizes recently proposed probabilistic models called recurrent temporal restricted Boltzmann machines (RTRBMs). One RTRBM is used for each speaker, with the goal of capturing high-order temporal dependencies in an acoustic sequence. Our algorithm starts from the separate training of two RTRBMs, one for the source speaker and another for the target speaker, using speaker-dependent data. Because each RTRBM attempts to discover abstractions that maximally express the data at each time step, as well as the temporal dependencies in the data, we expect...

10.1109/taslp.2014.2379589 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2014-12-09

Objective: The aim of this study was to profile and compare the middle ear microbiomes of human subjects with and without chronic otitis media. Study Design: Prospective multicenter cohort study. Methods: All consecutive patients undergoing tympanoplasty surgery for chronic otitis media or for conditions other than otitis media were recruited. Sterile swab samples were collected from the middle ear mucosa during surgery. The variable region 4 of the 16S rRNA gene in each sample was amplified using region-specific primers adapted for the Illumina MiSeq sequencer (Illumina,...

10.1002/lary.26579 article EN The Laryngoscope 2017-04-11

In this paper, we present a voice conversion (VC) method that does not use any parallel data while training the model. VC is a technique where only the speaker-specific information in the source speech is converted while the phonological information is kept unchanged. Most of the existing methods rely on parallel data-pairs from the source and target speakers uttering the same sentences. However, this causes several problems: 1) the data used for training are limited to predefined sentences, 2) the trained model can only be applied to the speaker pair used for training, and 3) mismatches in alignment may occur....

10.1109/taslp.2016.2593263 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2016-07-19

This paper presents a voice conversion (VC) method that utilizes recently proposed recurrent temporal restricted Boltzmann machines (RTRBMs), one for each speaker, with the goal of capturing high-order temporal dependencies in an acoustic sequence. Our algorithm starts from the separate training of two RTRBMs for the source and target speakers using speaker-dependent data. Since each RTRBM attempts to discover abstractions of the data at each time step, as well as its temporal dependencies, we expect the models to represent speaker-specific latent feature spaces. In our...

10.21437/interspeech.2014-447 article EN Interspeech 2014 2014-09-14

This study investigates the time-frequency dynamics of return and volatility spillovers between the stock market and three commodity markets: natural gas, crude oil, and gold. A comparative analysis of the United States and China is conducted with the help of new empirical methods. Our findings are as follows. First, in terms of time, crude oil shows the strongest spillovers with the stock markets of the two countries. Crude oil emits a net negative spillover to the US stock market and a net positive spillover to the Chinese stock market. By contrast, a positive effect is transmitted to the stock markets of both countries through gold. However,...

10.1177/0958305x20907081 article EN Energy & Environment 2020-03-02

This paper introduces a novel multimodal framework for economic time series forecasting, integrating textual information with historical price data to enhance predictive accuracy. The proposed method employs a multi-head attention mechanism to dynamically align textual embeddings with temporal data, capturing previously unrecognized cross-modal dependencies and enhancing the model's ability to interpret event-driven market dynamics. This enables the model to capture complex behaviors in a unified and effective manner. Experimental...

10.3390/app15031241 article EN cc-by Applied Sciences 2025-01-25
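A minimal sketch of the kind of cross-modal alignment described above, assuming (not reproducing) the paper's architecture: price-history tokens attend over pre-computed news-text embeddings via multi-head cross-attention, and the fused representation feeds a small forecasting head. Class and tensor names are illustrative.

```python
# Cross-modal attention sketch: prices as queries, text embeddings as keys/values.
import torch
import torch.nn as nn

class CrossModalForecaster(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.price_proj = nn.Linear(1, d_model)            # scalar prices -> embeddings
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, 1)                  # next-step forecast

    def forward(self, prices, text_emb):
        # prices: (batch, T, 1); text_emb: (batch, N_docs, d_model)
        q = self.price_proj(prices)
        fused, attn_w = self.cross_attn(q, text_emb, text_emb)   # align text with time steps
        return self.head(fused[:, -1]), attn_w                   # forecast from the last step

model = CrossModalForecaster()
prices = torch.randn(8, 30, 1)        # 30 days of prices per sample
text_emb = torch.randn(8, 5, 64)      # 5 pre-computed news embeddings per sample
y_hat, weights = model(prices, text_emb)
print(y_hat.shape, weights.shape)     # torch.Size([8, 1]) torch.Size([8, 30, 5])
```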

Previous studies have provided the biological basis for the therapeutic use of enamel matrix derivative (EMD) at sites of periodontal regeneration. The purpose of this study is to determine the effects of EMD on cell growth, osteoblastic differentiation, and insulin-like growth factor-I (IGF-I) and transforming growth factor-beta 1 (TGF-beta 1) production in human periodontal ligament cells (HPLC). We also examined the participation of endogenous IGF-I and TGF-beta 1 in the EMD-stimulated responses of these cells. The HPLCs used were treated with EMD alone or in combination...

10.1034/j.1600-0765.2003.01607.x article EN Journal of Periodontal Research 2003-01-28

An artificial neural network is one of the most important models for training features in a voice conversion task. Typically, Neural Networks (NNs) are not effective at processing low-dimensional F0 features, so the performance of NN-based methods is not as outstanding for F0 as it is for Mel Cepstral Coefficients (MCC). However, F0 can robustly represent various prosody signals (e.g., emotional prosody). In this study, we propose a method that uses NNs to train normalized-segment-F0 (NSF0) features for prosody conversion. Meanwhile,...

10.1109/icis.2016.7550889 article EN 2016-06-01
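A rough sketch of segment-wise F0 normalization before an NN mapping is shown below. The segmentation scheme, the z-score normalization, and all names are assumptions for illustration; they are not the paper's NSF0 recipe.

```python
# Segment-wise normalization of a log-F0 contour, then an NN mapping per segment.
import numpy as np
from sklearn.neural_network import MLPRegressor

def normalized_segments(f0, seg_len=25):
    """Split a voiced log-F0 contour into fixed-length segments and z-score each one."""
    logf0 = np.log(f0)
    segs = []
    for start in range(0, len(logf0) - seg_len + 1, seg_len):
        s = logf0[start:start + seg_len]
        segs.append((s - s.mean()) / (s.std() + 1e-8))
    return np.array(segs)

rng = np.random.default_rng(0)
t = np.linspace(0, 5, 500)
src_f0 = 120 + 20 * np.sin(2 * np.pi * 0.8 * t) + rng.normal(scale=2, size=t.size)   # neutral contour
tgt_f0 = 180 + 50 * np.sin(2 * np.pi * 0.8 * t) + rng.normal(scale=2, size=t.size)   # emotional contour

X = normalized_segments(src_f0)
Y = normalized_segments(tgt_f0)
nn = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000, random_state=0).fit(X, Y)
print(nn.predict(X[:1]).shape)        # (1, 25)
```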

In this paper, a lip-reading method using a novel dynamic feature of lip images is proposed. The dynamic feature is calculated as the first-order regression coefficients of a few neighboring frames (images). It constitutes a better representation of the time derivatives than the basic static image. The feature is processed by convolutional neural networks (CNNs), which are able to reduce the negative influence caused by shaking of the subject, face alignment errors, and blurring at the feature-extraction level. Its effectiveness has been confirmed in word-recognition experiments...

10.1109/icis.2016.7550888 article EN 2016-06-01
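The dynamic feature above is essentially the standard delta (first-order regression) computation applied per pixel over neighbouring frames; a minimal sketch follows. The window width K and the edge padding are assumptions.

```python
# Dynamic lip feature sketch: first-order regression (delta) coefficients
# computed per pixel over a few neighbouring frames.
import numpy as np

def delta_features(frames, K=2):
    """frames: (T, H, W) grayscale lip images; returns per-pixel regression slopes."""
    T = frames.shape[0]
    denom = 2 * sum(k * k for k in range(1, K + 1))
    padded = np.concatenate([frames[:1].repeat(K, axis=0), frames,
                             frames[-1:].repeat(K, axis=0)], axis=0)
    deltas = np.zeros_like(frames, dtype=float)
    for t in range(T):
        c = t + K
        deltas[t] = sum(k * (padded[c + k] - padded[c - k]) for k in range(1, K + 1)) / denom
    return deltas   # same shape as the input; fed to a CNN in the lip-reading pipeline

lip = np.random.rand(40, 32, 32)       # 40 frames of a 32x32 lip region of interest
print(delta_features(lip).shape)       # (40, 32, 32)
```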

We present in this paper an end-to-end automatic speech recognition (ASR) system for a person with an articulation disorder resulting from athetoid cerebral palsy. In the case of this type of disorder, the speaking style is quite different from that of a physically unimpaired person, and the amount of speech data available to train a model is limited because the recording burden is large due to the strain on the speaker's muscles. Therefore, the performance of ASR systems for such people degrades significantly. In this paper, we propose a framework trained with not only Japanese but also non-Japanese...

10.1109/icassp.2019.8683803 article EN ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019-04-17

This paper introduces a model adaptation approach for a speaker-dependent dysarthric speech recognition system. The dysarthria we focus on in this paper is caused by athetoid cerebral palsy, which causes involuntary muscle movements in those with the disease. For this reason, such people's speech is often unstable and difficult for conventional automatic speech recognition (ASR) systems to recognize. A model-adaptation approach, which adapts an ASR model to dysarthric speech, is one possible solution. However, because of the difference in speaking styles between non-dysarthric...

10.1109/icassp40776.2020.9053725 article EN ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09

Periodontal ligament cells may play an important role in the successful regeneration of the periodontium. We investigated the effects of recombinant human bone morphogenetic protein-2 (rhBMP-2), one of the most potent growth factors that stimulates osteoblast differentiation and bone formation, on the cell growth and osteoblastic differentiation of human periodontal ligament cells (HPLC) isolated from four adult patients. rhBMP-2 induced no significant changes in cell growth in any of the HPLCs. rhBMP-2 at concentrations over 50 ng/mL significantly stimulated alkaline phosphatase (ALPase) activity...

10.1177/00220345990780100701 article EN Journal of Dental Research 1999-10-01

This paper presents a voice conversion (VC) technique for noisy environments, where parallel exemplars are introduced to encode the source speech signal and synthesize the target signal. The parallel exemplars (dictionary) consist of source and target exemplars, having the same texts uttered by the source and target speakers. The input source signal is decomposed into the source exemplars, noise exemplars, and their weights (activities). Then, the converted speech is constructed from the target exemplars using the weights related to the source exemplars. We carried out speaker conversion tasks using clean data and noise-added data. The effectiveness of this method was confirmed by comparing its performance with that...

10.1587/transfun.e96.a.1946 article EN IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences 2013-01-01

This paper presents a voice conversion technique using speaker-dependent Restricted Boltzmann Machines (RBMs) to build high-order eigen spaces of the source/target speakers, where it is easier to convert the source speech to the target than in the traditional cepstrum space. We use a deep architecture that concatenates two RBMs with neural networks, expecting that they automatically discover abstractions that express the original input features. Under this concept, if we train the networks using only the data of an individual speaker that includes various phonemes...

10.1587/transinf.e97.d.1403 article EN IEICE Transactions on Information and Systems 2014-01-01

We present in this paper an exemplar-based voice conversion (VC) method using a phoneme-categorized dictionary. Sparse representation-based VC using Non-negative matrix factorization (NMF) is employed for spectral conversion between different speakers. In our previous NMF-based method, source exemplars and target exemplars are extracted from parallel training data, having the same texts uttered by the source and target speakers. The input signal is represented by the source exemplars and their weights. Then, the converted speech is constructed from the target exemplars and the weights related to the source exemplars. However,...

10.1109/icassp.2014.6855137 article EN 2014-05-01

Abnormal prosody is often evident in the voice intonations of individuals with autism spectrum disorders. We compared a machine-learning-based voice analysis with human hearing judgments made by 10 speech therapists for classifying children with autism spectrum disorders (n = 30) and children with typical development (n = 51). Using stimuli limited to single-word utterances, the machine-learning-based analysis was superior to the therapists' judgments. There was a significantly higher true-positive than false-negative rate for the machine-learning-based analysis but not for the therapists. Results are discussed in terms of some artificiality...

10.1177/0031512517716855 article EN Perceptual and Motor Skills 2017-06-26

An artificial neural network is an important model for training features in voice conversion (VC) tasks. Typically, neural networks (NNs) are very effective in processing nonlinear features, such as Mel Cepstral Coefficients (MCC), which represent the spectrum features. However, a simple representation of the fundamental frequency (F0) is not enough for NNs to deal with emotional VC. This is because the time sequence of F0 in emotional speech changes drastically. Therefore, in our previous method, we used the continuous wavelet transform (CWT)...

10.1186/s13636-017-0116-2 article EN cc-by EURASIP Journal on Audio Speech and Music Processing 2017-08-01
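A minimal sketch of the CWT-based multi-scale F0 representation mentioned above: a continuous log-F0 contour is normalized and decomposed at several scales, giving coarse-to-fine prosody tracks that an NN can map more easily than raw F0. The wavelet choice ('mexh'), the dyadic scales, and the toy contour are assumptions; the paper's exact decomposition may differ.

```python
# CWT decomposition of a log-F0 contour into multi-scale prosody features.
import numpy as np
import pywt

rng = np.random.default_rng(0)
t = np.linspace(0, 2, 400)
f0 = 150 + 40 * np.sin(2 * np.pi * 1.5 * t) + 10 * np.sin(2 * np.pi * 6 * t)   # toy continuous F0 contour

logf0 = np.log(f0)
logf0 = (logf0 - logf0.mean()) / logf0.std()          # zero-mean, unit-variance log-F0

scales = 2 ** np.arange(1, 7)                          # dyadic scales, coarse to fine prosody
coeffs, freqs = pywt.cwt(logf0, scales, 'mexh')        # Mexican-hat CWT
print(coeffs.shape)                                    # (6, 400): one coefficient track per scale
```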