Hiroshi Saruwatari

ORCID: 0000-0003-0876-5617
About
Contact & Profiles
Research Areas
  • Speech and Audio Processing
  • Blind Source Separation Techniques
  • Speech Recognition and Synthesis
  • Advanced Adaptive Filtering Techniques
  • Music and Audio Processing
  • Speech and dialogue systems
  • Natural Language Processing Techniques
  • Acoustic Wave Phenomena Research
  • Hearing Loss and Rehabilitation
  • Image and Signal Denoising Methods
  • Topic Modeling
  • Advanced Algorithms and Applications
  • Neural Networks and Applications
  • Underwater Acoustics Research
  • Advanced Data Compression Techniques
  • Phonetics and Phonology Research
  • Music Technology and Sound Studies
  • Aerodynamics and Acoustics in Jet Flows
  • Robotics and Automated Systems
  • Structural Health Monitoring Techniques
  • Ultrasonics and Acoustic Wave Propagation
  • Sparse and Compressive Sensing Techniques
  • Spectroscopy and Chemometric Analyses
  • Direction-of-Arrival Estimation Techniques
  • Vehicle Noise and Vibration Control

The University of Tokyo
2016-2025

The Graduate University for Advanced Studies, SOKENDAI
2017

Nara Institute of Science and Technology
2005-2014

Kagoshima University
2009

Nagoya University
1999-2002

Secom (Japan)
1999

Kyushu University
1995

This paper addresses the determined blind source separation problem and proposes a new effective method unifying independent vector analysis (IVA) and nonnegative matrix factorization (NMF). IVA is a state-of-the-art technique that utilizes the statistical independence between sources in a mixture signal, and an efficient optimization scheme has been proposed for IVA. However, since the source model is based on a spherical multivariate distribution, IVA cannot utilize specific spectral structures such as the harmonic structures of pitched...

10.1109/taslp.2016.2577880 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2016-06-07
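The NMF side of the unification above can be illustrated in isolation. This is a minimal sketch (not the paper's joint algorithm): a nonnegative power spectrogram is approximated by a low-rank product of spectral bases and temporal activations via standard Euclidean multiplicative updates; all sizes and data here are synthetic.

```python
import numpy as np

# Low-rank NMF source model: approximate a power spectrogram X (freq x time)
# by T @ V with T, V nonnegative, via Euclidean multiplicative updates.
rng = np.random.default_rng(0)
F, N, K = 64, 100, 4                 # frequency bins, time frames, bases
X = rng.random((F, N)) + 1e-3        # toy nonnegative "spectrogram"

T = rng.random((F, K)) + 1e-3        # spectral basis vectors
V = rng.random((K, N)) + 1e-3        # temporal activations
for _ in range(200):
    T *= (X @ V.T) / (T @ V @ V.T + 1e-12)   # update bases
    V *= (T.T @ X) / (T.T @ T @ V + 1e-12)   # update activations

err = np.linalg.norm(X - T @ V) / np.linalg.norm(X)
```

The multiplicative form guarantees that T and V stay nonnegative while the reconstruction error decreases monotonically.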

A method for statistical parametric speech synthesis incorporating generative adversarial networks (GANs) is proposed. Although powerful deep neural network techniques can be applied to artificially synthesize speech waveforms, the synthetic speech quality is low compared with that of natural speech. One of the issues causing the degradation is an oversmoothing effect often observed in the generated speech parameters. The GAN introduced in this paper consists of two neural networks: a discriminator to distinguish natural and generated samples, and a generator to deceive the discriminator. In...

10.1109/taslp.2017.2761547 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2017-10-09

Despite several recent proposals to achieve blind source separation (BSS) for realistic acoustic signals, the separation performance is still not good enough. In particular, when the impulse responses are long, the performance is highly limited. In this paper, we consider a two-input, two-output convolutive BSS problem. First, we show that it must be constrained by the condition T > P, where T is the frame length of the DFT and P is the length of the room impulse responses. We show that there is an optimum frame size determined by the trade-off between maintaining the number of samples in each frequency bin...

10.1109/tsa.2003.809193 article EN IEEE Transactions on Speech and Audio Processing 2003-03-01
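The T > P condition can be illustrated numerically. Frequency-domain BSS models mixing as per-bin multiplication, i.e., circular convolution, which approximates true linear convolution only when the DFT frame length T is long relative to the impulse-response length P. The sketch below (illustrative, not the paper's experiment) measures that wrap-around error for a random filter:

```python
import numpy as np

rng = np.random.default_rng(1)
P = 32                               # impulse-response length
h = rng.standard_normal(P)

def circ_error(T):
    """Relative error of circular vs. linear convolution at frame length T."""
    s = rng.standard_normal(T)
    lin = np.convolve(s, h)[:T]                          # true linear convolution
    circ = np.fft.ifft(np.fft.fft(s) * np.fft.fft(h, T)).real  # per-bin product
    return np.linalg.norm(circ - lin) / np.linalg.norm(lin)

short = circ_error(T=P)        # T == P: severe wrap-around error
long_ = circ_error(T=16 * P)   # T >> P: wrap-around is negligible
```

The error shrinks roughly as sqrt(P/T), which is why a sufficiently long frame is necessary for the per-bin instantaneous model to hold.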

We propose a new algorithm for blind source separation (BSS), in which independent component analysis (ICA) and beamforming are combined to resolve the slow-convergence problem in the optimization of ICA. The proposed method consists of the following three parts: (a) frequency-domain ICA with direction-of-arrival (DOA) estimation, (b) null beamforming based on the estimated DOA, and (c) integration of (a) and (b) based on algorithm diversity in both the iteration and the frequency domain. The unmixing matrix obtained by null beamforming is temporarily substituted into the iterative optimization,...

10.1109/tsa.2005.855832 article EN IEEE Transactions on Audio Speech and Language Processing 2006-02-21
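The null-beamforming component used in the two ICA-plus-beamforming papers above can be sketched with a generic narrowband model (this is a textbook construction, not the papers' full system): with a two-element array, choose weights w that pass the target direction at unit gain and place a spatial null in the estimated interferer direction.

```python
import numpy as np

c, f, d = 343.0, 1000.0, 0.1          # speed of sound [m/s], frequency [Hz], mic spacing [m]

def steering(theta):
    """Far-field plane-wave steering vector for a 2-mic array."""
    tau = d * np.sin(theta) / c       # inter-mic delay for arrival angle theta
    return np.array([1.0, np.exp(-2j * np.pi * f * tau)])

# Constraints: w^H a(target) = 1 and w^H a(null) = 0.
A = np.stack([steering(0.0), steering(np.pi / 4)])    # target at 0 rad, null at 45 deg
w = np.linalg.solve(A.conj(), np.array([1.0, 0.0]))   # solve the two constraints for w

gain_target = abs(w.conj() @ steering(0.0))
gain_null = abs(w.conj() @ steering(np.pi / 4))
```

With the DOA estimated by ICA, such a deterministic null beamformer gives a reasonable unmixing matrix without iteration, which is what makes it useful as a substitute during slow ICA convergence.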

We describe a new method of blind source separation (BSS) on a microphone array, combining subband independent component analysis (ICA) and beamforming. The proposed system consists of the following three sections: (1) an ICA-based BSS section with estimation of the direction of arrival (DOA) of the sound source, (2) null beamforming based on the estimated DOA, and (3) an integration algorithm based on diversity. Using this technique, we can resolve the low-convergence problem in the optimization of ICA. To evaluate its effectiveness,...

10.1155/s1110865703305104 article EN cc-by EURASIP Journal on Advances in Signal Processing 2003-10-05

We present the UTokyo-SaruLab mean opinion score (MOS) prediction system submitted to the VoiceMOS Challenge 2022. The challenge is to predict the MOS values of speech samples collected from previous Blizzard Challenges and Voice Conversion Challenges for two tracks: a main track for in-domain prediction and an out-of-domain (OOD) track for which there are less labeled data from different listening tests. Our system is based on ensemble learning of strong and weak learners. The strong learners incorporate several improvements to the fine-tuning of self-supervised learning (SSL) models,...

10.21437/interspeech.2022-439 article EN Interspeech 2022 2022-09-16

This paper describes a new blind signal separation method using the directivity patterns of a microphone array. In this method, to deal with the arrival lags among the microphones, the inverses of the mixing matrices are calculated in the frequency domain so that the separated signals become mutually independent. Since these calculations are carried out in each frequency bin independently, the following problems arise: (1) permutation of the sound sources, and (2) arbitrariness of each source gain. In this paper, we propose a solution in which the directivity patterns are explicitly used to estimate the source direction. As results...

10.1109/icassp.2000.861203 article EN 2002-11-07

In the voice conversion algorithm based on the Gaussian mixture model (GMM) applied to STRAIGHT, the quality of the converted speech is degraded because the converted spectrum is excessively smooth. We propose a GMM-based algorithm with dynamic frequency warping to avoid the over-smoothing. We also propose an algorithm with the addition of a weighted residual spectrum, which is the difference between the converted and frequency-warped spectra, to avoid deterioration of the conversion accuracy of speaker individuality. Results of evaluation experiments clarify that the proposed algorithm is better than the conventional algorithm, with speaker individuality the same as in the proposed...

10.1109/icassp.2001.941046 article EN 2002-11-13

We propose a new blind spatial subtraction array (BSSA), consisting of a noise estimator based on independent component analysis (ICA), for efficient speech enhancement. In this paper, first, we theoretically and experimentally point out that ICA is proficient in noise estimation under a non-point-source noise condition rather than in target-speech estimation. Therefore, the proposed BSSA utilizes ICA as a noise estimator. In BSSA, speech extraction is achieved by subtracting the power spectrum of the noise signals estimated using ICA from the power spectrum of the partly enhanced target signal with...

10.1109/tasl.2008.2011517 article EN IEEE Transactions on Audio Speech and Language Processing 2009-03-24
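The power-domain subtraction step at the heart of BSSA can be sketched as follows. This is a generic spectral-subtraction routine with flooring (parameter names are illustrative, not the paper's notation); the noise power here is assumed to be supplied by some estimator, which in BSSA is the ICA stage.

```python
import numpy as np

def spectral_subtraction(power_obs, power_noise, beta=1.0, floor=0.01):
    """Subtract an estimated noise power spectrum, flooring negative results."""
    diff = power_obs - beta * power_noise
    return np.where(diff > floor * power_obs, diff, floor * power_obs)

rng = np.random.default_rng(2)
speech = rng.random(128) * 4.0            # toy clean power spectrum
noise = rng.random(128)                   # toy noise power spectrum
obs = speech + noise                      # simplified additive power model
enhanced = spectral_subtraction(obs, noise)
```

The flooring keeps the output strictly positive; without it, over-subtraction produces the isolated spectral peaks perceived as musical noise.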

Thanks to improvements in machine learning techniques, including deep learning, a free large-scale speech corpus that can be shared between academic institutions and commercial companies has an important role. However, such a corpus for Japanese speech synthesis does not exist. In this paper, we designed a novel Japanese speech corpus, named the "JSUT corpus," that is aimed at achieving end-to-end speech synthesis. The corpus consists of 10 hours of reading-style speech data and its transcription, and covers all of the main pronunciations of daily-use Japanese characters. We describe...

10.48550/arxiv.1711.00354 preprint EN cc-by-sa arXiv (Cornell University) 2017-01-01

A sound field recording method based on spherical or circular harmonic analysis for arbitrary array geometry and microphone directivity is proposed. In current methods of harmonic analysis, a sound field is decomposed into harmonic functions with the expansion center given in advance, which is called the global origin, and their coefficients are obtained up to a certain truncation order using the microphone measurements. However, the accuracy of the reconstructed field depends on the predefined position of the global origin and the truncation order, which makes it difficult to apply this technique to an asymmetric...

10.1109/lsp.2017.2775242 article EN IEEE Signal Processing Letters 2017-11-21

In this paper, we propose a new framework called independent deeply learned matrix analysis (IDLMA), which unifies deep neural network (DNN) source modeling and independence-based multichannel audio source separation. IDLMA utilizes both pretrained DNN source models and the statistical independence between sources for the separation, where the time-frequency structures of each source are iteratively optimized by the DNNs while enhancing the estimation accuracy of the spatial demixing filters. As the source generative model, we introduce a complex heavy-tailed...

10.1109/taslp.2019.2925450 article EN cc-by IEEE/ACM Transactions on Audio Speech and Language Processing 2019-06-27

INTERSPEECH2006: the 9th International Conference on Spoken Language Processing (ICSLP), September 17-21, 2006, Pittsburgh, Pennsylvania, USA.

10.21437/interspeech.2006-582 article EN Interspeech 2006 2006-09-17

In this paper, we provide a theoretical analysis of the amount of musical noise generated in iterative spectral subtraction, and an optimization method for the least musical noise generation. To achieve high-quality noise reduction with low musical noise, iterative spectral subtraction, i.e., iteratively applied weak nonlinear signal processing, has been proposed. Although its effectiveness has been reported experimentally, there have been no theoretical studies. Therefore, we formulate the musical noise generation process by tracing the change in the kurtosis of the spectra, and conduct a comparison between different parameter settings with the same...

10.1109/tasl.2012.2196513 article EN IEEE Transactions on Audio Speech and Language Processing 2012-04-27
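The kurtosis-tracing idea above can be demonstrated numerically. In this minimal sketch (illustrative, not the paper's derivation), the power spectrum of Gaussian noise is exponentially distributed; a crude spectral-subtraction step concentrates probability mass at zero while leaving a heavy tail, which raises the kurtosis — the statistic the paper links to perceived musical noise.

```python
import numpy as np

def kurtosis(x):
    """Standard fourth central moment over squared variance."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2

rng = np.random.default_rng(3)
noise_power = rng.exponential(size=100_000)      # power spectrum of Gaussian noise
processed = np.maximum(noise_power - 1.0, 0.0)   # crude subtraction of the mean noise power

k_before = kurtosis(noise_power)                 # ~9 for an exponential distribution
k_after = kurtosis(processed)                    # larger: the processing is super-Gaussianizing
```

A stronger subtraction coefficient raises the post-processing kurtosis further, which is exactly the trade-off the paper optimizes over iterations.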

Voice conversion (VC) using sequence-to-sequence learning of context posterior probabilities is proposed. Conventional VC using shared context posterior probabilities predicts the target speech parameters from the context posterior probabilities estimated from the source speech parameters. Although the conventional VC can be built from non-parallel data, it is difficult to convert speaker individuality, such as phonetic property and speaking rate, contained in the posterior probabilities, because the source speech parameters are directly used for predicting the target speech parameters. In this work, we assume that the training data partly include parallel data and propose sequence-to-sequence learning of the context posterior probabilities between...

10.21437/interspeech.2017-247 preprint EN Interspeech 2017 2017-08-16

In this paper, statistical-model generalizations of independent low-rank matrix analysis (ILRMA) are proposed for achieving high-quality blind source separation (BSS). BSS is a crucial problem in realizing many audio applications, where the sources must be separated using only the observed mixture signal. Many algorithms for solving the BSS problem have been proposed, especially in the history of independent component analysis and nonnegative matrix factorization. In particular, ILRMA, which can achieve the highest separation performance for music or speech mixtures, assumes both...

10.1186/s13634-018-0549-5 article EN cc-by EURASIP Journal on Advances in Signal Processing 2018-05-02

EUROSPEECH2003: 8th European Conference on Speech Communication and Technology, September 1-4, 2003, Geneva, Switzerland.

10.21437/eurospeech.2003-661 article EN 2003-09-01

Frequency-domain blind source separation (BSS) is shown to be equivalent to two sets of frequency-domain adaptive beamformers (ABFs) under certain conditions. The zero search of the off-diagonal components in the BSS update equation can be viewed as the minimization of the mean square error in the ABFs. The unmixing matrix of BSS and the filter coefficients of the ABFs converge to the same solution if the source signals are ideally independent. If they are dependent, this dependence results in a bias for the correct filter coefficients. Therefore, the BSS performance is limited to that of the ABF, which can use the exact...

10.1155/s1110865703305074 article EN cc-by EURASIP Journal on Advances in Signal Processing 2003-10-05

In this paper, we present novel speaking-aid systems based on one-to-many eigenvoice conversion (EVC) to enhance three types of alaryngeal speech: esophageal speech, electrolaryngeal speech, and body-conducted silent speech. Although alaryngeal speech allows laryngectomees to utter sounds, it suffers from a lack of quality and speaker individuality. To improve alaryngeal speech, alaryngeal-speech-to-speech (AL-to-Speech) methods based on statistical voice conversion have been proposed. EVC is capable of flexibly controlling the converted voice by adapting the conversion model given...

10.1109/taslp.2013.2286917 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2013-10-23

This paper proposes a new efficient multichannel nonnegative matrix factorization (NMF) method. Recently, multichannel NMF (MNMF) has been proposed as a means of solving the blind source separation problem. This method estimates the mixing system of the sources and attempts to separate them in an unsupervised fashion. However, the method is strongly dependent on its initial values because there are no constraints on the spatial models. To solve this problem, we introduce a rank-1 spatial model into MNMF. Under this model, the demixing matrix can be optimized while the sources are represented using NMF bases...

10.1109/icassp.2015.7177975 article EN 2015-04-01

This paper presents a deep neural network (DNN)-based phase reconstruction from amplitude spectrograms. In audio signal and speech processing, the amplitude spectrogram is often used for the processing, and the corresponding phase is reconstructed on the basis of the Griffin-Lim method. However, this method causes unnatural artifacts in synthetic speech. Addressing this problem, we introduce a von-Mises-distribution DNN for phase reconstruction. The DNN is a generative model having the von Mises distribution, which can model distributions of a periodic variable such as phase,...

10.1109/iwaenc.2018.8521313 preprint EN 2018-09-01
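The Griffin-Lim baseline that the paper improves on can be sketched in a few lines: alternately enforce the given amplitude and STFT consistency until the randomly initialized phase settles. This is the standard textbook iteration (not the paper's DNN), using `scipy.signal.stft`/`istft` on a synthetic sinusoid.

```python
import numpy as np
from scipy.signal import stft, istft

rng = np.random.default_rng(4)
x = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 2048))   # toy signal
_, _, S = stft(x, nperseg=256)
amp = np.abs(S)                                       # given amplitude spectrogram

# Griffin-Lim: start from random phase, iterate resynthesis/re-analysis.
phase = np.exp(1j * rng.uniform(0, 2 * np.pi, amp.shape))
for _ in range(50):
    _, y = istft(amp * phase, nperseg=256)            # back to time domain
    _, _, S_y = stft(y, nperseg=256)                  # re-analyze
    phase = np.exp(1j * np.angle(S_y))                # keep phase, reimpose amplitude

_, x_rec = istft(amp * phase, nperseg=256)
_, _, S_rec = stft(x_rec, nperseg=256)
consistency = np.linalg.norm(np.abs(S_rec) - amp) / np.linalg.norm(amp)
```

The residual inconsistency, and the audible artifacts it causes in speech, is what motivates replacing this iteration with a learned phase model.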

A sound field reproduction method based on the spherical wavefunction expansion of sound fields is proposed, which can be flexibly applied to various array geometries and directivities. First, we formulate the synthesis as a minimization problem of some norm of the difference between the desired and synthesized fields; then, the optimal driving signals are derived by using the expansion of the fields. This formulation is closely related to the mode-matching method; a major advantage of the proposed method is that the weight of each mode is determined according to the norm to be minimized instead of empirical...

10.1109/taslp.2019.2934834 article EN cc-by IEEE/ACM Transactions on Audio Speech and Language Processing 2019-08-14
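A discrete least-squares analogue of the norm-minimization formulation above can be sketched as pressure matching (all quantities here are synthetic, and the paper works with wavefunction expansions rather than discrete control points): given transfer functions G from L loudspeakers to M control points and a desired field b, solve min_d ||G d - b||_2 for the driving signals d.

```python
import numpy as np

rng = np.random.default_rng(5)
M, L = 40, 12                                   # control points, loudspeakers
G = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))  # toy transfer matrix
d_true = rng.standard_normal(L) + 1j * rng.standard_normal(L)
b = G @ d_true                                  # desired field (consistent by construction)

# Least-squares driving signals.
d, *_ = np.linalg.lstsq(G, b, rcond=None)
residual = np.linalg.norm(G @ d - b) / np.linalg.norm(b)
```

In practice b is not exactly reachable and G is ill-conditioned, so the choice of norm (and regularization) governs how the residual is distributed across modes, which is the design freedom the paper exploits.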