- Speech and Audio Processing
- Advanced Adaptive Filtering Techniques
- Blind Source Separation Techniques
- Music and Audio Processing
- Speech Recognition and Synthesis
- Advancements in Photolithography Techniques
- Acoustic Wave Phenomena Research
- Indoor and Outdoor Localization Technologies
- Direction-of-Arrival Estimation Techniques
- Sparse and Compressive Sensing Techniques
- Underwater Acoustics Research
- Hearing Loss and Rehabilitation
- Advanced Wireless Communication Techniques
- Numerical Methods and Algorithms
- Computer Graphics and Visualization Techniques
- Digital Image Processing Techniques
- Computational Geometry and Mesh Generation
- Advanced Data Compression Techniques
- Infant Health and Development
- Image and Signal Denoising Methods
- VLSI and FPGA Design Techniques
- Animal Vocal Communication and Behavior
- Model Reduction and Neural Networks
- Advanced Numerical Analysis Techniques
- Advancements in PLL and VCO Technologies
Line Corporation (Japan)
2020-2023
Tokyo Metropolitan University
2018-2020
École Polytechnique Fédérale de Lausanne
2009-2018
The University of Tokyo
2018
École Normale Supérieure - PSL
2013
IBM Research - Zurich
2012
We present pyroomacoustics, a software package aimed at the rapid development and testing of audio array processing algorithms. The content of the package can be divided into three main components: an intuitive Python object-oriented interface to quickly construct different simulation scenarios involving multiple sound sources and microphones in 2D and 3D rooms; a fast C implementation of the image source model for general polyhedral rooms to efficiently generate room impulse responses and simulate the propagation between sources and receivers;...
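As a rough illustration of the image source model mentioned in this abstract, the sketch below mirrors a source across the four walls of a 2D rectangular room to obtain its first-order images, then converts distances to propagation delays. It is a hand-written toy, not the pyroomacoustics API: the function names `first_order_images` and `echo_delays`, the room dimensions, and the speed of sound of 343 m/s are all assumptions made for the example.

```python
import math


def first_order_images(source, room_size):
    """Mirror a 2D source across each wall of the rectangle [0, Lx] x [0, Ly]."""
    x, y = source
    lx, ly = room_size
    return [
        (-x, y),          # reflection on the wall at x = 0
        (2 * lx - x, y),  # reflection on the wall at x = Lx
        (x, -y),          # reflection on the wall at y = 0
        (x, 2 * ly - y),  # reflection on the wall at y = Ly
    ]


def echo_delays(source, mic, room_size, c=343.0):
    """Delays in seconds of the direct path followed by the first-order echoes."""
    points = [source] + first_order_images(source, room_size)
    return [math.dist(p, mic) / c for p in points]


# Hypothetical 4 m x 3 m room with one source and one microphone.
images = first_order_images((1.0, 2.0), (4.0, 3.0))
delays = echo_delays((1.0, 2.0), (3.0, 1.0), (4.0, 3.0))
```

Each echo is modelled as a free-field path from a mirrored copy of the source, which is why a room impulse response can be assembled from a list of image positions. Higher reflection orders and non-rectangular rooms need visibility checks that this sketch leaves out.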
We extend frequency-domain blind source separation based on independent vector analysis to the case where there are more microphones than sources. The signal is modelled as non-Gaussian sources in a Gaussian background. The proposed algorithm is based on a parametrization of the demixing matrix decreasing the number of parameters to estimate. Furthermore, orthogonal constraints between the signal and background subspaces are imposed to regularize the separation. The problem can then be posed as a constrained likelihood maximization. We propose efficient...
We propose a new algorithm for the blind source separation of acoustic sources. This algorithm is an alternative to the popular auxiliary function based independent vector analysis using iterative projection (AuxIVA-IP). It optimizes the same cost function, but instead of alternate updates of the rows of the demixing matrix, we propose a sequence of rank-1 updates. Remarkably, and unlike the previous method, the resulting updates do not require matrix inversion. Moreover, their computational complexity is quadratic in the number of microphones, rather than cubic...
We present the concept of an acoustic rake receiver, a microphone beamformer that uses echoes to improve the noise and interference suppression. The rake idea is well-known in wireless communications; it involves constructively combining different multipath components that arrive at the receiver antennas. Unlike spread-spectrum signals used in wireless communications, speech signals are not orthogonal to their shifts. Therefore, we focus on the spatial structure, rather than temporal. Instead of explicitly estimating the channel, we create...
This paper presents recent progress on integrating speech separation and enhancement (SSE) into the ESPnet toolkit. Compared with the previous ESPnet-SE work, numerous features have been added, including state-of-the-art models with their respective training and evaluation recipes. Importantly, a new interface has been designed to flexibly combine front-ends with other tasks, including automatic speech recognition (ASR), speech translation (ST), and spoken language understanding (SLU). To showcase such integration, we performed experiments...
We propose DiffSep, a new single channel source separation method based on score-matching of a stochastic differential equation (SDE). We craft a tailored continuous time diffusion-mixing process starting from the separated sources and converging to a Gaussian distribution centered on their mixture. This formulation lets us apply the machinery of score-based generative modelling. First, we train a neural network to approximate the score function of the marginal probabilities of the diffusion-mixing process. Then, we use it to solve the reverse SDE that...
A new iterative low complexity algorithm has been presented for computing the Walsh-Hadamard transform (WHT) of an $N$ dimensional signal with a $K$-sparse WHT, where $N$ is a power of two and $K = O(N^\alpha)$, scales sub-linearly in $N$ for some $0 < \alpha < 1$. Assuming a random support model for the non-zero transform domain components, the algorithm reconstructs the WHT of the signal with a sample complexity $O(K \log_2(\frac{N}{K}))$, a computational complexity $O(K\log_2(K)\log_2(\frac{N}{K}))$ and with a very high probability asymptotically tending to 1. The approach is based on the subsampling...
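For reference, the dense transform that the sparse algorithm aims to outperform can be written as a butterfly recursion. The sketch below is the standard unnormalized fast Walsh-Hadamard transform in natural (Hadamard) ordering, costing $O(N \log_2 N)$ operations; it is not the sub-linear algorithm of the abstract, and the function name `fwht` is only chosen for this example.

```python
def fwht(signal):
    """Unnormalized Walsh-Hadamard transform of a power-of-two length sequence."""
    a = list(signal)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Combine blocks of size h into blocks of size 2h with sum/difference butterflies.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                u, v = a[i], a[i + h]
                a[i], a[i + h] = u + v, u - v
        h *= 2
    return a


# A hypothetical 2-sparse spectrum of length N = 8 and the time signal it generates.
spectrum = [0, 3, 0, 0, 0, 0, -1, 0]
time_signal = fwht(spectrum)     # equals N times the inverse transform
recovered = fwht(time_signal)    # applying the transform twice scales by N
```

Because the unnormalized transform is its own inverse up to a factor $N$, `recovered` equals `spectrum` multiplied by 8. A sparse algorithm would try to recover the two non-zero coefficients from far fewer than $N$ time samples.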
End-to-end neural diarization (EEND) with encoder-decoder-based attractors (EDA) is a promising method to handle the whole speaker diarization problem simultaneously with a single neural network. While the EEND model can produce all frame-level speaker labels simultaneously, it disregards output label dependency. In this work, we propose a novel EEND model that introduces the label dependency between frames. The proposed method generates non-autoregressive intermediate attractors to produce speaker labels at the lower layers and conditions the subsequent layers on these labels. While the proposed model works in a non-autoregressive manner, the speaker labels are refined by...
We propose fast independent vector extraction (FIVE), a new algorithm that blindly extracts a single non-Gaussian source from a Gaussian background. The algorithm iteratively computes beam-forming weights maximizing the signal-to-interference-and-noise ratio for an approximate noise covariance matrix. We demonstrate that this procedure minimizes the negative log-likelihood of the input data according to a well-defined probabilistic model. The minimization is carried out via the auxiliary function technique whereas, unlike related...
We propose a new algorithm for joint dereverberation and blind source separation (DR-BSS). Our work builds upon the ILRMA-T framework that applies a unified filter combining dereverberation and separation. One drawback of this framework is that it requires several matrix inversions, an operation inherently costly and with potential stability issues. We leverage the recently introduced iterative source steering (ISS) updates to propose two algorithms mitigating this issue. Albeit derived from first principles, the first algorithm turns out to be a natural combination of weighted...
We propose to learn surrogate functions of universal speech priors for determined blind speech separation. Deep speech priors are highly desirable due to their superior modelling power, but are not compatible with state-of-the-art independent vector analysis based on majorization-minimization (AuxIVA), since deriving the required surrogate function is not easy, nor always possible. Instead, we do away with exact majorization and directly approximate the surrogate. Taking advantage of iterative source steering (ISS) updates, we back propagate...
We propose a new algorithm for blind source separation (BSS) using independent vector analysis (IVA). This is an improvement over the popular auxiliary function based IVA (AuxIVA) with iterative projection (IP) or iterative source steering (ISS). We introduce iterative projection with adjustment (IPA), where we update one demixing filter and jointly adjust all the other sources along...
Knowing what amount of radioactive material was released from Fukushima in March 2011 is crucial to understand the scope of the consequences. Moreover, it could be used in forward simulations to obtain accurate maps of deposition. But these data are often not publicly available, or are of questionable quality. We propose to estimate the emission waveforms by solving an inverse problem. Previous approaches rely on a detailed expert guess of how the releases appeared, and they produce a solution strongly biased by this guess. If we...
In this paper we present FRIDA, an algorithm for estimating directions of arrival of multiple wideband sound sources. FRIDA combines multi-band information coherently and achieves state-of-the-art resolution at extremely low signal-to-noise ratios. It works for arbitrary array layouts, but unlike the various steered response power and subspace methods, it does not require a grid search. FRIDA leverages recent advances in sampling signals with a finite rate of innovation. It is based on the insight that for any array layout,...
It is commonly believed that multipath hurts various audio processing algorithms. At odds with this belief, we show that multipath in fact helps sound source separation, even with very simple propagation models. Unlike most existing methods, we neither ignore the room impulse responses, nor attempt to estimate them fully. We rather assume that we know the positions of a few virtual microphones generated by echoes and we show how this gives us enough spatial diversity to get a performance boost over the anechoic case. We show improvements for two standard...
We propose an efficient iterative method to estimate a sub-sample time delay between two signals. We formulate it as the optimization problem of maximizing the generalized cross correlation (GCC) of the two signals in terms of a continuous time delay parameter. The maximization is carried out with an auxiliary function method. First, we prove that when written as a sum of cosines, the GCC can be lower bounded at any point by a quadratic function. By repeatedly maximizing this lower-bound, an alternative update algorithm for the estimation is derived. We follow...
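For context, a common baseline for sub-sample delay estimation picks the peak of the sampled cross correlation and refines it with a parabola fitted through the peak and its two neighbours. The sketch below shows that baseline only, without any GCC weighting; it is not the auxiliary function method proposed in the abstract, and the names `xcorr` and `estimate_delay` as well as the test pulse are invented for illustration.

```python
def xcorr(x, y, lag):
    """Cross correlation sum_n y[n] * x[n - lag], with zeros outside the lists."""
    total = 0.0
    for n, yn in enumerate(y):
        m = n - lag
        if 0 <= m < len(x):
            total += yn * x[m]
    return total


def estimate_delay(x, y, max_lag):
    """Delay of y relative to x in samples, refined by parabolic interpolation."""
    lags = list(range(-max_lag, max_lag + 1))
    r = [xcorr(x, y, lag) for lag in lags]
    k = max(range(len(r)), key=r.__getitem__)
    if k == 0 or k == len(r) - 1:
        return float(lags[k])  # peak on the search boundary: nothing to interpolate
    left, mid, right = r[k - 1], r[k], r[k + 1]
    denom = left - 2.0 * mid + right
    offset = 0.0 if denom == 0 else 0.5 * (left - right) / denom
    return lags[k] + offset


# Hypothetical test pulse and a copy delayed by exactly 3 samples.
x = [0.0] * 32
x[10:15] = [1.0, 2.0, 3.0, 2.0, 1.0]
y = [0.0] * 3 + x[:-3]
delay = estimate_delay(x, y, max_lag=8)
```

The parabolic refinement is only a local approximation of the correlation peak, which is one motivation for methods that maximize the continuous-delay objective directly.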
We revisit the widely used bss_eval metrics for source separation with an eye out for performance. We propose a fast algorithm fixing shortcomings of publicly available implementations. First, we show that the metrics are fully specified by the squared cosine of just two angles between estimate and reference subspaces. Second, large linear systems are involved. However, they are structured, and we apply a fast iterative method based on conjugate gradient descent. The complexity of this step is thus reduced by a factor quadratic in the distortion...
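The role of the squared cosine is easiest to see in the simplest special case: a single reference, with only a scalar gain allowed as distortion (the scale-invariant SDR). Projecting the estimate onto the reference gives a target-to-residual energy ratio of $\cos^2\theta / (1 - \cos^2\theta)$, where $\theta$ is the angle between the two signals. The sketch below covers this special case only, not the full bss_eval metrics with distortion filters and multiple references; the function names are assumptions made for the example.

```python
import math


def squared_cosine(estimate, reference):
    """Squared cosine of the angle between two equal-length signals."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    energy_e = sum(e * e for e in estimate)
    energy_r = sum(r * r for r in reference)
    return dot * dot / (energy_e * energy_r)


def si_sdr_db(estimate, reference):
    """Scale-invariant SDR in decibels, computed from the squared cosine alone."""
    c2 = squared_cosine(estimate, reference)
    if c2 >= 1.0:
        return math.inf    # estimate is a scaled copy of the reference
    if c2 <= 0.0:
        return -math.inf   # estimate is orthogonal to the reference
    return 10.0 * math.log10(c2 / (1.0 - c2))


# Toy signals: the estimate contains the reference plus an orthogonal error of equal energy.
value = si_sdr_db([1.0, 1.0], [1.0, 0.0])
```

Because only the angle enters the formula, rescaling the estimate leaves the value unchanged.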
We propose UNIVERSE++, a universal speech enhancement method based on score-based diffusion and adversarial training. Specifically, we improve the existing UNIVERSE model that decouples clean speech feature extraction and diffusion. Our contributions are three-fold. First, we make several modifications to the network architecture, improving training stability and final performance. Second, we introduce an adversarial loss to promote learning high quality speech features. Third, we propose a low-rank adaptation scheme with a phoneme fidelity loss to improve content...
We develop an end-to-end system for multi-channel, multi-speaker automatic speech recognition. We propose a frontend for joint source separation and dereverberation based on the independent vector analysis (IVA) paradigm. It uses the fast and stable iterative source steering algorithm together with a neural source model. Unlike conventional beamforming, the number of speakers can be dynamically changed during or after training. The parameters from the ASR module and the neural source model are optimized jointly from the ASR loss itself. We demonstrate competitive...
In this work, we propose efficient algorithms for joint independent subspace analysis (JISA), an extension of independent component analysis that deals with parallel mixtures, where not all the components are independent. We derive an algorithmic framework for JISA based on the majorization-minimization (MM) optimization technique (JISA-MM). We use a well-known inequality for super-Gaussian sources to derive a surrogate function of the negative log-likelihood of the observed data. The minimization of this surrogate function leads to a variant of the hybrid exact-approximate diagonalization...
The geometry of room acoustics is such that the reverberant signal can be seen as the same waveform emitted from multiple locations. In analogy with the rake receiver from wireless communications, we propose several beamforming strategies that exploit, rather than suppress, this additional spatio-temporal diversity. Unlike earlier work in the frequency domain, time domain designs allow us to shape the impulse response of the beamformer. In particular, we can control perceptually relevant parameters, such as the amount of early echoes or the length...
In this paper we introduce an open source and reproducible microphone array hardware design and an anechoic dataset recorded with this array. The Pyramic array has 48 microphones spread onto six identical modules connected to an FPGA-ARM combo. The arrangement of the microphones can be reconfigured to create a large number of array geometries. We describe in detail the hardware architecture and make openly available all the necessary design files, VHDL code, and C libraries together with extensive documentation. This effectively enables the replicability of part or all of the design. The curated...
In this paper, we investigate the effectiveness of spatial features for acoustic scene classification (ASC) with distributed microphones. Assuming that multiple subarrays, each containing multiple microphones, are distributed and synchronized, we consider two types of generalized cross-correlation phase transform (GCC-PHAT) as spatial features: intra- and inter-subarray GCC-PHATs. They are obtained from channels within the same subarray and between different subarrays, respectively. The log-Mel spectrogram as a spectral feature and the intra- and/or inter-subarray GCC-PHAT as spatial features are processed in...
In our demo, we present two hardware platforms for prototyping audio array signal processing. Pyramic is a 48-channel microphone array fitted on an FPGA and Compact Six is a portable microphone array with six microphones, closer to the technical constraints of consumer electronics. A browser based interface was developed that allows the user to interact with the audio stream from the arrays in real time. The software component of this demo is a Python module with implementations of basic audio signal processing blocks and popular techniques like STFT, beamforming, and DoA. Both...