- Speech and Audio Processing
- Advanced Adaptive Filtering Techniques
- Music and Audio Processing
- Speech Recognition and Synthesis
- Acoustic Wave Phenomena Research
- Hearing Loss and Rehabilitation
- Indoor and Outdoor Localization Technologies
- Blind Source Separation Techniques
- Ultrasonics and Acoustic Wave Propagation
- Music Technology and Sound Studies
- Infant Health and Development
- Structural Health Monitoring Techniques
- Sensor Technology and Measurement Systems
- Spectroscopy and Chemometric Analyses
- Direction-of-Arrival Estimation Techniques
- Fault Detection and Control Systems
- Neuroscience and Music Perception
- Radar Systems and Signal Processing
- Robotic Locomotion and Control
- Botany, Ecology, and Taxonomy Studies
- Muscle Activation and Electromyography Studies
- Seismology and Earthquake Studies
- Voice and Speech Disorders
- Time Series Analysis and Forecasting
- Hemoglobinopathies and Related Disorders
Microsoft (United States)
2019-2025
University of Applied Sciences Mainz
2024
Microsoft Research (United Kingdom)
2020-2024
Microsoft (Finland)
2020-2024
International Audio Laboratories Erlangen
2013-2018
Fraunhofer Institute for Integrated Circuits
2015-2018
Friedrich-Alexander-Universität Erlangen-Nürnberg
2014-2018
Technion – Israel Institute of Technology
1968-2014
The INTERSPEECH 2020 Deep Noise Suppression (DNS) Challenge is intended to promote collaborative research in real-time single-channel Speech Enhancement aimed to maximize the subjective (perceptual) quality of the enhanced speech. A typical approach to evaluate noise suppression methods is to use objective metrics on the test set obtained by splitting the original dataset. While the performance is good on the synthetic test set, often the model performance degrades significantly on real recordings. Also, most of the conventional objective metrics do not correlate well with subjective tests...
The Deep Noise Suppression (DNS) challenge was designed to unify the research efforts in the area of noise suppression targeted for human perception. We recently organized a DNS special session at INTERSPEECH 2020 and ICASSP 2021. We open-sourced training and test datasets for the wideband scenario along with a subjective evaluation framework based on ITU-T standard P.808, which was used to evaluate the participants of the challenge. Many researchers from academia and industry made significant contributions to push the field forward, yet...
The Deep Noise Suppression (DNS) challenge is designed to foster innovation in the area of noise suppression to achieve superior perceptual speech quality. This is the 4th DNS challenge, with the previous editions held at INTERSPEECH 2020 [1], ICASSP 2021 [2], and INTERSPEECH 2021 [3]. We open-source datasets and test sets for researchers to train their deep noise suppression models, as well as a subjective evaluation framework based on ITU-T P.835 to rate and rank-order the challenge entries. We provide access to DNS-MOS and word accuracy (WAcc) APIs to challenge participants to help with iterative...
The Deep Noise Suppression (DNS) challenge is designed to foster innovation in the area of noise suppression to achieve superior perceptual speech quality. We recently organized a DNS special session at INTERSPEECH 2020 where we open-sourced training and test datasets for researchers to train their noise suppression models. We also open-sourced a subjective evaluation framework and used the tool to evaluate and select the final winners. Many researchers from academia and industry made significant contributions to push the field forward. We learned that as a research community,...
This paper investigates several aspects of training an RNN (recurrent neural network) that impact the objective and subjective quality of enhanced speech for real-time single-channel speech enhancement. Specifically, we focus on an RNN that enhances short-time speech spectra on a single-frame-in, single-frame-out basis, a framework adopted by most classical signal processing methods. We propose two novel mean-squared-error-based learning objectives that enable separate control over the importance of speech distortion versus noise reduction. The...
With recent research advancements, deep learning models are becoming attractive and powerful choices for speech enhancement in real-time applications. While state-of-the-art models can achieve outstanding results in terms of speech quality and background noise reduction, the main challenge is to obtain compact enough models, which are resource efficient during inference time. An important but often neglected aspect for data-driven methods is that results can be only convincing when tested on real-world data and evaluated with useful...
The INTERSPEECH 2020 Deep Noise Suppression Challenge is intended to promote collaborative research in real-time single-channel Speech Enhancement aimed to maximize the subjective (perceptual) quality of the enhanced speech. A typical approach to evaluate noise suppression methods is to use objective metrics on the test set obtained by splitting the original dataset. Many publications report reasonable performance on the synthetic test set drawn from the same distribution as that of the training set. However, often the model performance degrades...
The ICASSP 2022 Acoustic Echo Cancellation Challenge is intended to stimulate research in acoustic echo cancellation (AEC), which is an important area of speech enhancement and still a top issue in audio communication. This is the third AEC challenge and it is enhanced by including mobile scenarios, adding speech recognition word accuracy rate as a metric, and making the audio 48 kHz. We open source two large datasets to train AEC models under both single talk and double talk scenarios. These datasets consist of recordings from more than 10,000 real audio devices...
The ICASSP 2023 Deep Noise Suppression (DNS) Challenge marks the fifth edition of the DNS challenge series. DNS challenges were organized from 2019 to foster research in the field of DNS. Previous DNS challenges were held at INTERSPEECH 2020, ICASSP 2021, INTERSPEECH 2021, and ICASSP 2022. This challenge aims to advance models capable of jointly addressing denoising, dereverberation, and interfering talker suppression, with separate tracks focusing on headset and speakerphone scenarios. The challenge facilitates personalized deep noise suppression by providing accompanying enrollment clips for...
The ICASSP 2023 Speech Signal Improvement Challenge is intended to stimulate research in the area of improving the speech signal quality in communication systems. The speech signal quality can be measured with SIG in ITU-T P.835 and is still a top issue in audio conferencing. For example, in the ICASSP 2022 Deep Noise Suppression challenge, the improvement in the background and overall quality is impressive, but the improvement in the speech signal is not statistically significant. To improve the speech signal, the following speech impairment areas must be addressed: coloration, discontinuity, loudness, reverberation, and noise. A training and test set...
The ICASSP 2023 Acoustic Echo Cancellation Challenge is intended to stimulate research in acoustic echo cancellation (AEC), which is an important area of speech enhancement and still a top issue in audio communication. This is the fourth AEC challenge and it is enhanced by adding a second track for personalized acoustic echo cancellation, reducing the algorithmic + buffering latency to 20 ms, as well as including a full-band version of AECMOS [1]. We open source two large datasets to train AEC models under both single talk and double talk scenarios....
Reduction of late reverberation can be achieved using spatio-spectral filters, such as the multichannel Wiener filter. To compute this filter, an estimate of the late reverberation power spectral density (PSD) is required. In recent years, a multitude of late reverberation PSD estimators have been proposed. In this paper, these estimators are categorized into several classes, their relations and differences are discussed, and a comprehensive experimental comparison is provided. To compare their performance, simulations in controlled as well as practical scenarios are conducted. It is shown...
The ICASSP 2021 Acoustic Echo Cancellation Challenge is intended to stimulate research in the area of acoustic echo cancellation (AEC), which is an important part of speech enhancement and still a top issue in audio communication and conferencing systems. Many recent AEC studies report good performance on synthetic datasets where the train and test samples come from the same underlying distribution. However, the AEC performance often degrades significantly on real recordings. Also, most of the conventional objective metrics such as echo return loss...
Deep learning-based speech enhancement for real-time applications recently made large advancements. Due to the lack of a tractable perceptual optimization target, many myths around training losses emerged, whereas the contribution of the loss functions to success has in many cases not been investigated in isolation from other factors such as network architecture, features, or training procedures. In this work, we investigate a wide variety of spectral loss functions for a recurrent neural network architecture suitable to operate in online frame-by-frame...
Multichannel linear prediction-based dereverberation in the short-time Fourier transform (STFT) domain has been shown to be highly effective. Using this framework, the desired dereverberated multichannel signal is obtained by filtering the noise-free reverberant signals using the estimated multichannel autoregressive (MAR) coefficients. To use such methods in the presence of noise, especially in the case of online processing, remains a challenging problem. Existing sequential enhancement structures, which first remove the noise and then...
Reverberant signals can be modeled in the short-time Fourier transform domain using a frequency-dependent autoregressive (AR) model. In the state of the art, these AR coefficients have been considered stationary, which does not hold in time-varying environments. We propose to model the AR coefficients by a first-order Markov process, whereas the reverberant microphone signal observations are deterministic. This leads to a framework where the AR coefficients are optimally estimated by a Kalman filter per subband. As a consequence, we dereverberate the observed signal by...
Deep learning based speech enhancement has made rapid development towards improving quality, while models are becoming more compact and usable for real-time on-the-edge inference. However, the speech quality scales directly with the model size, and small models are often still unable to achieve sufficient quality. Furthermore, the introduced speech distortion and artifacts greatly harm speech quality and intelligibility, and significantly degrade automatic speech recognition (ASR) rates. In this work, we shed light on the success of the spectral complex compressed mean...
The growing popularity of generative music models underlines the need for perceptually relevant, objective music quality metrics. The Frechet Audio Distance (FAD) is commonly used for this purpose even though its correlation with perceptual quality is understudied. We show that FAD performance may be hampered by sample size bias, poor choice of audio embeddings, or the use of biased or low-quality reference sets. We propose reducing sample size bias by extrapolating scores towards an infinite sample size. Through comparisons with MusicCaps labels and a