- Speech Recognition and Synthesis
- Speech and Audio Processing
- Music and Audio Processing
- Natural Language Processing Techniques
- Speech and Dialogue Systems
- Blind Source Separation Techniques
- Topic Modeling
- EEG and Brain-Computer Interfaces
- Advanced Adaptive Filtering Techniques
- Phonetics and Phonology Research
- Advanced Data Compression Techniques
- Neural Networks and Applications
- Neural Dynamics and Brain Function
- Voice and Speech Disorders
- Language Development and Disorders
- Fault Detection and Control Systems
- Control Systems and Identification
- Hearing Loss and Rehabilitation
- Multimodal Machine Learning Applications
- Structural Health Monitoring Techniques
- Domain Adaptation and Few-Shot Learning
- Advanced Electrical Measurement Techniques
- Music Technology and Sound Studies
- Direction-of-Arrival Estimation Techniques
- Video Analysis and Summarization
KU Leuven
2016-2025
École Supérieure des Arts Saint-Luc de Liège
2018
University of Lomé
2016
iMinds
2016
Radboud University Nijmegen
2009
Vrije Universiteit Brussel
1987-2003
Fund for Scientific Research
2003
Vrije Universiteit Amsterdam
1992
This paper gives a survey of frequency-domain identification methods for rational transfer functions in the Laplace (s) or z-domain. The interrelations between the different approaches are highlighted through a study of the (equivalent) cost functions. The properties of the various estimators are discussed and illustrated by several examples.
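As a rough illustration of one such frequency-domain method, the sketch below fits a first-order rational transfer function to frequency-response samples with a Levy-style linearized least-squares estimator. All names and values are hypothetical; this is a minimal sketch, not the survey's own algorithm.

```python
import numpy as np

# Hypothetical sketch: Levy-style linearized least-squares fit of a
# first-order transfer function G(s) = b0 / (s + a0) from noiseless
# frequency-response samples. All values are illustrative.
a0_true, b0_true = 2.0, 3.0
w = np.linspace(0.1, 10.0, 50)          # angular frequencies (rad/s)
s = 1j * w
G = b0_true / (s + a0_true)             # "measured" frequency response

# Linearize G*(s + a0) = b0  ->  G*s = b0 - a0*G  (unknowns b0, a0)
A = np.column_stack([np.ones_like(G), -G])
rhs = G * s
# Solve the complex least-squares problem via stacked real/imag parts
A_ri = np.vstack([A.real, A.imag])
rhs_ri = np.concatenate([rhs.real, rhs.imag])
theta, *_ = np.linalg.lstsq(A_ri, rhs_ri, rcond=None)
b0_est, a0_est = theta
```

With noiseless data the linearized cost recovers the true coefficients exactly; with noise, the different weightings of this cost are what distinguish the estimators the survey compares.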
The properties of five interpolating fast Fourier transform (IFFT) methods are studied with respect to their systematic errors and noise sensitivity, for a monofrequency signal. It is shown that windows with small spectral side lobes do not always result in a better overall performance of the IFFT method, and that time-domain estimators can be more efficient than the analyzed methods.
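A minimal sketch of one IFFT-style estimator, assuming a Hann window and parabolic interpolation on the log-magnitude spectrum (one of many possible variants, not necessarily one of the five studied):

```python
import numpy as np

# Hypothetical sketch: refine the frequency estimate of a monofrequency
# signal by parabolic interpolation around the FFT magnitude peak.
fs, N = 1000.0, 256
f_true = 123.4                           # Hz, deliberately off-bin
t = np.arange(N) / fs
x = np.sin(2 * np.pi * f_true * t)

X = np.abs(np.fft.rfft(x * np.hanning(N)))
k = int(np.argmax(X))
# Quadratic interpolation on the log-magnitude spectrum
a, b, c = np.log(X[k - 1]), np.log(X[k]), np.log(X[k + 1])
delta = 0.5 * (a - c) / (a - 2 * b + c)  # fractional-bin correction
f_est = (k + delta) * fs / N             # refined frequency in Hz
```

The fractional-bin correction reduces the systematic error far below the FFT bin spacing of fs/N; the residual bias depends on the window, which is exactly the trade-off the paper analyzes.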
To investigate the processing of speech in the brain, simple linear models are commonly used to establish a relationship between brain signals and speech features. However, these models are ill-equipped to model a highly dynamic, complex and non-linear system like the brain, and they often require a substantial amount of subject-specific training data. This work introduces a novel decoder architecture: the Very Large Augmented Auditory Inference (VLAAI) network. The VLAAI network outperformed state-of-the-art subject-independent models (median...
Researchers investigating the neural mechanisms underlying speech perception often employ electroencephalography (EEG) to record brain activity while participants listen to spoken language. The high temporal resolution of EEG enables the study of responses to fast and dynamic speech signals. Previous studies have successfully extracted speech characteristics from EEG data and, conversely, predicted EEG features from speech. Machine learning techniques are generally employed to construct encoding and decoding models, which necessitate a...
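The linear decoding models mentioned in these abstracts are typically ridge regressions from multichannel EEG to a speech feature such as the envelope. A minimal sketch on synthetic data (real pipelines add time lags and cross-validation; all shapes here are illustrative):

```python
import numpy as np

# Hypothetical sketch of a linear decoder: reconstruct a "speech envelope"
# from multichannel "EEG" with ridge regression on synthetic data.
rng = np.random.default_rng(0)
T, C = 2000, 8                           # time samples, EEG channels
W_true = rng.normal(size=C)
eeg = rng.normal(size=(T, C))
envelope = eeg @ W_true + 0.1 * rng.normal(size=T)

lam = 1.0                                # ridge penalty
W = np.linalg.solve(eeg.T @ eeg + lam * np.eye(C), eeg.T @ envelope)
recon = eeg @ W
r = np.corrcoef(recon, envelope)[0, 1]   # reconstruction correlation
```

The correlation between the reconstructed and true envelope is the usual evaluation metric for such decoders.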
An effective way to increase the noise robustness of automatic speech recognition is to label noisy features as either reliable or unreliable (missing), and to replace (impute) the missing ones by clean estimates. Conventional imputation techniques employ parametric models and impute on a frame-by-frame basis. At low signal-to-noise ratios (SNRs), these techniques fail, because too many time frames may contain few, if any, reliable features. In this paper, we introduce a novel non-parametric, exemplar-based method for...
We present a novel, exemplar-based method for audio event detection based on non-negative matrix factorisation. Building on recent work in noise-robust automatic speech recognition, we model events as a linear combination of dictionary atoms, and mixtures as a combination of overlapping events. The weights of the atoms activated by an observation serve directly as evidence for the underlying event classes. Atoms that span multiple frames are created by extracting all possible fixed-length exemplars from the training data. To combat the data scarcity of small...
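The core computation in such exemplar-based methods is finding non-negative activations of a fixed dictionary. A minimal sketch with multiplicative NMF updates under a Euclidean cost, on synthetic data (the papers above use larger exemplar dictionaries and typically a divergence cost):

```python
import numpy as np

# Hypothetical sketch: with a fixed non-negative exemplar dictionary D,
# estimate activations H for an observed mixture V so that V ≈ D @ H.
rng = np.random.default_rng(1)
D = rng.random((20, 5))                  # 5 exemplar atoms, 20 features
H_true = rng.random((5, 3))
V = D @ H_true                           # mixture of overlapping "events"

H = rng.random((5, 3))
for _ in range(1000):
    # Multiplicative update for H (Euclidean cost), keeps H non-negative
    H *= (D.T @ V) / (D.T @ D @ H + 1e-12)
err = np.linalg.norm(V - D @ H) / np.linalg.norm(V)
```

The recovered activation weights in H are what serve as evidence for the underlying classes in the detection method.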
In this paper, three utterance modelling approaches, namely Gaussian Mean Supervector (GMS), i-vector and Gaussian Posterior Probability Supervector (GPPS), are applied to the accent recognition problem. For each modelling method, different classifiers, namely Support Vector Machine (SVM), Naive Bayesian Classifier (NBC) and Sparse Representation Classifier (SRC), are employed to find suitable matches between the modelling schemes and the classifiers. The evaluation database is formed by using English utterances of speakers whose native languages are Russian,...
Unseen noise estimation is a key yet challenging step in making a speech enhancement algorithm work in adverse environments. At worst, the only prior knowledge we have about the encountered noise is that it is different from the involved speech. Therefore, by subtracting the components which cannot be adequately represented by a well-defined speech model, the noises can be estimated and removed. Given the good performance of deep learning in signal representation, a deep auto-encoder (DAE) is employed in this work for accurately modeling the clean speech spectrum. In...
In this paper, a bottom-up, activation-based paradigm for continuous speech recognition is described. Speech is described by the co-occurrence statistics of acoustic events over an analysis window of variable length, leading to a vectorial representation of high but fixed dimension called the "Histogram of Acoustic Co-occurrence" (HAC). During training, recurring patterns are discovered and associated with words through non-negative matrix factorisation. During testing, word activations are computed from the HAC representation...
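A toy sketch of a HAC-style representation, assuming the acoustic events are vector-quantized frame labels and counting co-occurrences at a single fixed lag (the actual method accumulates several lags and soft counts; names here are illustrative):

```python
import numpy as np

# Hypothetical sketch: count co-occurrences of quantized acoustic events
# at a fixed lag, flattened into a vector whose dimension is fixed
# (n_codes**2) regardless of utterance length.
def hac_vector(events, n_codes, lag=2):
    counts = np.zeros((n_codes, n_codes), dtype=int)
    for t in range(len(events) - lag):
        counts[events[t], events[t + lag]] += 1
    return counts.ravel()

events = [0, 1, 2, 0, 1, 2, 0, 1]        # toy VQ labels of acoustic frames
v = hac_vector(events, n_codes=3, lag=2)
```

Because the output dimension depends only on the codebook size, utterances of any length map to vectors that NMF can factorize jointly.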
Motivated by the success of i-vectors in the field of speaker recognition, this paper proposes a new approach for age estimation from telephone speech patterns based on i-vectors. In this method, each utterance is modeled by its corresponding i-vector. Then, Support Vector Regression (SVR) is applied to estimate the age of speakers. The proposed method is trained and tested on conversations from the National Institute of Standards and Technology (NIST) 2010 and 2008 Speaker Recognition Evaluation databases. Evaluation results show that it outperforms...
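A minimal sketch of the regression step, assuming i-vector-like features and a linear epsilon-insensitive regressor (an SVR-style objective) trained by subgradient descent; the data and all parameters are synthetic stand-ins, not the paper's setup:

```python
import numpy as np

# Hypothetical sketch: map i-vector-like features to speaker "age" with a
# linear epsilon-insensitive (SVR-style) loss, fit by subgradient descent.
rng = np.random.default_rng(2)
n, d = 200, 10
X = rng.normal(size=(n, d))              # stand-ins for utterance i-vectors
w_true = rng.normal(size=d)
y = X @ w_true + 40.0                    # synthetic speaker ages

y_mean = y.mean()                        # center targets; predict offset back
w = np.zeros(d)
eps, lr, lam = 0.5, 0.02, 1e-4
for _ in range(5000):
    r = X @ w - (y - y_mean)
    g = np.where(np.abs(r) > eps, np.sign(r), 0.0)  # eps-insensitive subgradient
    w -= lr * (X.T @ g / n + lam * w)
mae = np.mean(np.abs(X @ w + y_mean - y))
```

Real systems would use a kernel SVR from a library and cross-validate eps and the regularization; this only shows the shape of the objective.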
Modeling the relationship between natural speech and a recorded electroencephalogram (EEG) helps us understand how the brain processes speech and has various applications in neuroscience and brain-computer interfaces. In this context, so far mainly linear models have been used. However, the decoding performance of the linear model is limited due to the complex and highly non-linear nature of auditory processing in the human brain. We present a novel Long Short-Term Memory (LSTM)-based architecture as a nonlinear model for the classification problem of whether...
Learning a set of tasks in sequence remains a challenge for artificial neural networks, which, in such scenarios, tend to suffer from Catastrophic Forgetting (CF). The same applies to End-to-End (E2E) Automatic Speech Recognition (ASR) models, even for monolingual tasks. In this paper, we aim to overcome CF in E2E ASR by inserting adapters, small architectures with few parameters which allow a general model to be fine-tuned to a specific task, into our model. We make these adapters task-specific, while regularizing the...
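Such adapters are usually a small bottleneck with a residual connection, inserted into a frozen backbone layer. A minimal sketch with illustrative shapes (not the paper's architecture):

```python
import numpy as np

# Hypothetical sketch of a task-specific adapter: down-projection, ReLU,
# up-projection, plus a residual connection around the bottleneck.
rng = np.random.default_rng(3)
d_model, d_bottleneck = 16, 4

W_down = rng.normal(scale=0.1, size=(d_model, d_bottleneck))
W_up = np.zeros((d_bottleneck, d_model))  # zero-init: adapter starts as identity

def adapter(h):
    z = np.maximum(h @ W_down, 0.0)       # ReLU bottleneck
    return h + z @ W_up                   # residual connection

h = rng.normal(size=(5, d_model))         # activations of a frozen layer
out = adapter(h)
```

Zero-initializing the up-projection makes the adapter an identity map at the start of fine-tuning, so inserting it cannot immediately degrade the general model.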
INTRODUCTION The automated analysis of connected speech using natural language processing (NLP) emerges as a possible biomarker for Alzheimer's disease (AD). However, it remains unclear which types of speech are most sensitive and specific for the detection of AD. METHODS We applied an NLP model to automatically transcribed speech from 114 Flemish-speaking individuals, first to distinguish early AD patients from amyloid-negative cognitively unimpaired (CU) individuals, and then from amyloid-positive CU individuals, across five different types of speech. RESULTS The model was able to distinguish between...
The recent advancement of speech recognition technology has been driven by large-scale datasets and attention-based architectures, but many challenges still remain, especially for low-resource languages and dialects. This paper explores the integration of weakly supervised transcripts from TV subtitles into automatic speech recognition (ASR) systems, aiming to improve both verbatim transcriptions and automatically generated subtitles. To this end, the two types of data are regarded as different domains or languages, due to their distinct...
Automatic speech recognition (ASR) systems often struggle to recognize speech from individuals with dysarthria, a speech disorder with neuromuscular causes, with accuracy declining further for unseen speakers and content. Achieving robustness in such situations requires ASR systems to address speaker-independent and vocabulary-mismatched scenarios, minimizing user adaptation effort. This study focuses on comprehensive training strategies and methods to tackle these challenges, leveraging the transformer-based Wav2Vec2.0 model. Unlike prior...
We present a Character-Word Long Short-Term Memory Language Model which both reduces the perplexity with respect to a baseline word-level language model and reduces the number of parameters of the model. Character information can reveal structural (dis)similarities between words and can even be used when a word is out-of-vocabulary, thus improving the modeling of infrequent and unknown words. By concatenating word and character embeddings, we achieve up to 2.77% relative improvement on English compared to a model with a similar amount of parameters, and 4.57% on Dutch. Moreover, we also...
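A toy sketch of the concatenated input representation such a model uses, with hypothetical embedding sizes and a zero fallback for out-of-vocabulary words (the paper trains these embeddings jointly with the LSTM):

```python
import numpy as np

# Hypothetical sketch: a word embedding concatenated with embeddings of
# the word's characters; OOV words keep a usable character-based part.
rng = np.random.default_rng(4)
d_word, d_char, max_chars = 8, 3, 6
word_emb = {"cat": rng.normal(size=d_word)}
char_emb = {c: rng.normal(size=d_char) for c in "abcdefghijklmnopqrstuvwxyz"}
unk = np.zeros(d_word)                    # fallback for OOV words

def embed(word):
    chars = [char_emb[c] for c in word[:max_chars]]
    chars += [np.zeros(d_char)] * (max_chars - len(chars))  # pad to max_chars
    return np.concatenate([word_emb.get(word, unk)] + chars)

x = embed("cat")                          # in-vocabulary word
y = embed("dogs")                         # OOV: word part is zero, chars remain
```

Even when the word-level part falls back to the unknown vector, the character part still encodes the word's spelling, which is what improves the modeling of infrequent and unknown words.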