- Speech Recognition and Synthesis
- Speech and Audio Processing
- Natural Language Processing Techniques
- Music and Audio Processing
- Phonetics and Phonology Research
- Speech and dialogue systems
- Basque language and culture studies
- Spanish Linguistics and Language Studies
- Voice and Speech Disorders
- Emotion and Mood Recognition
- Linguistic Studies and Language Acquisition
- Advanced Data Compression Techniques
- Subtitles and Audiovisual Media
- Semantic Web and Ontologies
- Blind Source Separation Techniques
- Music Technology and Sound Studies
- Journalism and Media Studies
- Digital Filter Design and Implementation
- Infant Health and Development
- Radio, Podcasts, and Digital Media
- Video Analysis and Summarization
- Social Sciences and Policies
- Tracheal and airway disorders
- Topic Modeling
- Advanced Adaptive Filtering Techniques
University of the Basque Country
2015-2025
Basque Center for Applied Mathematics
2023-2025
Iberdrola (Spain)
2009
Ente Vasco de la Energía
2004
The definition of parameters is a crucial step in the development of a system for identifying emotions in speech. Although there is no agreement on which are the best features for this task, it is generally accepted that prosody carries most of the emotional information. Most works in the field use some kind of prosodic features, often in combination with spectral and voice quality parametrizations. Nevertheless, no systematic study has been done comparing these features. This paper presents the analysis of the characteristics of the features derived from...
This article explores the potential of the harmonics plus noise model (HNM) of speech in the development of a high-quality vocoder applicable in statistical frameworks, particularly in modern speech synthesizers. It presents an extensive explanation of all the different alternatives considered during the design of the HNM-based vocoder, together with the corresponding objective and subjective experiments, and a careful description of its implementation details. Three aspects of the analysis have been investigated: refinement of the pitch estimation using...
In the field of speaker verification (SV), it is nowadays feasible and relatively easy to create a synthetic voice to deceive a speech-driven biometric access system. This paper presents a detector that can be connected at the front-end or at the back-end of a standard SV system, and that will protect it from spoofing attacks coming from state-of-the-art statistical Text to Speech (TTS) systems. The system described is a Gaussian Mixture Model (GMM) based binary classifier that uses natural and copy-synthesized signals obtained from the Wall Street Journal...
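As a rough illustration of the GMM-based binary classifier mentioned in the abstract above, the sketch below scores a sequence of feature frames with an average log-likelihood ratio between a "natural" model and a "synthetic" model. The diagonal covariances, the zero threshold, the function names and the toy model parameters are all assumptions made for this example; the abstract does not specify them, and no training step is shown.

```python
import math

def diag_gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_gauss = sum(
            -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, mu, var)
        )
        comp_logs.append(math.log(w) + log_gauss)
    peak = max(comp_logs)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(c - peak) for c in comp_logs))

def llr_score(frames, natural_gmm, synthetic_gmm):
    """Average per-frame log-likelihood ratio; positive values favour 'natural'."""
    total = 0.0
    for x in frames:
        total += diag_gmm_loglik(x, *natural_gmm) - diag_gmm_loglik(x, *synthetic_gmm)
    return total / len(frames)

def is_natural(frames, natural_gmm, synthetic_gmm, threshold=0.0):
    """Binary decision; the threshold value is an assumption of this sketch."""
    return llr_score(frames, natural_gmm, synthetic_gmm) > threshold

# Toy, hand-picked (weights, means, variances) tuples; these are not trained models.
natural_gmm = ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
synthetic_gmm = ([1.0], [[4.0, 4.0]], [[1.0, 1.0]])
```

In a real detector the two models would be trained on features extracted from natural and copy-synthesized signals, and the threshold would be tuned on held-out data.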
Voice conversion methods based on frequency warping followed by amplitude scaling have been recently proposed. These methods modify the frequency axis of the source spectrum in such a manner that some significant parts of it, usually the formants, are moved towards their image in the target speaker's spectrum. Amplitude scaling is then applied to compensate for the differences between the warped source spectra and the target spectra. This article presents a fully parametric formulation of a frequency warping plus amplitude scaling method in which bilinear frequency warping functions are used. Introducing this constraint allows...
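To make the idea of a bilinear frequency warping function more concrete, the sketch below evaluates the textbook first-order all-pass warping curve, which is controlled by a single parameter. The formula, the parameter name `alpha` and the helper names are assumptions of this example: the abstract above does not state the exact parameterisation used in the article, which may differ.

```python
import math

def bilinear_warp(omega, alpha):
    """Warp a normalised frequency omega (radians, 0..pi) with |alpha| < 1.

    alpha = 0 leaves the axis unchanged; alpha > 0 moves frequencies upwards,
    alpha < 0 moves them downwards. The end points 0 and pi stay fixed.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between -1 and 1")
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))

def warp_axis(num_bins, alpha):
    """Warped positions of num_bins equally spaced frequencies in [0, pi]."""
    step = math.pi / (num_bins - 1)
    return [bilinear_warp(i * step, alpha) for i in range(num_bins)]
```

Because the whole curve depends on one scalar, a warping of this family can be described compactly, which is the kind of constraint the abstract refers to.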
Statistical parametric synthesizers have achieved very good performance scores during the last years. Nevertheless, as they require the use of vocoders to parameterize speech (during training) and to reconstruct waveforms (during synthesis), the speech generated from statistical models lacks some degree of naturalness. In previous works we explored the usefulness of the harmonics plus noise model in the design of a high-quality vocoder. Quite promising results were achieved when this vocoder was integrated into a synthesizer. In this paper, we describe recent...
Building a text corpus suitable to be used in corpus-based speech synthesis is a time-consuming process that usually requires some human intervention to select the desired phonetic content and the necessary variety of prosodic contexts. If an emotional text-to-speech (TTS) system is desired, the complexity of the corpus generation process increases. This paper presents a study aiming to validate or reject the use of a semantically neutral text for recording both neutral and emotional (acted) speech. The use of this kind of texts would eliminate the need to include semantically emotional texts into the corpus. The study has been...
This paper describes a series of pilot experiments developed to define the electrode setup in order to record a novel parallel electromyography (EMG)–audio database. The main purpose of the database is to provide data useful for the development of an EMG-based silent speech interface for Spanish laryngectomized speakers. Motivated by the scarcity of information in related studies regarding this important decision-making process, we decided to carry out a set of experiments with multiple recording sessions and different electrode setups. We included different types...
A novel representation of the phase information in harmonic speech models is proposed. The transformation from instantaneous phases to initial phase shift differences with respect to the fundamental frequency provides a clear insight into the phase structure and largely simplifies the manipulation of this information.
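The transformation described in the abstract above can be sketched as follows, assuming the commonly cited form in which the relative phase shift of harmonic k is its instantaneous phase minus k times the instantaneous phase of the fundamental, wrapped to one cycle. This formula and the function names are assumptions of the example; the abstract itself does not give the equation.

```python
import math

def wrap(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

def relative_phase_shifts(inst_phases):
    """Relative phase shifts of a harmonic frame.

    inst_phases[k - 1] is the instantaneous phase (radians) of harmonic k,
    all measured at the same analysis instant. The first output is always 0.
    """
    phi1 = inst_phases[0]
    return [wrap(phi - k * phi1) for k, phi in enumerate(inst_phases, start=1)]
```

For a perfectly stationary harmonic signal the instantaneous phases rotate continuously, while the values returned here stay constant from one analysis instant to the next, which is what makes the phase structure easier to inspect and manipulate.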
The importance of the phase information in the perceptual quality of speech signals is studied in this paper. Many synthesisers do not use the original phase information, assuming its contribution is almost inaudible. The Relative Phase Shift (RPS) representation allows a straightforward phase structure analysis, manipulation and resynthesis, and we use these features to perform a comparative evaluation of some phase modifications usually found in speech models. The final intention of this study is to get an answer to the question of whether phases deserve elaborate models in high quality synthetic speech, or...
A novel algorithm based on classical cepstrum calculation followed by dynamic programming is presented in this paper. The algorithm has been evaluated with a 60-minute database containing 60 speakers and different recording conditions and environments. A second reference database has also been used. In addition, the performance of four popular pitch detection algorithms (PDAs) has been evaluated with the same databases. The results prove the good performance of the described algorithm in noisy conditions. Furthermore, the paper is a first initiative to perform an evaluation of widely used PDAs over an extensive and realistic database.
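As a minimal illustration of the "classical cepstrum calculation" stage mentioned in the abstract above, the sketch below estimates the fundamental frequency of a single voiced frame by picking the largest real-cepstrum peak in a plausible pitch range. It assumes NumPy is available; the function name, the Hann window, the search range and the small log offset are choices made for this example only. The dynamic programming stage that tracks candidates across frames, and any voicing decision, are not shown.

```python
import numpy as np

def cepstral_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate F0 (Hz) of one voiced frame from the peak of its real cepstrum."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))
    # Real cepstrum: inverse FFT of the log-magnitude spectrum.
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-10)
    cepstrum = np.fft.irfft(log_mag, n=len(frame))
    # Restrict the search to quefrencies (in samples) matching [fmin, fmax].
    q_min = int(fs / fmax)
    q_max = min(int(fs / fmin), len(frame) // 2)
    peak = q_min + int(np.argmax(cepstrum[q_min:q_max + 1]))
    return fs / peak
```

A frame-by-frame estimator of this kind is sensitive to noise and octave errors, which is the usual motivation for adding a tracking stage on top of it.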
Voice conversion has traditionally focused on the spectrum. Current systems lack a solid prosody conversion method suitable for different speaking styles. Recently, the unit selection technique has been applied to transform emotional intonation contours. This paper goes one step further: it explores strategies for training and configuring the cost function in an emotion conversion application. The proposed system, which uses accent groups as basic units and also performs conversion of phoneme durations and intensity, is evaluated by means of carefully...
This paper introduces the Synthetic Speech Detection system developed by Aholab for the Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2015). The detector is a classifier based on Gaussian Mixture Models that are created using the Relative Phase Shift (RPS) transformation of the phase information. Different strategies have been evaluated: modeling specific attacks using the information provided by the ASVspoof 2015 organizers, modeling the vocoders possibly used in the spoofing signals, and using data from previous...
Currently, the statistical framework based on Hidden Markov Models (HMMs) plays a relevant role in speech synthesis, while voice conversion systems based on Gaussian Mixture Models (GMMs) are almost standard. In both cases, statistical modeling is applied to learn distributions of acoustic vectors extracted from speech signals, each vector containing a suitable parametric representation of one speech frame. The overall performance of the systems is often limited by the accuracy of the underlying speech parameterization and reconstruction method. The method presented in this paper...
Audio segmentation is important as a pre-processing task to improve the performance of many speech technology tasks and, therefore, it has an undoubted research interest. This paper describes the database, the metric, the systems and the results for the Albayzín-2014 audio segmentation campaign. In contrast to previous evaluations, where the task was the segmentation of non-overlapping classes, the Albayzín-2014 evaluation proposes the delimitation of the presence of speech, music and/or noise that can be found simultaneously. The database used in the evaluation was created by fusing different media and noises...
The use of continuous monitoring systems to control aspects such as noise pollution has grown in recent years. The commercial systems used to date only provide information on noise levels, but do not identify the sources that generate them. The identification of noise sources is an important aspect in order to apply corrective measures to mitigate noise levels. In this sense, new technological advances like machine listening can enable the addition of other capabilities, such as sound detection and the classification of sources. Despite the increasing development of these...