- Speech Recognition and Synthesis
- Music and Audio Processing
- Speech and Audio Processing
- Natural Language Processing Techniques
- Speech and dialogue systems
- Advanced Data Compression Techniques
- Neuropeptides and Animal Physiology
- Phonetics and Phonology Research
- Machine Learning and Algorithms
- Subtitles and Audiovisual Media
- Topic Modeling
- Translation Studies and Practices
- Receptor Mechanisms and Signaling
- Peptidase Inhibition and Analysis
- Algorithms and Data Compression
- Radio, Podcasts, and Digital Media
- Experimental Learning in Engineering
- Language, Linguistics, Cultural Analysis
- Interpreting and Communication in Healthcare
- Engineering Education and Technology
- Neuroscience of respiration and sleep
- Data Mining Algorithms and Applications
- Protein Hydrolysis and Bioactive Peptides
- semigroups and automata theory
- Adenosine and Purinergic Signaling
Hospital Riotinto
2025
University of the Basque Country
2009-2024
Software (Spain)
2015
Universidad Politécnica de Madrid
2013
Universitat Politècnica de València
1995
In recent years, the task of Query-by-Example Spoken Term Detection (QbE-STD), which aims to find occurrences of a spoken query in a set of audio documents, has gained the interest of the research community for its versatility in settings where untranscribed, multilingual and acoustically unconstrained resources, or resources in low-resource languages, must be searched. This paper describes and reports experimental results for a QbE-STD system that achieved the best performance in the recent Spoken Web Search (SWS) evaluation, held as part...
This paper evaluates the performance of the twelve primary systems submitted to the evaluation on speaker verification in the context of a mobile environment using the MOBIO database. The mobile environment provides a challenging and realistic test-bed for current state-of-the-art speaker verification techniques. Results in terms of equal error rate (EER), half total error rate (HTER) and detection error trade-off (DET) confirm that the best performing systems are based on total variability modeling and are the fusion of several sub-systems. Nevertheless, the good old UBM-GMM based systems are still competitive. The results also show that the use...
Spinal muscular atrophy (SMA) and severe T- and/or B-cell lymphopenias (STBCL) in the form of severe combined immunodeficiencies (SCID) or X-linked agammaglobulinemia (XLA) are rare but potentially fatal pathologies. In January 2021, we initiated the first pilot study in Spain to evaluate the efficacy of a very early detection technique for SMA and SCID. RT–PCR was performed on prospectively collected dried blood spots (DBSs) from newborns in Western Andalusia (Spain). Internal and external controls (SCID, XLA and SMA) were...
This paper presents an alternative feature set to the traditional MFCC-SDC used in acoustic approaches to Spoken Language Recognition: the log-likelihood ratios of phone posterior probabilities, hereafter Phone Log-Likelihood Ratios (PLLR), produced by a phone recognizer. In this work, an iVector system trained on these features (plus dynamic coefficients) is evaluated and compared to (1) an acoustic iVector system (trained on the MFCC-SDC feature set) and (2) a phonotactic (Phone-lattice-SVM) system, using two different benchmarks: the NIST 2007 and 2009 LRE datasets. systems...
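To illustrate the feature definition in the abstract above, here is a minimal sketch. It assumes that the log-likelihood ratio of each phone is taken as the log-odds (logit) of its posterior probability; the abstract does not state the exact formula or any normalization, so this may differ from the authors' computation. The function name `pllr`, the clipping constant `eps`, and the toy frame are hypothetical.

```python
import math


def pllr(posteriors, eps=1e-10):
    """Map one frame of phone posteriors to log-odds features.

    Assumption: LLR_i = log(p_i / (1 - p_i)). The abstract only says
    "log-likelihood ratios of phone posterior probabilities".
    """
    total = sum(posteriors)
    if total <= 0:
        raise ValueError("posteriors must have positive total mass")
    features = []
    for p in posteriors:
        # Renormalize so the frame sums to 1, then clip away from 0 and 1
        # so the logarithm stays finite.
        p = min(max(p / total, eps), 1.0 - eps)
        features.append(math.log(p / (1.0 - p)))
    return features


# Toy frame with three phone classes (hypothetical values).
frame = [0.7, 0.2, 0.1]
features = pllr(frame)
```

Each frame yields one value per phone class, positive when the posterior exceeds 0.5 and negative otherwise, so the resulting vectors could replace MFCC-style frames as input to a downstream model.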
The combination of several heterogeneous systems is known to provide remarkable performance improvements in verification and detection tasks. In Spoken Term Detection (STD), two important issues arise: (1) how to define a common set of detected candidates, and (2) how to combine system scores to produce a single score per candidate. In this paper, a discriminative calibration/fusion approach commonly applied in speaker and language recognition is adopted for STD. Under this approach, we first propose heuristics to hypothesize scores for systems that do...
The Albayzin 2008 Language Recognition Evaluation was held from May to October 2008, and its results were presented and discussed among the participating teams at the 5th Biennial Workshop on Speech Technology [1], organized by the Spanish Network on Speech Technologies [2] in November 2008. In this paper, we present (for the first time) a full description of the Albayzin 2008 LRE and analyze and discuss the recognition results. The evaluation was designed according to the test procedures, protocols and performance measures used in the NIST 2007 LRE. The KALAKA database...
In the framework of a contract with the Basque Parliament for subtitling the videos of bilingual plenary sessions, which basically consisted of aligning very long (around 3 hours long) audio tracks with syntactically correct but acoustically inaccurate text transcriptions (since all the disfluencies, mistakes, etc. were edited), a simple and efficient procedure (avoiding the need for language or lexical models, which was key because of the mix of languages) was developed as a first approach, before trying more complex schemes found in...
State-of-the-art language recognition systems usually add a backend prior to the linear fusion of the subsystems' scores. The backend plays a dual role. When the set of languages for which models have been trained does not match the set of target languages, the backend maps the available scores to the space of target languages. On the other hand, the backend serves as a precalibration stage that adapts the scores. In this work, well known backends (Generative Gaussian Backend, Discriminative Gaussian Backend and Logistic Regression Backend) and newer proposals (Fully Bayesian Gaussian Backend and Gaussian Mixture Backend) are analyzed...
The Albayzin 2012 Language Recognition Evaluation (LRE), carried out from June to October 2012, was the third effort made by the Spanish/Portuguese community for benchmarking language recognition technology. As in the previous 2008 and 2010 evaluations, the task consisted of deciding whether or not a target language was spoken in a test utterance. The primary condition involved 6 target languages for which there was plenty of training data: English, Portuguese and the four official languages of Spain (Basque, Catalan, Galician and Spanish). A new...
In a previous work, we introduced the use of log-likelihood ratios of phone posterior probabilities, called Phone Log-Likelihood Ratios (PLLR), as features for language recognition under an iVector-based approach, yielding high performance and promising results. However, the high dimensionality of the PLLR feature vectors (with regard to MFCC/SDC features) results in comparatively higher computational costs. In this work, several supervised and unsupervised dimensionality reduction techniques are studied, based on either fusions or...
The so-called Phone Log-Likelihood Ratio (PLLR) features have been recently introduced as a novel and effective way of retrieving acoustic-phonetic information in spoken language and speaker recognition systems. In this letter, an in-depth insight into the PLLR feature space is provided and the multidimensional distribution of these features is analyzed in a language recognition system. The study reveals that PLLR features are confined to a subspace that strongly bounds PLLR distributions. To enhance the information retrieved by the system, PLLR features are projected into a hyper-plane that provides a more suitable...
Query-by-Example Spoken Term Detection (QbE STD) aims at retrieving data from a speech repository given an acoustic query containing the term of interest as input. It is currently receiving much interest due to the high volume of information stored in audio or audiovisual format. QbE STD differs from automatic speech recognition (ASR) and keyword spotting (KWS)/spoken term detection (STD) since ASR is interested in all the terms/words that appear in the speech signal and KWS/STD relies on a textual transcription of the search term to retrieve the speech data. This paper...
The synchronization of text transcripts with audio tracks is typically solved by forced alignment at the phonetic level. However, when dealing with either very long or acoustically inaccurate transcripts, more complex methods are needed, usually based on heavy and costly ASR systems. In a previous work, we showed that a simple and lightweight method could be effectively applied, based on free phonetic decoding of the speech signal and alignment with the reference phonetic sequences, allowing the transfer of timestamps from the former to the latter. This method has yielded...
Most common approaches to phonotactic language recognition deal with several independent phone decodings. These decodings are processed and scored in a fully uncoupled way, their time alignment (and the information that may be extracted from it) being completely lost. Recently, we have presented two new approaches to phonotactic language recognition which take into account time alignment information, by considering time-synchronous cross-decoder phone co-occurrences. Experiments on the 2007 NIST LRE database demonstrated that using phone co-occurrence statistics could...
The development of speech technology requires large amounts of data to estimate the underlying models. Even when relying on multilingual pre-trained models, some amount of task-specific data in the target language is needed to fine-tune those models and obtain competitive performance. In this paper, we present a bilingual Basque–Spanish dataset extracted from parliamentary sessions. The dataset is designed to develop and evaluate automatic speech recognition (ASR) systems but can be easily repurposed for other speech-processing tasks (such...
The so-called Phone Log-Likelihood Ratio (PLLR) features, computed on the phone posterior probabilities provided by phonetic decoders, convey acoustic-phonetic information in a sequence of frame-level vectors. Thus, PLLRs can be easily plugged into traditional acoustic systems, just by replacing MFCCs, PLPs or whatever other representation. PLLR features were used under an iVector-PLDA approach in our submission to the NIST 2012 Speaker Recognition Evaluation (SRE). In this work, we present and report...
Subtitling of video contents offered on the web by Spanish administration agencies is required by law, to allow people with hearing impairments to follow them. The automatic bilingual subtitling system described in this paper has been applied to the plenary session videos that the Basque Parliament posts on its website (http://www.parlamentovasco.euskolegebiltzarra.org/), and has been running since September 2010. A specific characteristic of the system is the use of a simple phonetic decoder based on a joint selection of phone models, since it is not...
The best language recognition performance is commonly obtained by fusing the scores of several heterogeneous systems. Regardless of the fusion approach, it is assumed that different systems may contribute complementary information, either because they are developed on different datasets, or because they use different features or modeling approaches. Most authors apply fusion as a final resource for improving performance based on an existing set of systems. Though relative gains decrease as larger sets of systems are considered, the best performance is usually attained by fusing all the available systems, which may lead to high...