- Speech Recognition and Synthesis
- Speech and Audio Processing
- Music and Audio Processing
- Speech and dialogue systems
- Natural Language Processing Techniques
- Advanced Data Compression Techniques
- Galician and Iberian cultural studies
- Phonetics and Phonology Research
- Biometric Identification and Security
- Spanish Linguistics and Language Studies
- User Authentication and Security Systems
- Voice and Speech Disorders
- Advanced Adaptive Filtering Techniques
- Hand Gesture Recognition Systems
- Digital Filter Design and Implementation
- Emotion and Mood Recognition
- Topic Modeling
- Video Analysis and Summarization
- Face recognition and analysis
- Mental Health via Writing
- Journalism and Media Studies
- Multi-Agent Systems and Negotiation
- Hearing Impairment and Communication
- Linguistic Studies and Language Acquisition
- Face and Expression Recognition
Universidade de Vigo
2011-2025
Centro Tecnolóxico de Telecomunicacións de Galicia
2015-2017
Universidade de Santiago de Compostela
2016
Vicinay Cadenas (Spain)
2016
Universidad de Sonora
2016
Telefonica Research and Development
2008
The Dialogue
2002
European Telecommunications Standards Institute
2000-2002
IBM (United States)
1991
A new multimodal biometric database, designed and acquired within the framework of the European BioSecure Network of Excellence, is presented. It comprises more than 600 individuals acquired simultaneously in three scenarios: 1) over the Internet, 2) in an office environment with a desktop PC, and 3) in indoor/outdoor environments with mobile portable hardware. The three scenarios include a common part of audio/video data. Also, signature and fingerprint data have been acquired both with the desktop PC and with mobile portable hardware. Additionally, hand and iris data were acquired in the second scenario using the desktop PC. Acquisition...
Virtual assistants (VAs) have gained widespread popularity across a wide range of applications, and the integration of Large Language Models (LLMs), such as ChatGPT, has opened up new possibilities for developing even more sophisticated VAs. However, this poses ethical issues and challenges that must be carefully considered, particularly as these systems are increasingly used in public services: the transfer of personal data, decision-making transparency, potential biases, and privacy risks. This paper, an...
Large language models (LLMs) have revolutionized the field of artificial intelligence in both academia and industry, transforming how we communicate, search for information, and create content. However, these models face knowledge cutoffs and costly updates, driving a new ecosystem of LLM-based applications that leverage interaction techniques to extend capabilities and facilitate updates. As they grow more complex, understanding their internal workings becomes increasingly challenging, posing significant issues...
Despite the advances in the e-learning domain during the last decades, there is a lack of suitable mechanisms to carry out assessment with appropriate measures to avoid cheating. Current LMSs do not provide the features needed to check that the intended student is taking the online exam by himself, or even to know if he has spent the whole session time in front of the computer. This paper presents a web-based application that offers biometric authentication based on face recognition. The application, which can be easily integrated in currently...
Clinical depression can be considered a soft biometric trait that helps to characterize an individual. This mood disorder is involved in forensic psychological assessment, due to its relevance in different legal issues. The automatic detection of depressed speech has been an object of research in the last years, resulting in different algorithmic approaches and acoustic features. Due to the use of different algorithms, databases and performance measures, deciding which ones are more suitable for this task is difficult. In this work, features was...
Besides the optimization of biometric error rates, the overall security system performance with respect to intentional attacks plays an important role for biometric-enabled authentication schemes. As traditionally most user authentication schemes are knowledge and/or possession based, firstly in this paper we present a methodology for the analysis of Internet-based systems by enhancing known methodologies, such as the CERT attack-taxonomy, with a more detailed view on the OSI-Model. Secondly, as a proof of concept, the guidelines extracted from this methodology are strictly applied...
This paper describes a large-scale experiment in which eight research institutions have tested their audio partitioning and labeling algorithms on the same data, a multi-lingual database of news broadcasts, using the same evaluation tools and protocols. The experiments provide more insight into the cross-lingual robustness of the methods, and they demonstrated that by further collaborating in the domains of speaker change detection and speaker clustering it should be possible to achieve further technological progress in the near future.
Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR is interested in all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the ALBAYZIN 2014 evaluation, held as part of the evaluation campaign in the context of IberSPEECH...
This paper addresses the challenge of integrating low-resource languages into multilingual automatic speech recognition (ASR) systems. We introduce a novel application of weighted cross-entropy, typically used for unbalanced datasets, to facilitate the integration of low-resource languages into pre-trained multilingual ASR models within the context of continual learning. We fine-tune the Whisper model on five high-resource languages and one low-resource language, employing language-weighted dynamic cross-entropy and data augmentation. The results show a remarkable 6.69% word error...
Soft biometrics comprises the biological traits that are not sufficient for person authentication but can help to narrow the search space. Evidence of mental health state can be considered as a soft biometric, as it provides valuable information about the identity of an individual. Different approaches have been used for the automatic classification of speech in "depressed" or "non-depressed", but the differences in algorithms, features, databases and performance measures make it difficult to draw conclusions about which features and techniques...
Cross-lingual query-by-example spoken term detection (QbE STD) has caught the attention of speech researchers, as it makes it possible to develop systems for low-resource languages, in which the available amount of labelled data makes training automatic speech recognition approaches prohibitive. The use of phonetic posteriorgrams for speech representation combined with dynamic time warping search is a widely used approach for this task, but little attention has been focused on the suitability of the set of phonetic units to represent the phonetic information of a different language. This...
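As an illustration of the posteriorgram-plus-dynamic-time-warping approach mentioned in the abstract above, the following is a minimal sketch and not the authors' system. It assumes each frame is a vector of phone posterior probabilities and uses the negative log of the dot product as the local distance, a common choice in the QbE STD literature. The function names `neg_log_dot` and `dtw_distance` are hypothetical, and the sketch aligns the query against the whole reference rather than searching for a subsequence, which a real detector would need.

```python
import math


def neg_log_dot(p, q, floor=1e-10):
    """Local distance between two posterior vectors: -log(p . q).

    The floor avoids log(0) when the vectors share no probability mass.
    """
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, floor))


def dtw_distance(query, reference, local_dist=neg_log_dot):
    """Accumulated cost of the best whole-sequence DTW alignment.

    query and reference are lists of frames (posterior vectors).
    Allowed steps are horizontal, vertical and diagonal.
    """
    n, m = len(query), len(reference)
    inf = math.inf
    # cost[i][j] = best accumulated cost aligning query[:i] with reference[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_dist(query[i - 1], reference[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # vertical step
                                 cost[i][j - 1],      # horizontal step
                                 cost[i - 1][j - 1])  # diagonal step
    return cost[n][m]


# Toy two-phone posteriorgrams (made-up values, for illustration only).
query = [[0.9, 0.1], [0.1, 0.9]]
match = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]      # same phone order, stretched
mismatch = [[0.1, 0.9], [0.1, 0.9], [0.9, 0.1]]   # reversed phone order

if __name__ == "__main__":
    print(dtw_distance(query, match), dtw_distance(query, mismatch))
```

A lower accumulated cost indicates a closer match, so the stretched utterance with the same phone order scores better than the reversed one.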
In this paper we present a fast method to implement the language model (LM) look-ahead algorithm in a Viterbi-based, single-lexical-tree speech recognizer. We have used three different mechanisms to speed up the LM look-ahead calculation: a cache memory attached to each node or network, a pre-calculation of the LM look-ahead probabilities of the active contexts, and an organization of the LM using a perfect hash. These enhancements make it possible to use the full trigram to compute the LM look-ahead with better overall results, both in terms of recognition rate and computation time, than...
The paper deals with the task of audio segmentation in TV broadcast news. A multimedia approach for this purpose, by means of audio and video processing, is proposed. Thus, the system is composed of two differentiated parts: one analyzes the audio stream, based on the well-known Bayesian information criterion (BIC), whereas the other part extracts useful information from the video stream to improve the performance of BIC. An investigation of the parameters involved in the BIC formulation is also accomplished, in order to achieve the best results possible in our experimental framework:...
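To make the BIC-based change detection mentioned in the abstract above more concrete, here is a minimal sketch under simplifying assumptions; it is not the system described in the paper. It assumes one-dimensional features modelled by a single Gaussian per segment, so the covariance determinant reduces to a variance and each model has two free parameters (mean and variance). The names `delta_bic` and `penalty_weight` are hypothetical; `penalty_weight` plays the role of the tunable penalty factor usually written as lambda.

```python
import math


def _log_var(xs, floor=1e-8):
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.log(max(var, floor))


def delta_bic(frames, split, penalty_weight=1.0):
    """Delta-BIC for a candidate change point at index `split`.

    Compares one Gaussian over all frames against two Gaussians, one per
    side of the split. A positive value favours a change at `split`.
    """
    if not 0 < split < len(frames):
        raise ValueError("split must leave at least one frame on each side")
    left, right = frames[:split], frames[split:]
    n, n1, n2 = len(frames), len(left), len(right)
    # Likelihood gain of the two-model hypothesis over the single model.
    gain = 0.5 * (n * _log_var(frames)
                  - n1 * _log_var(left)
                  - n2 * _log_var(right))
    # Complexity penalty: extra parameters of the second Gaussian (d = 1).
    n_params = 2
    penalty = 0.5 * penalty_weight * n_params * math.log(n)
    return gain - penalty


# Synthetic 1-D feature streams (made-up values, for illustration only).
pattern = [0.0, 0.1, -0.1, 0.05, -0.05]
homogeneous = pattern * 8
with_change = pattern * 4 + [x + 5.0 for x in pattern] * 4

if __name__ == "__main__":
    print(delta_bic(with_change, 20), delta_bic(homogeneous, 20))
```

In practice the criterion is evaluated over a sliding window for every candidate split, and raising `penalty_weight` trades missed changes for fewer false alarms.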