- Speech Recognition and Synthesis
- Speech and Audio Processing
- Music and Audio Processing
- Natural Language Processing Techniques
- Speech and dialogue systems
- Phonetics and Phonology Research
- Topic Modeling
- Voice and Speech Disorders
- Neurobiology of Language and Bilingualism
- Indoor and Outdoor Localization Technologies
- Advanced Adaptive Filtering Techniques
- Text Readability and Simplification
- Emotion and Mood Recognition
- Occupational Health and Safety Research
- Adversarial Robustness in Machine Learning
- Video Analysis and Summarization
- Advanced Data Compression Techniques
- Hate Speech and Cyberbullying Detection
- Context-Aware Activity Recognition Systems
- Video Surveillance and Tracking Methods
- Language Development and Disorders
- Interpreting and Communication in Healthcare
- Sentiment Analysis and Opinion Mining
- Infant Health and Development
- Acoustic Wave Phenomena Research
University of Lisbon
2016-2025
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento
2016-2025
Institute for Systems Engineering and Computers
2009-2023
Instituto Politécnico de Lisboa
2015-2023
Instituto Superior de Tecnologias Avançadas
2014-2021
Instituto Superior Técnico
2020
Universidade de Vigo
2018-2020
University of Edinburgh
2018-2019
Universidade Federal do Pampa
2017
Fondazione Bruno Kessler
2015
Speech recordings are a rich source of personal, sensitive data that can be used to support a plethora of diverse applications, from health profiling to biometric recognition. It is therefore essential that speech recordings are adequately protected so that they cannot be misused. Such protection, in the form of privacy-preserving technologies, is required to ensure that: (i) the biometric profiles of a given individual (e.g., across different biometric service operators) are unlinkable; (ii) leaked, encrypted biometric information is irreversible; and (iii) biometric references are renewable....
An attacker may use a variety of techniques to fool an automatic speaker verification system into accepting them as a genuine user. Anti-spoofing methods meanwhile aim to make the system robust against such attacks. The ASVspoof 2017 Challenge focused specifically on replay attacks, with the intention of measuring the limits of replay attack detection as well as developing countermeasures against them. In this work, we propose our replay attack detection system, the Attentive Filtering Network, which is composed of an attention-based filtering mechanism that enhances...
Audio event detection is one of the tasks of the European project VIDIVIDEO. This paper focuses on non-speech events, and as such only searches for events in audio segments that have been previously classified as non-speech. Preliminary experiments with a small corpus of sound effects have shown the potential of this type of corpus for training purposes. This paper describes our experiments with SVM and HMM-based classifiers, using a 290-hour corpus of sound effects. Although we have built detectors for only 15 semantic concepts so far, the method seems easily portable to other concepts. The...
This paper evaluates the performance of the twelve primary systems submitted to the evaluation on speaker verification in the context of a mobile environment using the MOBIO database. The mobile environment provides a challenging and realistic test-bed for current state-of-the-art speaker verification techniques. Results in terms of equal error rate (EER), half total error rate (HTER) and detection error trade-off (DET) confirm that the best performing systems are based on total variability modeling and are the fusion of several sub-systems. Nevertheless, the good old UBM-GMM based systems are still competitive. The results also show that the use...
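The two scalar metrics named in the abstract above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the toy scores, and the protocol (threshold chosen at the EER point on a development set, HTER then reported on a separate evaluation set) are assumptions made for illustration, and the EER is only approximated by scanning the observed scores as candidate thresholds.

```python
from typing import List, Tuple


def error_rates(genuine: List[float], impostor: List[float],
                threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) when a trial is accepted if score >= threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine: List[float],
                     impostor: List[float]) -> Tuple[float, float]:
    """Approximate the EER; return (eer, threshold).

    Every observed score is tried as a threshold and the one where
    FAR and FRR are closest is kept (hypothetical helper, no interpolation).
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def half_total_error_rate(genuine: List[float], impostor: List[float],
                          threshold: float) -> float:
    """HTER: the mean of FAR and FRR at a fixed, pre-selected threshold."""
    far, frr = error_rates(genuine, impostor, threshold)
    return (far + frr) / 2


# Toy scores, invented for this example only.
dev_genuine = [0.9, 0.8, 0.7, 0.4]
dev_impostor = [0.1, 0.2, 0.3, 0.6]
eval_genuine = [0.95, 0.5, 0.85, 0.75]
eval_impostor = [0.05, 0.65, 0.7, 0.2]

eer, threshold = equal_error_rate(dev_genuine, dev_impostor)
hter = half_total_error_rate(eval_genuine, eval_impostor, threshold)
print(f"EER={eer:.3f} at threshold={threshold}, HTER={hter:.3f}")
```

Fixing the threshold on development data before scoring the evaluation set is what distinguishes HTER from EER in this sketch: the EER is computed a posteriori on the same scores it summarizes, whereas HTER reflects a decision point chosen in advance.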
Burnout is an emotional disorder resulting from exhausting work situations that demand a great deal of competitiveness or responsibility, such as those present in the work of Municipal Civil Guards, the subjects of this study. A total of 111 Municipal Civil Guards from Petrópolis took part in the research. The following instruments were used in this investigation: a sociodemographic questionnaire, the Oldenburg Burnout Inventory (OLBI), and a questionnaire on the perception of factors associated with physical exhaustion and work. A prevalence of burnout was identified,...
The combination of several heterogeneous systems is known to provide remarkable performance improvements in verification and detection tasks. In Spoken Term Detection (STD), two important issues arise: (1) how to define a common set of detected candidates, and (2) how to combine system scores to produce a single score per candidate. In this paper, a discriminative calibration/fusion approach commonly applied in speaker and language recognition is adopted for STD. Under this approach, we first propose heuristics to hypothesize scores for systems that do...
The outsourcing of machine learning classification and data mining tasks can be an effective solution for those parties that need machine learning services but lack the appropriate resources, knowledge and/or tools to carry them out on their own premises. This solution, however, raises major privacy concerns, in particular when irrevocable biometric data such as speech is involved. In this work, we focus on the development of privacy-preserving schemes for a speech emotion recognition task, as a proof of concept that could be extended to other...
This paper describes a multi-modal approach for the automatic detection of Alzheimer's disease proposed in the context of the INESC-ID Human Language Technology Laboratory participation in the ADReSS 2020 challenge. Our classification framework takes advantage of both acoustic and textual feature embeddings, which are extracted independently and later combined. Speech signals are encoded into acoustic features using DNN speaker embeddings extracted from pre-trained models. For textual input, contextual embedding vectors are first extracted using an English BERT model...
Speaker identification models are vulnerable to carefully designed adversarial perturbations of their input signals that induce misclassification. In this work, we propose a white-box steganography-inspired adversarial attack that generates imperceptible adversarial perturbations against a speaker identification model. Our approach, FoolHD, uses a Gated Convolutional Autoencoder that operates in the DCT domain and is trained with a multi-objective loss function, in order to generate and conceal the adversarial perturbation within the original audio files. In addition to hindering speaker identification performance,...
This paper describes our work on audio event detection, one of our tasks in the European project VIDIVIDEO. Preliminary experiments with a small corpus of sound effects have shown the potential of this type of corpus for training purposes. This paper describes our experiments with SVM classifiers and different features, using a 290-hour corpus of sound effects, which allowed us to build detectors for almost 50 semantic concepts. Although the performance of these detectors on the development set is quite good (achieving an average F-measure of 0.87), preliminary experiments on documentaries and films showed that the task is much...
We propose a method for zero-resource domain adaptation of DNN acoustic models, for use in low-resource situations where the only in-language training data available may be poorly matched to the intended target domain. Our method uses a multi-lingual model in which several DNN layers are shared between languages. This architecture enables domain adaptation transforms learned for one well-resourced language to be applied to an entirely different low-resource language. First, to develop the technique, we use English as a well-resourced language and take Spanish to mimic a low-resource language. Experiments...
The high variability in acoustic, pronunciation, and linguistic characteristics of children's speech makes automatic speech recognition (ASR) for children a complex task. Training a dedicated ASR model from scratch for children remains challenging, mainly due to the limited availability of data. To tackle this limitation, a common strategy involves fine-tuning a pre-trained model. However, this approach faces the challenges of speaker diversity and data scarcity, especially when dealing with large models like Conformer. In this study, we...
Domestic environments are particularly challenging for distant speech recognition: reverberation, background noise and interfering sources, as well as the propagation of acoustic events across adjacent rooms, critically degrade the performance of standard speech processing algorithms. In this application scenario, a crucial task is the detection and localization of speech events generated by users within the various rooms. A specific challenge of multi-room environments is the inter-room interference that negatively affects speech activity detectors. In this paper, we...
Obstructive sleep apnea (OSA) is a prevalent sleep disorder, responsible for a decrease in people's quality of life and for significant morbidity and mortality associated with hypertension and cardiovascular diseases. OSA is caused by anatomical and functional alterations in the upper airways; we therefore hypothesize that the speech properties of OSA patients are altered, making it possible to detect OSA through voice analysis. To address this hypothesis, we collected speech recordings from 25 OSA subjects and 20 controls, designed a feature set, and compared...
Clinical literature provides convincing evidence that language deficits in Alzheimer's disease (AD) allow for distinguishing patients with dementia from healthy subjects. Currently, computational approaches have widely investigated lexicosemantic aspects of discourse production, while pragmatic aspects, like cohesion and coherence, are still mostly unexplored. In this article, we aim at providing a more comprehensive characterization of language abilities for the automatic identification of AD from narrative description...