- Speech Recognition and Synthesis
- Music and Audio Processing
- Speech and Audio Processing
- Speech and dialogue systems
- Video Analysis and Summarization
- Natural Language Processing Techniques
- Multimedia Communication and Technology
- Advanced Data Compression Techniques
- Subtitles and Audiovisual Media
- Phonetics and Phonology Research
- Topic Modeling
- Text Readability and Simplification
- Advanced Image and Video Retrieval Techniques
- Emotion and Mood Recognition
- Power Systems and Technologies
- Translation Studies and Practices
- Web Data Mining and Analysis
- Advanced Vision and Imaging
- Robotics and Automated Systems
- Advanced Chemical Sensor Technologies
- Digital Accessibility for Disabilities
- Aging and Gerontology Research
- Radio, Podcasts, and Digital Media
- Intelligent Tutoring Systems and Adaptive Learning
- Retirement, Disability, and Employment
Microsoft (Portugal)
2015
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento
2005-2014
Microsoft (United States)
2014
Institute for Systems Engineering and Computers
2010-2011
University of Lisbon
2003-2009
Instituto Politécnico de Lisboa
2007-2009
Instituto Superior Técnico
2003-2007
In this paper, a novel approach to video temporal decomposition into semantic units, termed scenes, is presented. In contrast to previous segmentation approaches that employ mostly low-level visual or audiovisual features, we introduce a technique that jointly exploits low-level and high-level features automatically extracted from the visual and the auditory channel. This technique is built upon the well-known method of the scene transition graph (STG), first by introducing a new STG approximation with reduced computational cost, and then by extending the unimodal...
The subtitling of broadcast news programs is becoming a very interesting application due to the technological advances in automatic speech recognition and associated technologies. However, building this kind of system requires work both on the components and on the integration of the main blocks. In this paper, we present the overall architecture of a system running daily at RTP (the Portuguese public broadcasting company). The goal is to integrate our system for subtitling these programs. The global system covers both recorded and direct programs.
This paper presents a description of the INESC-ID Spoken Language Systems Laboratory (L2F) Age and Gender classification system submitted to the INTERSPEECH 2010 Paralinguistic Challenge. The L2F Age and Gender classification systems are composed, respectively, of the fusion of four and six individual sub-systems trained with short- and long-term acoustic and prosodic features, different classification strategies (GMM-UBM, MLP and SVM) and different speech corpora. The best results, obtained with calibration and a linear logistic regression back-end, show an absolute improvement of 4.1% on the unweighted...
The paper describes our work on the development of an audio segmentation, classification and clustering system applied to a broadcast news task for the European Portuguese language. We developed a new algorithm for audio segmentation that is both accurate and uses fewer computational resources than other approaches. Our speaker clustering module uses a modified BIC (Bayesian information criterion) algorithm, which performs substantially better than the standard symmetric Kullback-Leibler (KL2) distance and is much faster than the full BIC. Finally, we developed a scheme for tagging...
Broadcast news plays an important role in our lives, providing access to news, information and entertainment. The existence of subtitles is an important medium for the inclusion of people with special needs and is also an advantage in noisy and populated environments. In this work we describe and evaluate a system for subtitling live broadcast news at RTP (Rádio e Televisão de Portugal), the Portuguese public broadcasting company. Developing a fully automatic subtitling system is a huge breakthrough, which results from the convergence of different research models and software developments...
This paper describes ongoing work on selective dissemination of broadcast news. Our pipeline system includes several modules: audio preprocessing, speech recognition, and topic segmentation and indexation. The main goal of this work is to study the impact of earlier errors in the last modules. The impact of audio preprocessing errors is quite small on the speech recognition module, but quite significant in terms of topic segmentation. On the other hand, the impact of speech recognition errors on the topic segmentation and indexation modules is almost negligible. The diagnosis of these errors is a very important step for the improvement of the prototype media watch system described in this paper.
This article presents a description of the INESC-ID Age and Gender classification systems, which were developed for aiding the detection of child abuse material within the scope of the European project I-DASH. The Age and Gender classification systems are composed, respectively, of the fusion of four and six individual subsystems trained with short- and long-term acoustic and prosodic features and different classification strategies, namely Gaussian Mixture Models-Universal Background Model (GMM-UBM), Multi-Layer Perceptrons (MLP) and Support Vector Machines (SVM), over five speech corpora. The best...
Emotional stress is commonly experienced while speaking in public, producing changes to the various speech production subsystems, affecting the speech signal in predictable ways and being easily conveyed to listeners. Speech indicators of stress, however, are typically studied under laboratory settings, allowing little generalization to real-life settings. To bridge this gap, we propose an interdisciplinary approach to assess stress during public speaking events, based on a platform that records speech simultaneously annotated with physiological...
The PaeLife project is a European industry-academia collaboration in the framework of the Ambient Assisted Living Joint Programme (AAL JP), with the goal of developing a multimodal, multilingual virtual personal life assistant to help senior citizens remain active and socially integrated. Speech is one of the key interaction modalities of AALFred, the Windows application developed in the project; the application can be controlled using speech input in four languages: French, Hungarian, Polish and Portuguese. This paper briefly presents the project and then focuses...
This paper describes our work on the development of a low-latency, stream-based audio pre-processing system for broadcast news using model-based techniques. It performs speech/nonspeech classification, speaker segmentation, speaker clustering, and gender and background conditions classification. As a way to increase the modelling accuracy, the algorithms make extensive use of Artificial Neural Networks (ANN), thus avoiding the rough assumptions normally made about the signal distribution. Experiments were conducted on the COST278...
This paper describes a large-scale experiment in which eight research institutions have tested their audio partitioning and labeling algorithms on the same data, a multi-lingual database of news broadcasts, using the same evaluation tools and protocols. The experiments provide more insight into the cross-lingual robustness of the methods, and they demonstrated that by further collaborating in the domains of speaker change detection and speaker clustering it should be possible to achieve further technological progress in the near future.
In this work, the problem of automatic decomposition of video into elementary semantic units, known in the literature as scenes, is addressed. Two multi-modal scene segmentation techniques are proposed, both building upon the Scene Transition Graph (STG). In the first of the proposed approaches, speaker diarization results are used for introducing a post-processing step to the STG construction algorithm, with the objective of discarding scene boundaries erroneously identified according to visual-only dissimilarity. In the second approach, and...
Recent years have shown a great development of large vocabulary, speaker-independent continuous speech recognition systems and some research in multilingual aspects. To allow that development to also be extended to the European Portuguese language, we decided to develop and collect a new database based on a large amount of text. In this new database, our aim was to create a corpus equivalent in size to WSJ0. We selected texts from the PÚBLICO newspaper, which is characterized by a broad coverage of matters and different writing styles. The recording population [...] engineering...
This paper describes our recent work on extending the punctuation module of automatic subtitles for Portuguese Broadcast News. The main improvement was achieved by the use of prosodic information. This enabled the extension of the previous module, which covered only full stops and commas, to cover question marks as well. The approach uses lexical, acoustic and prosodic information. Our results show that the latter is relevant for all types of punctuation. An analysis also shows what type of interrogative is better dealt with by our method, taking into account the specificities...
Large amounts of information exist as video and audio that are not searchable. At a time when Business Intelligence is fundamental for all areas, doing this kind of analysis only on text sources is a limiting factor. The use of large vocabulary speech recognition systems with increasing performance is giving rise to different applications. Despite their diversity, these applications share the need for extensive transcription of contents. In this paper we describe the results of a development project between a startup company and a research lab to build a full...
This paper describes our work on the development of a large vocabulary continuous speech recognition system applied to a broadcast news task for the European Portuguese language, in the scope of the ALERT project. We start by presenting the baseline recogniser AUDIMUS, which was originally developed with a corpus of read newspaper text. This is a hybrid system that uses a combination of phone probabilities generated by several MLPs trained on distinct feature sets. The paper details the modifications introduced in this system, namely a new model, and...