Hugo Meinedo

ORCID: 0000-0003-0956-6660
Research Areas
  • Speech Recognition and Synthesis
  • Music and Audio Processing
  • Speech and Audio Processing
  • Speech and dialogue systems
  • Video Analysis and Summarization
  • Natural Language Processing Techniques
  • Multimedia Communication and Technology
  • Advanced Data Compression Techniques
  • Subtitles and Audiovisual Media
  • Phonetics and Phonology Research
  • Topic Modeling
  • Text Readability and Simplification
  • Advanced Image and Video Retrieval Techniques
  • Emotion and Mood Recognition
  • Power Systems and Technologies
  • Translation Studies and Practices
  • Web Data Mining and Analysis
  • Advanced Vision and Imaging
  • Robotics and Automated Systems
  • Advanced Chemical Sensor Technologies
  • Digital Accessibility for Disabilities
  • Aging and Gerontology Research
  • Radio, Podcasts, and Digital Media
  • Intelligent Tutoring Systems and Adaptive Learning
  • Retirement, Disability, and Employment

Microsoft (Portugal)
2015

Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento
2005-2014

Microsoft (United States)
2014

Institute for Systems Engineering and Computers
2010-2011

University of Lisbon
2003-2009

Instituto Politécnico de Lisboa
2007-2009

Instituto Superior Técnico
2003-2007

In this paper, a novel approach to video temporal decomposition into semantic units, termed scenes, is presented. In contrast to previous segmentation approaches that employ mostly low-level visual or audiovisual features, we introduce a technique that jointly exploits low-level and high-level features automatically extracted from the auditory channel. This technique is built upon the well-known method of the scene transition graph (STG), first by introducing a new STG approximation with reduced computational cost, then by extending the unimodal...

10.1109/tcsvt.2011.2138830 article EN IEEE Transactions on Circuits and Systems for Video Technology 2011-04-08

The subtitling of broadcast news programs is starting to become a very interesting application due to the technological advances in automatic speech recognition and associated technologies. However, to build this kind of system, several advances are necessary, both in terms of the components and in the integration of the main blocks. In this paper, we present the overall architecture of a system running daily at RTP (the Portuguese public company). The goal is to integrate our system for broadcast news programs. The global system includes recorded and direct...

10.1109/icassp.2008.4517921 article EN Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing 2008-03-01

This paper presents a description of the INESC-ID Spoken Language Systems Laboratory (L2F) Age and Gender classification systems submitted to the INTERSPEECH 2010 Paralinguistic Challenge. The L2F Age and Gender classification systems are composed, respectively, of the fusion of four and six individual sub-systems trained with short- and long-term acoustic and prosodic features and different strategies (GMM-UBM, MLP and SVM), using speech corpora. The best results, obtained with calibration and a linear logistic regression back-end, show an absolute improvement of 4.1% on the unweighted...

10.21437/interspeech.2010-745 article EN Interspeech 2010 2010-09-26

The paper describes our work on the development of an audio segmentation, classification and clustering system applied to a broadcast news task for the European Portuguese language. We developed a new algorithm for audio segmentation that is both accurate and uses fewer computational resources than other approaches. Our speaker clustering module uses a modified BIC (Bayesian information criterion) algorithm, which performs substantially better than the standard symmetric Kullback-Leibler, KL2, and is much faster than the full BIC. Finally, we developed a scheme for tagging...

10.1109/icassp.2003.1202280 article EN 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003-12-22

10.21437/icslp.2000-423 article EN 6th International Conference on Spoken Language Processing (ICSLP 2000) 2000-10-16

Broadcast news play an important role in our lives, providing access to news, information and entertainment. The existence of subtitles is a medium for the inclusion of people with special needs and also an advantage in noisy and populated environments. In this work we describe and evaluate a system for subtitling live broadcast news for RTP (Radio Televisao de Portugal), the Portuguese public company. Developing a fully automatic system is a huge breakthrough, which results from the convergence of different research models and software developments...

10.21437/interspeech.2008-87 article EN Interspeech 2008 2008-09-22

This paper describes ongoing work on selective dissemination of broadcast news. Our pipeline system includes several modules: audio preprocessing, speech recognition, and topic segmentation and indexation. The main goal of this work is to study the impact of earlier errors in the last modules. The impact of audio preprocessing errors is quite small on the speech recognition module, but significant in terms of topic segmentation. On the other hand, the impact on the indexation modules is almost negligible. The diagnostic of these errors is a very important step for the improvement of the prototype media watch system described in this paper.

10.1155/2007/37507 article EN cc-by EURASIP Journal on Advances in Signal Processing 2007-06-21

This article presents a description of the INESC-ID Age and Gender classification systems, which were developed for aiding the detection of child abuse material within the scope of the European project I-DASH. The Age and Gender classification systems are composed, respectively, of the fusion of four and six individual subsystems trained with short- and long-term acoustic and prosodic features and different strategies, namely Gaussian Mixture Models-Universal Background Model (GMM-UBM), Multi-Layer Perceptrons (MLP) and Support Vector Machines (SVM), over five speech corpora. The best...

10.1145/1998384.1998387 article EN ACM Transactions on Speech and Language Processing 2011-08-01

Emotional stress is commonly experienced while speaking in public, producing changes to the various speech production subsystems, affecting the speech signal in predictable ways and being easily conveyed to listeners. Speech stress indicators, however, are typically studied under laboratory settings, allowing little generalization to real-life settings. To bridge this gap, we propose an interdisciplinary approach to assess stress during public speaking events, based on a platform that records speech simultaneously annotated with physiological...

10.1145/2494091.2497346 article EN 2013-09-08

The PaeLife project is a European industry-academia collaboration in the framework of the Ambient Assisted Living Joint Programme (AAL JP), with the goal of developing a multimodal, multilingual virtual personal life assistant to help senior citizens remain active and socially integrated. Speech is one of the key interaction modalities of AALFred, the Windows application developed in the project; the application can be controlled using speech input in four languages: French, Hungarian, Polish and Portuguese. This paper briefly presents the project, then focuses...

10.1016/j.procs.2015.09.272 article EN Procedia Computer Science 2015-01-01

This paper describes our work on the development of a low-latency stream-based audio pre-processing system for broadcast news using model-based techniques. It performs speech/nonspeech classification, speaker segmentation, speaker clustering, and gender and background conditions classification. As a way to increase the modelling accuracy, the algorithms make extensive use of Artificial Neural Networks (ANN), thus avoiding the rough assumptions normally made about the signal distribution. Experiments were conducted on the COST278...

10.21437/interspeech.2005-117 article EN Interspeech 2005 2005-09-04

This paper describes a large-scale experiment in which eight research institutions have tested their audio partitioning and labeling algorithms on the same data, a multi-lingual database of news broadcasts, using the same evaluation tools and protocols. The experiments provide more insight into the cross-lingual robustness of the methods, and they demonstrated that by further collaborating in the domains of speaker change detection and clustering it should be possible to achieve further technological progress in the near future.

10.21437/interspeech.2005-68 article EN Interspeech 2005 2005-09-04

In this work the problem of automatic decomposition of video into elementary semantic units, known in the literature as scenes, is addressed. Two multi-modal scene segmentation techniques are proposed, both building upon the Scene Transition Graph (STG). In the first of the proposed approaches, speaker diarization results are used for introducing a post-processing step to the STG construction algorithm, with the objective of discarding scene boundaries erroneously identified according to visual-only dissimilarity. In the second approach, and...

10.1145/1631272.1631383 article EN Proceedings of the 17th ACM International Conference on Multimedia 2009-10-19

The last years show a great development of large vocabulary, speaker-independent continuous speech recognition systems and some research in multilingual aspects. To allow that research to also be extended to the European Portuguese language, we decided to develop and collect a speech database based on a large amount of text. In this new database our aim was to create a corpus equivalent in size to WSJ0. We selected the texts from the PÚBLICO newspaper, which is characterized by a broad coverage of matters and different writing styles. The recording population engineering...

10.21437/eurospeech.1997-485 article EN 1997-09-22

This paper describes our recent work on extending the punctuation module of automatic subtitles for Portuguese Broadcast News. The main improvement was achieved by the use of prosodic information. This enabled the extension of the previous module, which covered only full stops and commas, to cover question marks as well. The approach uses lexical, acoustic and prosodic information. Our results show that the latter is relevant for all types of punctuation. An analysis of the results also shows what type of interrogative is better dealt with by our method, taking into account the specificities...

10.21437/interspeech.2010-441 article EN Interspeech 2010 2010-09-26

There are large amounts of information as video and audio that are not searchable. In a time where Business Intelligence is fundamental for all areas, doing this kind of analysis only on text sources is a limiting factor. The use of large vocabulary speech recognition systems with increasing performance is giving rise to different applications. Despite the diversity, these applications share extensive content transcription. In this paper we describe the results of a development project between a startup company and a research lab to build a full...

10.1109/icassp.2011.5946856 article EN 2011-05-01

This paper describes our work on the development of a large vocabulary continuous speech recognition system applied to a broadcast news task for the European Portuguese language in the scope of the ALERT project. We start by presenting the baseline recogniser AUDIMUS, which was originally developed with a corpus of read newspaper text. This is a hybrid system that uses a combination of phone probabilities generated by several MLPs trained on distinct feature sets. The paper details the modifications introduced in this system, namely the new model, and...

10.1109/asru.2001.1034651 article EN 2005-08-24