- Video Analysis and Summarization
- Music and Audio Processing
- Speech Recognition and Synthesis
- Natural Language Processing Techniques
- Speech and dialogue systems
- Advanced Image and Video Retrieval Techniques
- Image Retrieval and Classification Techniques
- Diverse Musicological Studies
- Multimodal Machine Learning Applications
- Digital and Traditional Archives Management
- AI in Service Interactions
- Social Robot Interaction and HRI
- Topic Modeling
- Multimedia Communication and Technology
- Digital Humanities and Scholarship
- Speech and Audio Processing
- Radio, Podcasts, and Digital Media
- Semantic Web and Ontologies
- Hate Speech and Cyberbullying Detection
- Music Technology and Sound Studies
- Advanced Data Compression Techniques
- Scientific Computing and Data Management
- Bullying, Victimization, and Aggression
- Advanced Text Analysis Techniques
- Data Visualization and Analytics
University of Twente
2013-2024
Netherlands Institute for Sound and Vision
2009-2022
Human Media
2001-2017
Delft University of Technology
2009
Fraunhofer Institute for Intelligent Analysis and Information Systems
2009
Netherlands Organisation for Applied Scientific Research
2009
Radboud University Nijmegen
2009
University of Edinburgh
2005
University of Sheffield
2005
Brno University of Technology
2005
Automatically generated tags and geotags hold great promise to improve access to video collections and online communities. We overview three tasks offered in the MediaEval 2010 benchmarking initiative, for each, describing its use scenario, definition, and the data set released. For each task, a reference algorithm is presented that was used within MediaEval 2010, and comments are included on lessons learned. The Tagging Task, Professional involves automatically matching episodes in a collection of Dutch television with subject...
Work on expressive speech synthesis has long focused on the expression of basic emotions. In recent years, however, interest in other styles has been increasing. The research presented in this paper aims at the generation of a storytelling speaking style, which is suitable for storytelling applications and, more generally, for applications aimed at children. Based on an analysis of human storytellers' speech, we designed and implemented a set of prosodic rules for converting "neutral" speech, as produced by a text-to-speech system, into storytelling speech. An evaluation of our system...
Searching for relevant webpages and following hyperlinks to related content is a widely accepted and effective approach to information seeking on the textual web. Existing work on multimedia retrieval has focused on search for individual items or on linking without specific attention to search results. We describe our research exploring integrated multimodal search and hyperlinking for multimedia data. Our investigation is based on the MediaEval 2012 Search and Hyperlinking task. This includes a known-item search task using the Blip10000 internet video collection, where...
In this technical demonstration, we showcase a multimedia search engine that facilitates semantic access to archival rock n' roll concert video. The key novelty is the crowdsourcing mechanism, which relies on online users to improve, extend, and share automatically detected results in video fragments using an advanced timeline-based video player. The user feedback serves as valuable input to further improve automated retrieval results, such as detected concepts and transcribed interviews. The search engine has been operational to harvest...
The automatic processing of speech collected in conference style meetings has attracted considerable interest, with several large scale projects devoted to this area. In this paper we explore the use of various meeting corpora for the purpose of automatic speech recognition. In particular, we investigate the similarity of these resources and how to use them efficiently in the construction of a meeting transcription system. The analysis shows distinctive features for each resource. However, the benefit of pooling data, and hence the similarity, seems sufficient to speak of a generic "conference...
This paper addresses compound splitting for Dutch in the context of broadcast news transcription. Language models were created using original text versions and text versions that were decomposed using a data-driven algorithm. Language model performances were compared in terms of out-of-vocabulary rates and word error rates in a real-world broadcast news transcription task. It was concluded that compound splitting does improve ASR performance. Best results were obtained when frequent compounds were not decomposed.
In this paper we discuss the speech activity detection system that we used for detecting speech regions in the Dutch TRECVID video collection. The system is designed to filter non-speech like music or sound effects out of the signal without the use of predefined non-speech models. Because the system trains its models on-line, it is robust for handling out-of-domain data. The error rate on an out-of-domain test set, recordings of English conference meetings, was 4.4%. The overall error rate on twelve randomly selected five-minute fragments was 11.5%.
The MediaEval Multimedia Benchmark leveraged community cooperation and crowdsourcing to develop a large Internet video dataset for its Genre Tagging and Rich Speech Retrieval tasks.