- Advanced Image and Video Retrieval Techniques
- Video Analysis and Summarization
- Image Retrieval and Classification Techniques
- Multimodal Machine Learning Applications
- Music and Audio Processing
- Domain Adaptation and Few-Shot Learning
- Topic Modeling
- Advanced Vision and Imaging
- Video Surveillance and Tracking Methods
- Multimedia Communication and Technology
- Data Visualization and Analytics
- Natural Language Processing Techniques
- Digital Media Forensic Detection
- Advanced Graph Neural Networks
- Handwritten Text Recognition Techniques
- Image and Object Detection Techniques
- Human Pose and Action Recognition
- Semantic Web and Ontologies
- Medical Image Segmentation Techniques
- Complex Network Analysis Techniques
- COVID-19 diagnosis using AI
- Anomaly Detection Techniques and Applications
- Visual Attention and Saliency Detection
- Sentiment Analysis and Opinion Mining
- Face recognition and analysis
University of Amsterdam, 2015-2024
Amsterdam University of the Arts, 2014-2023
Centrum Wiskunde & Informatica, 2000-2023
Netherlands Forensic Institute, 2022
American Academy of Forensic Sciences, 2020
Amsterdam University of Applied Sciences, 2006-2013
Eindhoven University of Technology, 2010
Vrije Universiteit Amsterdam, 2005
Erasmus University Rotterdam, 1988
Presents a review of 200 references in content-based image retrieval. The paper starts with discussing the working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap. Subsequent sections discuss computational steps for image retrieval systems. Step one is image processing for retrieval, sorted by color, texture, and local geometry. Features for retrieval are discussed next, sorted by: accumulative and global features, salient points, object and shape features, signs, and structural combinations thereof. Similarity of pictures and objects...
Semantic analysis of multimodal video aims to index segments of interest at a conceptual level. In reaching this goal, it requires an analysis of several information streams. At some point in the analysis, these streams need to be fused. In this paper, we consider two classes of fusion schemes, namely early fusion and late fusion. The former fuses modalities in feature space, the latter fuses them in semantic space. We show by experiment on 184 hours of broadcast data for 20 concepts that late fusion tends to give slightly better performance for most concepts. However, for those...
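The two fusion schemes described above can be sketched on synthetic data. This is a minimal illustration, not the authors' implementation: the logistic-regression learner, the feature dimensions, the helper names (`train_logreg`, `make_data`), and the choice of a second-stage classifier for late fusion are all assumptions made for the example.

```python
import numpy as np

def train_logreg(X, y, lr=0.1, steps=1000):
    """Fit logistic regression with plain gradient descent; returns (w, b)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict_proba(model, X):
    w, b = model
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def early_fusion(visual, text, y, visual_test, text_test):
    # Fuse in feature space: concatenate modalities, then learn one classifier.
    model = train_logreg(np.hstack([visual, text]), y)
    return predict_proba(model, np.hstack([visual_test, text_test]))

def late_fusion(visual, text, y, visual_test, text_test):
    # Fuse in semantic space: learn one classifier per modality,
    # then learn a second-stage classifier on their concept scores.
    m_visual, m_text = train_logreg(visual, y), train_logreg(text, y)
    train_scores = np.column_stack(
        [predict_proba(m_visual, visual), predict_proba(m_text, text)])
    test_scores = np.column_stack(
        [predict_proba(m_visual, visual_test), predict_proba(m_text, text_test)])
    return predict_proba(train_logreg(train_scores, y), test_scores)

def make_data(rng, n):
    """Hypothetical two-modality data for a single binary concept."""
    y = rng.integers(0, 2, size=n).astype(float)
    visual = rng.normal(size=(n, 4)) + 1.5 * y[:, None]
    text = rng.normal(size=(n, 3)) + 1.5 * y[:, None]
    return visual, text, y

rng = np.random.default_rng(0)
v_train, t_train, y_train = make_data(rng, 200)
v_test, t_test, y_test = make_data(rng, 100)

early = early_fusion(v_train, t_train, y_train, v_test, t_test)
late = late_fusion(v_train, t_train, y_train, v_test, t_test)
early_acc = np.mean((early > 0.5) == (y_test == 1))
late_acc = np.mean((late > 0.5) == (y_test == 1))
print(f"early fusion accuracy: {early_acc:.2f}, late fusion accuracy: {late_acc:.2f}")
```

Late fusion trains more models (one per modality plus the fuser), which is the cost of combining scores rather than raw features. The synthetic data here says nothing about which scheme wins on real video.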
We introduce the challenge problem for generic video indexing to gain insight into intermediate steps that affect the performance of multimedia analysis methods, while at the same time fostering repeatability of experiments. To arrive at a challenge problem, we provide a general scheme for the systematic examination of automated concept detection methods by decomposing the generic video indexing problem into 2 unimodal experiments, multimodal experiments, and 1 combined experiment. For each experiment, we evaluate performance on 85 hours of international broadcast news data, from TRECVID 2005/2006...
Social image analysis and retrieval is important for helping people organize and access the increasing amount of user-tagged multimedia. Since user tagging is known to be uncontrolled, ambiguous, and overly personalized, a fundamental problem is how to interpret the relevance of a user-contributed tag with respect to the visual content the tag is describing. Intuitively, if different persons label visually similar images using the same tags, these tags are likely to reflect objective aspects of the visual content. Starting from this intuition, we propose in...
The watershed algorithm from mathematical morphology is powerful for segmentation. However, it does not allow incorporation of a priori information, as segmentation methods that are based on energy minimization do. In particular, there is no control of the smoothness of the segmentation result. In this paper, we show how to represent watershed segmentation as an energy minimization problem using the distance-based definition of the watershed line. A priori considerations about smoothness can then be imposed by adding the contour length to the energy function. This leads to a new segmentation method called watersnakes, integrating...
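The idea of adding contour length to an energy function can be illustrated on a small label image. This is a rough sketch only: the per-pixel `data_cost` array is a hypothetical stand-in for the distance-based watershed term, the 4-neighbor count of label changes is a crude approximation of contour length, and `lam` is an assumed weighting parameter.

```python
import numpy as np

def contour_length(labels):
    """Approximate contour length as the number of 4-neighbor label changes."""
    horizontal = np.count_nonzero(labels[:, 1:] != labels[:, :-1])
    vertical = np.count_nonzero(labels[1:, :] != labels[:-1, :])
    return horizontal + vertical

def segmentation_energy(labels, data_cost, lam):
    """Data term plus lam times contour length.

    data_cost[l, r, c] is the (hypothetical) cost of giving pixel (r, c) label l.
    """
    rows, cols = np.indices(labels.shape)
    data_term = data_cost[labels, rows, cols].sum()
    return data_term + lam * contour_length(labels)

# Two labelings of a 4x4 image: a straight boundary and one with a stray pixel.
smooth = np.zeros((4, 4), dtype=int)
smooth[:, 2:] = 1
jagged = smooth.copy()
jagged[1, 0] = 1

data_cost = np.zeros((2, 4, 4))  # neutral data term, so only smoothness matters
print(segmentation_energy(smooth, data_cost, lam=0.5))
print(segmentation_energy(jagged, data_cost, lam=0.5))
```

With a neutral data term, the labeling with the straight boundary has the lower energy, which is the kind of smoothness preference the contour-length term is meant to express.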
In this paper, we propose an automatic video retrieval method based on high-level concept detectors. Research in video analysis has reached the point where over 100 concept detectors can be learned in a generic fashion, albeit with mixed performance. Such a set of detectors is still very small compared to ontologies aiming to capture the full vocabulary a user has. We aim to throw a bridge between the two fields by building a multimedia thesaurus, i.e., a set of machine-learned concept detectors that is enriched with semantic descriptions and structure obtained from WordNet. Given...
Social image retrieval is important for exploiting the increasing amounts of amateur-tagged multimedia such as Flickr images. Since amateur tagging is known to be uncontrolled, ambiguous, and personalized, a fundamental problem is how to reliably interpret the relevance of a tag with respect to the visual content it is describing. Intuitively, if different persons label similar images using the same tags, these tags are likely to reflect objective aspects of the visual content. Starting from this intuition, we propose a novel algorithm that...
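The intuition above (tags shared by visually similar images are more likely to be objective) can be turned into a simple score. The abstract is truncated before it describes the actual algorithm, so the following is only one plausible sketch: the function name, the Euclidean distance, the value of `k`, and the subtraction of the tag's expected frequency are assumptions.

```python
import numpy as np

def neighbor_vote_relevance(features, tags, query_idx, tag, k=2):
    """Score a tag by how many of the k visual neighbors also carry it,
    minus the count expected from the tag's overall frequency."""
    dists = np.linalg.norm(features - features[query_idx], axis=1)
    dists[query_idx] = np.inf  # an image does not vote for itself
    neighbors = np.argsort(dists)[:k]
    votes = sum(tag in tags[i] for i in neighbors)
    prior = k * sum(tag in t for t in tags) / len(tags)
    return votes - prior

# Hypothetical collection: two visual clusters with user-contributed tags.
features = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
tags = [{"bridge", "me"}, {"bridge"}, {"bridge", "amsterdam"},
        {"cat"}, {"cat", "me"}, {"cat"}]

bridge_score = neighbor_vote_relevance(features, tags, query_idx=0, tag="bridge")
me_score = neighbor_vote_relevance(features, tags, query_idx=0, tag="me")
print(bridge_score, me_score)
```

For the first image, "bridge" is confirmed by both visual neighbors and scores above its prior, while the personal tag "me" receives no votes and scores below it.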
Brand-related user posts on social networks are growing at a staggering rate, where users express their opinions about brands by sharing multimodal posts. However, while some posts become popular, others are ignored. In this paper, we present an approach for identifying what aspects of posts determine their popularity. We hypothesize that brand-related posts may be popular due to several cues related to factual information, sentiment, vividness, and entertainment parameters about the brand. We call this ensemble engagement parameters....
Typical multi-task learning (MTL) methods rely on architectural adjustments and a large trainable parameter set to jointly optimize over several tasks. However, when the number of tasks increases, so do the complexity and resource requirements. In this paper, we introduce a method which applies a conditional feature-wise transformation over the convolutional activations that enables a model to successfully perform many tasks. To distinguish from regular MTL, we introduce Many Task Learning (MaTL) as a special case of MTL where more than 20 tasks are...
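A conditional feature-wise transformation over convolutional activations can be sketched as a per-task, per-channel scale and shift. The abstract does not give the exact form, so the affine transformation, the class name `TaskConditionedTransform`, and the random initialization below are assumptions made for illustration.

```python
import numpy as np

class TaskConditionedTransform:
    """Per-task, per-channel scale (gamma) and shift (beta) for conv activations."""

    def __init__(self, num_tasks, num_channels, seed=0):
        rng = np.random.default_rng(seed)
        # In a trained model these would be learned; here they are random.
        self.gamma = 1.0 + 0.1 * rng.normal(size=(num_tasks, num_channels))
        self.beta = 0.1 * rng.normal(size=(num_tasks, num_channels))

    def __call__(self, activations, task_id):
        # activations has shape (batch, channels, height, width);
        # broadcast the task's parameters over batch and spatial dimensions.
        gamma = self.gamma[task_id][None, :, None, None]
        beta = self.beta[task_id][None, :, None, None]
        return gamma * activations + beta

transform = TaskConditionedTransform(num_tasks=25, num_channels=8)
activations = np.ones((2, 8, 4, 4))  # stand-in for a convolutional feature map
out_task_0 = transform(activations, task_id=0)
out_task_7 = transform(activations, task_id=7)
print(out_task_0.shape, np.allclose(out_task_0, out_task_7))
```

The shared convolutional weights stay the same for every task; only the small `gamma` and `beta` vectors depend on the task, so the parameter count grows by two values per channel per task rather than by a new branch of the network.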
In this paper, we discuss the initial attempts at boosting the understanding of human language based on deep-learning models with quantum computing. We successfully train a quantum-enhanced Long Short-Term Memory network to perform the parts-of-speech tagging task via numerical simulations. Moreover, a quantum-enhanced Transformer is proposed to perform sentiment analysis on an existing dataset.
This paper presents the semantic pathfinder architecture for generic indexing of multimedia archives. The semantic pathfinder extracts semantic concepts from video by exploring different paths through three consecutive analysis steps, which we derive from the observation that produced video is the result of an authoring-driven process. We exploit this authoring metaphor for machine-driven understanding. The pathfinder starts with the content analysis step. In this step, we follow a data-driven approach of indexing semantics. The style analysis step is the second step. Here, we tackle the problem by viewing a video from the perspective...
Baselines are the starting point of any quantitative multimedia research, and benchmarks are essential for pushing those baselines further. In this article, we present the artistic domain with a new benchmark dataset featuring over 2 million images with rich structured metadata, dubbed OmniArt. OmniArt contains annotations for dozens of attribute types and features semantic context information through concepts, IconClass labels, color information, and (limited) object-level bounding boxes. For our dataset, we establish baseline...