- Advanced Image and Video Retrieval Techniques
- Image Retrieval and Classification Techniques
- Video Analysis and Summarization
- Multimodal Machine Learning Applications
- Advanced Graph Neural Networks
- Topic Modeling
- Complex Network Analysis Techniques
- Human Mobility and Location-Based Analysis
- Video Surveillance and Tracking Methods
- Automated Road and Building Extraction
- Opinion Dynamics and Social Influence
- Aesthetic Perception and Analysis
- Data-Driven Disease Surveillance
- Music and Audio Processing
- Misinformation and Its Impacts
- Data Stream Mining Techniques
- Visual Attention and Saliency Detection
- Advanced Data Compression Techniques
- Remote-Sensing Image Classification
- Machine Learning and Algorithms
- Anomaly Detection Techniques and Applications
- Natural Language Processing Techniques
- Sentiment Analysis and Opinion Mining
- Social Media and Politics
- Neural Networks and Applications
University of Amsterdam
2015-2024
Amsterdam University of the Arts
2014-2024
Delft University of Technology
2009-2021
University of Belgrade
2006-2007
Brand-related user posts on social networks are growing at a staggering rate, where users express their opinions about brands by sharing multimodal posts. However, while some posts become popular, others are ignored. In this paper, we present an approach for identifying what aspects of posts determine their popularity. We hypothesize that brand-related posts may be popular due to several cues related to factual information, sentiment, vividness and entertainment parameters about the brand. We call the ensemble of cues engagement parameters....
Automatically generated tags and geotags hold great promise to improve access to video collections and online communities. We overview three tasks offered in the MediaEval 2010 benchmarking initiative, for each, describing its use scenario, definition and the data set released. For each task, a reference algorithm is presented that was used within MediaEval 2010, and comments are included on lessons learned. The Tagging Task, Professional involves automatically matching episodes in a collection of Dutch television with subject...
In this paper, we present a novel approach for automatic visual summarization of a geographic area that exploits user-contributed images and related explicit and implicit metadata collected from popular content-sharing websites. By means of this approach, we search for a limited number of representative but diverse images to represent the area within a certain radius around a specific location. Our approach is based on a random walk with restarts over a graph that models relations between images, features extracted from them, associated text, as well...
Graphs are the most ubiquitous form of structured data representation used in machine learning. They model, however, only pairwise relations between nodes and are not designed for encoding the higher-order relations found in many real-world datasets. To model such complex relations, hypergraphs have proven to be a natural representation. Learning the node representations in a hypergraph is more complex than in a graph as it involves information propagation at two levels: within every hyperedge and across the hyperedges. Most current...
Image search stands as a pivotal task in multimedia and computer vision, finding applications across diverse domains, ranging from internet search to medical diagnostics. Conventional image search systems operate by accepting textual or visual queries and retrieving the top-relevant candidate results from the database. However, prevalent methods often rely on single-turn procedures, introducing potential inaccuracies and limited recall. These methods also face challenges, such as vocabulary mismatch and the semantic gap, constraining their...
Large Language Models (LLMs) have shown remarkable performance across various tasks, but the escalating demands on computational resources pose significant challenges, particularly in the extensive utilization of full fine-tuning for downstream tasks. To address this, parameter-efficient fine-tuning (PEFT) methods have been developed, but they often underperform compared to full fine-tuning and struggle with memory efficiency. In this work, we introduce Gradient Weight-Normalized Low-Rank Projection (GradNormLoRP), a novel approach...
In this paper we propose a novel approach to selecting images suitable for inclusion in the visual summaries. The approach is grounded in insights about how people summarize image collections. We utilize the Amazon Mechanical Turk crowdsourcing platform to obtain a large number of manually created summaries as well as information about the criteria for the summary. Based on these large-scale user tests, we propose an automatic selection approach, which jointly utilizes the analysis of content, context, popularity, aesthetic appeal and sentiment derived...
The past decade has seen a rapid expansion of personal and interpersonal multimedia collections. These collections offer a wealth of information about individuals, including their interests, health, and significant life events. While automated techniques can assist in structuring and organizing these collections, they often have limitations in helping users effectively navigate and find relevant items within such large datasets. The Lifelog Search Challenge (LSC) provides a valuable benchmark for evaluating...
In this paper, we propose City Melange, an interactive and multimodal content-based venue explorer. Our framework matches the interacting user to the users of social media platforms exhibiting similar taste. The data collection integrates location-based social networks such as Foursquare with general multimedia sharing platforms such as Flickr or Picasa. The user interacts with a set of images and thus implicitly with the underlying semantics. The semantic information is captured through convolutional deep net features in the visual domain and latent topics...
This paper presents Blackthorn, an efficient interactive multimodal learning approach facilitating analysis of multimedia collections of up to 100 million items on a single high-end workstation. Blackthorn features data compression, feature selection, and optimizations to the learning process. The Ratio-64 representation introduced in this paper only costs tens of bytes per item yet preserves most of the visual and textual semantic information with good accuracy. The optimized model scores the Ratio-64-compressed data directly, greatly...
The theory of echo chambers, which suggests that online political discussions take place in conditions of ideological homogeneity, has recently gained popularity as an explanation for patterns of polarization and radicalization observed in many democratic countries. However, while micro-level experimental work has shown evidence that individuals may gravitate towards information that supports their beliefs, recent macro-level studies have cast doubt on whether this tendency generates echo chambers in practice, instead...
Multimodal demand forecasting aims at predicting product demand utilizing visual, textual, and contextual information. This paper proposes a method for such forecasting using an integrated architecture composed of convolutional, graph-based, and transformer-based networks. Since traditional methods depend on historical factors like manually generated categorical information, they face challenges such as the cold start problem and handling category dynamics. To address these challenges, our method allows incorporating multimodal...
We present an enhanced version of Exquisitor, our interactive and scalable media exploration system. At its core, Exquisitor is an interactive learning system using relevance feedback on media items to build a model of the users' information need. Relying on efficient media representation and indexing, it facilitates real-time user interaction. The new features for the Lifelog Search Challenge 2020 include support for timeline browsing, search functionality for finding positive examples, and significant interface improvements. Participation in...
Content-based image retrieval (CBIR) systems with user relevance feedback are considered. The influence of the type and number of feature vector (FV) components on retrieval efficiency was investigated. We compared a CBIR system with a very small FV (only 25 components describing color and texture), a system with a high-dimensional FV inspired by MPEG-7 (556 coordinates describing color, texture and line directions), as well as a system using feature vector reduction (FVR) of about 90% (with 50 components selected from the full-length 556-component FVs). The systems were tested over the annotated Corel 1K and 60K datasets. Simulation...
This paper presents an automatic approach that uses community-contributed images to create representative and diverse visual summaries of specific geographic areas. Complex relations between images, extracted features, text associated with the images, as well as users and their social network are modeled using a multimodal graph. To compute affinities between nodes in the graph we rely on the proven concept of random walk with restarts. The novelty of our approach lies in its use [...] diverse, yet representative, image set. Further, we introduce...
In this paper, we present analytic quality (AQ), a novel paradigm for the design and evaluation of multimedia analysis methods. AQ complements existing methods based on either machine-driven benchmarks or user studies. AQ includes the notion of insight gain and the time needed to acquire it, both critical aspects of large-scale collections analysis. To incorporate insight, AQ introduces a user model. In this model, each simulated user, or artificial actor, builds its insight over time, at any time operating with multiple categories of relevance....
We propose ArtSAGENet, a novel multimodal architecture that integrates Graph Neural Networks (GNNs) and Convolutional Neural Networks (CNNs), to jointly learn visual and semantic-based artistic representations. First, we illustrate the significant advantages of multi-task learning for fine art analysis and argue that it is conceptually a much more appropriate setting in the fine art domain than the single-task alternatives. We further demonstrate that several GNN architectures can outperform strong CNN baselines in a range of fine art analysis tasks, such as style...
Calls to “break up” radical echo chambers by injecting them with alternative viewpoints are common. Yet, thus far there is little evidence about the impact of such counter-messaging. To what extent and how do individuals who inhabit a radical echo chamber engage with messages that challenge their core beliefs? Drawing on data from the radical right forum Stormfront we address this question with a large-scale content and longitudinal analysis of users’ posting behavior, which analyses more than 35,000 English language...
In this paper we present a multimodal approach to categorizing user posts based on their discussion topic. To integrate heterogeneous information extracted from the posts, i.e. text, visual content and information about user interactions with the online platform, we deploy graph convolutional networks that were recently proven effective in classification tasks on knowledge graphs. As a case study we use the analysis of violent political extremism content, a challenging task due to the particularly high semantic level at which extremist...
In this paper we propose an approach that utilizes visual features and conventional text-based pseudo-relevance feedback (PRF) to improve the results of semantic-theme-based video retrieval. Our reranking method is based on the Average Item Distance (AID) score. AID-based reranking is designed to improve the suitability of the items at the top of the initial results list, i.e., those selected for use in query expansion. It is intended to help target items representative of the regularity typifying the semantic theme of the query. Experiments performed on the VideoCLEF 2008 data set and a...