- Image Retrieval and Classification Techniques
- Video Analysis and Summarization
- Advanced Image and Video Retrieval Techniques
- Multimodal Machine Learning Applications
- Human Pose and Action Recognition
- Anomaly Detection Techniques and Applications
- Music and Audio Processing
- Sexual Differentiation and Disorders
- Video Surveillance and Tracking Methods
- Visual Attention and Saliency Detection
- Advanced Vision and Imaging
- EEG and Brain-Computer Interfaces
- Hand Gesture Recognition Systems
- Image and Video Quality Assessment
- Face Recognition and Analysis
- Service-Oriented Architecture and Web Services
- Misinformation and Its Impacts
- Digital Media Forensic Detection
- Generative Adversarial Networks and Image Synthesis
- Face and Expression Recognition
- Functional Brain Connectivity Studies
- Multimedia Communication and Technology
- Biomedical Text Mining and Ontologies
- Natural Language Processing Techniques
- Speech and Audio Processing
Universitatea Națională de Știință și Tehnologie Politehnica București
2016-2025
University of Ottawa
2011-2024
University of Science and Technology
2024
University of Bucharest
2008-2023
Institutul Clinic Fundeni
2022
Wellcome Trust
2021
Eötvös Loránd University
2021
Dublin City University
2020
Institut de Recherche en Informatique et Systèmes Aléatoires
2017
York General Hospital
2017
As of today, most movie recommendation services base their recommendations on collaborative filtering (CF) and/or content-based filtering (CBF) models that use metadata (e.g., genre or cast). In video-on-demand and streaming services, however, new movies and TV series are continuously added. CF is unable to make predictions in such a scenario, since the newly added videos lack interactions, a problem technically known as item cold start (CS). Currently, the common approach to this problem is to switch to a purely CBF method,...
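The cold-start fallback described in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the function names, the tag-set metadata, and the Jaccard similarity are assumptions chosen for illustration. An item with interaction history gets its CF score, while a newly added item falls back to a metadata-based CBF score.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two metadata tag sets (illustrative CBF measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def score_item(
    item: str,
    liked: List[str],
    metadata: Dict[str, Set[str]],
    cf_scores: Dict[str, float],
) -> float:
    """Return the CF score when the item has interactions, else a CBF score."""
    if item in cf_scores:
        # Warm item: collaborative filtering already produced a score.
        return cf_scores[item]
    if not liked:
        # No user history to compare against.
        return 0.0
    # Cold item: best metadata similarity to anything the user liked.
    return max(jaccard(metadata[item], metadata[other]) for other in liked)


# Hypothetical catalogue: "new" has no interactions, so it has no CF score.
metadata = {
    "old": {"drama", "cast_a"},
    "new": {"drama", "cast_b"},
}
cf_scores = {"old": 0.9}
cold_score = score_item("new", ["old"], metadata, cf_scores)
```

Here the cold item shares one of three distinct tags with the liked item, so its fallback score is 1/3.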
We introduce Spatio-Temporal Vector of Locally Max Pooled Features (ST-VLMPF), a super vector-based encoding method specifically designed for encoding local deep features. The proposed method addresses an important problem in video understanding: how to build a video representation that incorporates the CNN features over the entire video. Feature assignment is carried out at two levels, by using the similarity and the spatio-temporal information. For each assignment we build a specific encoding, focused on the nature of deep features, with the goal to capture the highest...
The combination of TMS and EEG has the potential to capture relevant features of Alzheimer’s disease (AD) pathophysiology. We used a machine learning framework to explore time-domain features characterizing AD patients compared to age-matched healthy controls (HC). More than 150 time-domain features, including some related to local and distributed evoked activity, were extracted from TMS-EEG data and fed into a Random Forest (RF) classifier using a leave-one-subject-out validation approach. The best classification accuracy, sensitivity,...
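The leave-one-subject-out validation mentioned in this abstract can be sketched as a simple index splitter. This is only an illustration of the splitting scheme, not the authors' pipeline: the function name and the subject labels are hypothetical, and the Random Forest classifier itself is omitted.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_subject_out(
    subjects: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject.

    subjects[i] is the subject label of sample i. All samples of the
    held-out subject go to the test set, so no subject appears in both sets.
    """
    for held_out in sorted(set(subjects)):
        train = [i for i, s in enumerate(subjects) if s != held_out]
        test = [i for i, s in enumerate(subjects) if s == held_out]
        yield held_out, train, test


# Hypothetical labels: four samples from three subjects.
subjects = ["s1", "s1", "s2", "s3"]
folds = list(leave_one_subject_out(subjects))
```

Each fold would train a classifier on `train` and evaluate it on `test`; the per-fold results are then aggregated into accuracy, sensitivity, and similar metrics.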
This paper discusses the use of computer vision in the interpretation of human gestures. Hand gestures would be an intuitive and ideal way of exchanging information with other people in a virtual space, guiding some robots to perform certain tasks in a hostile environment, or interacting with computers. Hand gestures can be divided into two main categories: static and dynamic. In this paper, a novel hand gesture recognition technique is proposed. It is based on the 2D skeleton representation of the hand. For each gesture, the skeletons of each posture are...
In this article, we report on the creation of a publicly available, common evaluation framework for Violent Scenes Detection (VSD) in Hollywood and YouTube videos. We propose a robust data set, VSD96, with more than 96 hours of video of various genres, annotations at different levels of detail (e.g., shot-level, segment-level), annotations of mid-level concepts (e.g., blood, fire), pre-computed multi-modal descriptors, and over 230 system output results as baselines. This is the most comprehensive data set available to date tailored to the VSD...
In this paper we present a video summarization method based on the study of spatio-temporal activity within the video. The visual activity is estimated by measuring the number of interest points, jointly obtained in the spatial and temporal domains. The proposed approach is composed of five steps. First, image features are collected using the Hessian matrix. Then, these features are processed to retrieve the candidate video segments for the summary (denoted clips). Further on, two specific steps are designed to first detect the redundant clips, and second to eliminate...
Multiview video (MVV) plus depth formats use view synthesis to build intermediate views from existing adjacent views at the receiver side. Traditional view synthesis exploits the disparity information to interpolate an intermediate view by considering inter-view correlations. However, the temporal correlation between different frames of the intermediate view can be used to improve the synthesis. We propose a new coding scheme for 3-D High Efficiency Video Coding (HEVC) that allows us to take full advantage of correlations within and between views. We use optical flow techniques to derive dense...
In this paper we propose a new dataset, Div400, that was designed to support shared evaluation in different areas of social media photo retrieval, e.g., machine analysis (re-ranking, machine learning), human-based computation (crowdsourcing) or hybrid approaches (relevance feedback, machine-crowd integration). Div400 comes with associated relevance and diversity assessments performed by human annotators. 396 landmark locations are represented via 43,418 Flickr photos and metadata, Wikipedia pages...
The Benchmarking Initiative for Multimedia Evaluation (MediaEval) organizes an annual cycle of scientific evaluation tasks in the area of multimedia access and retrieval. The tasks offer challenges to researchers working in diverse areas of multimedia technology. The tasks, which are focused on the social and human aspects of multimedia, help the research community tackle challenges linked to less widely studied user needs. They also support researchers in investigating the diversity of perspectives that naturally arise when users interact with multimedia content. Here, the authors present...
Financial markets have always been a point of interest for automated systems. Due to their complex nature, financial algorithms and fintech frameworks require vast amounts of data to accurately respond to market fluctuations. This data availability is tied to the daily market evolution, so it is impossible to accelerate its acquisition. In this article, we discuss several solutions for augmenting financial datasets via synthesizing realistic time-series with the help of generative models. This problem is complex, since financial time series present very...
This paper addresses the issue of detecting violent scenes in Hollywood movies. In this context, we describe the MediaEval 2013 Violent Scene Detection task, which proposes a consistent evaluation framework to the research community. Nine participating teams proposed systems in 2013, which denotes an increasing interest in the task. In this paper, the dataset, the annotation process and the task's rules are detailed. The submitted systems are thoroughly analysed and compared through several metrics to draw conclusions on the most promising techniques...
We propose a multi-modal content-based movie recommender system that replaces human-generated metadata with content descriptions automatically extracted from the visual and audio channels of a video. Content descriptors improve over traditional metadata in terms of both richness (it is possible to extract hundreds of meaningful features covering various modalities) and quality (content features are consistent across different systems and immune to human errors). Our system integrates state-of-the-art aesthetic and deep features as well as block-level...
The control of computers and electronics through hand gestures has gained significant industry and academic attention lately for the usability benefits and convenience that it offers users. Of particular research interest have been living room environments containing televisions and set-top boxes. However, existing approaches have failed to provide a flexible solution for controlling such devices by hand gestures. They have used cameras that are sensitive to environmental factors such as lighting or that impose unreasonable calibration demands....