- Advanced Image and Video Retrieval Techniques
- Multimodal Machine Learning Applications
- Human Pose and Action Recognition
- Video Analysis and Summarization
- Video Surveillance and Tracking Methods
- Image Retrieval and Classification Techniques
- Anomaly Detection Techniques and Applications
- Autonomous Vehicle Technology and Safety
- Topic Modeling
- Digital Media Forensic Detection
- Natural Language Processing Techniques
- Domain Adaptation and Few-Shot Learning
- Advanced Vision and Imaging
- Image Processing Techniques and Applications
- Digital Imaging for Blood Diseases
- Advanced Steganography and Watermarking Techniques
- Human Mobility and Location-Based Analysis
- Cell Image Analysis Techniques
- Handwritten Text Recognition Techniques
- Music and Audio Processing
- Robotics and Sensor-Based Localization
- Robotic Path Planning Algorithms
- Visual Attention and Saliency Detection
- Traffic Prediction and Management Techniques
- Social Robot Interaction and HRI
University of Padua
2005-2024
Civita
2017-2024
University of Florence
2009-2022
National Research Council
2022
Amazon (United States)
2022
University of Modena and Reggio Emilia
2022
Universidad de Las Palmas de Gran Canaria
2022
Marche Polytechnic University
2022
Webb Institute
2022
Technical University of Munich
2022
One of the principal problems in image forensics is determining whether a particular image is authentic or not. This can be a crucial task when images are used as basic evidence to influence judgment, for example in a court of law. To carry out such forensic analysis, various technological instruments have been developed in the literature. In this paper, the problem of detecting whether an image has been forged is investigated; in particular, attention is paid to the case in which an area of an image is copied and then pasted onto another zone to create a duplication or to cancel...
Human motion and behaviour in crowded spaces is influenced by several factors, such as the dynamics of other moving agents in the scene, as well as the static elements that might be perceived as points of attraction or obstacles. In this work, we present a new model for human trajectory prediction which is able to take advantage of both human-human and human-space interactions. The future trajectories of humans are generated by observing their past positions and interactions with the surroundings. To this end, we propose a “context-aware” recurrent neural...
When an attacker wants to falsify an image, in most cases she/he will perform a JPEG recompression. Different techniques have been developed based on diverse theoretical assumptions, but very effective solutions have not been developed yet. Recently, machine learning approaches have started to appear in the field of image forensics to solve tasks such as acquisition source identification and forgery detection. In this last case, the aim would be to get a trained neural network able, given a to-be-checked image, to reliably localize the forged...
Some images that are difficult to recognize on their own may become clearer in the context of a neighborhood of related images with similar social-network metadata. We build on this intuition to improve multilabel image annotation. Our model uses image metadata nonparametrically to generate neighborhoods of related images using Jaccard similarities, then uses a deep neural network to blend visual information from the image and its neighbors. Prior work typically models image metadata parametrically; in contrast, our nonparametric treatment allows our model to perform well even...
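The neighborhood-generation step described in this abstract can be illustrated with a minimal Python sketch. It assumes each image's metadata is a plain set of tags; the function names, the image identifiers, and the tie-breaking rule are hypothetical and are not taken from the paper, and the deep network that blends visual information is not shown.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; taken to be 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def nearest_neighbors(query: str, metadata: Dict[str, Set[str]], k: int) -> List[str]:
    """Return the k images whose metadata sets are most similar to the query's.

    Ties are broken alphabetically by image id (an arbitrary choice for this sketch).
    """
    scored = [
        (jaccard(metadata[query], tags), other)
        for other, tags in metadata.items()
        if other != query
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


# Hypothetical toy metadata: image id -> set of user tags.
metadata = {
    "img_a": {"beach", "sunset", "sea"},
    "img_b": {"beach", "sea", "surf"},
    "img_c": {"city", "night"},
    "img_d": {"sunset", "sea"},
}

print(nearest_neighbors("img_a", metadata, k=2))
```

Because the neighborhood is computed directly from set overlaps, nothing has to be fitted to the metadata vocabulary, which is one way to read the "nonparametric" treatment the abstract mentions.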
Mimicking the human ability to forecast future positions or interpret complex interactions in urban scenarios, such as streets, shopping malls or squares, is essential to develop socially compliant robots or self-driving cars. Autonomous systems may gain an advantage by anticipating human motion to avoid collisions or to behave naturally alongside people. To foresee plausible trajectories, we construct an LSTM (long short-term memory)-based model considering three fundamental factors: people interactions, past...
Automatic image annotation is still an important open problem in multimedia and computer vision. The success of media sharing websites has led to the availability of large collections of images tagged with human-provided labels. Many approaches previously proposed in the literature do not accurately capture the intricate dependencies between image content and annotations. We propose a learning procedure based on Kernel Canonical Correlation Analysis which finds a mapping between visual and textual words by projecting them into...
In open set recognition, a classifier has to detect unknown classes that are not known at training time. In order to recognize new categories, the classifier has to project the input samples of known classes into very compact and separated regions of the feature space for discriminating samples of unknown classes. Recently proposed Capsule Networks have been shown to outperform alternatives in many fields, particularly in image recognition; however, they have not yet been fully applied to open-set recognition. In capsule networks, scalar neurons are replaced by capsule vectors or matrices, whose entries represent...
Human trajectory forecasting is a key component of autonomous vehicles, social-aware robots and advanced video-surveillance applications. This challenging task typically requires knowledge about past motion, the environment and likely destination areas. In this context, multi-modality is a fundamental aspect and its effective modeling can be beneficial to any architecture. Inferring accurate trajectories is nevertheless challenging, due to the inherently uncertain nature of the future. To overcome these difficulties,...
Accurate prediction of future human positions is an essential task for modern video-surveillance systems. Current state-of-the-art models usually rely on a "history" of past tracked locations (e.g., 3 to 5 seconds) to predict a plausible sequence of future locations over the next few seconds. We feel that this common schema neglects critical traits of realistic applications: as the collection of input trajectories involves machine perception (i.e., detection and tracking), incorrect detections and fragmentation errors may accumulate in crowded...
We contribute, through this paper, to the design of a novel variational framework able to match and recognize multiple instances of multiple reference logos in image archives. Reference logos and test images are seen as constellations of local features (interest points, regions, etc.) and matched by minimizing an energy function mixing: 1) a fidelity term that measures the quality of feature matching, 2) a neighborhood criterion that captures feature co-occurrence/geometry, and 3) a regularization term that controls the smoothness of the matching solution. We also introduce...
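The three-term energy described in this abstract can be sketched schematically as follows. The notation is an illustrative assumption, not the paper's: $\mu$ stands for a candidate matching between reference and test features, and $\alpha, \beta \ge 0$ are hypothetical mixing weights.

```latex
E(\mu) \;=\; \underbrace{E_{\text{fid}}(\mu)}_{\text{matching quality}}
\;+\; \alpha \, \underbrace{E_{\text{nbr}}(\mu)}_{\text{co-occurrence/geometry}}
\;+\; \beta \, \underbrace{E_{\text{reg}}(\mu)}_{\text{smoothness}},
\qquad
\mu^{\star} \;=\; \arg\min_{\mu} E(\mu)
```

Under this reading, a larger $\alpha$ would favor matches that are geometrically consistent with their neighbors, and a larger $\beta$ would favor smoother matching solutions.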
In this paper we describe a system for detection and retrieval of trademarks appearing in sports videos. We propose a compact representation of trademarks and video frame content based on SIFT feature points. This representation can be used to robustly detect, localize, and retrieve trademarks as they appear in a variety of different sports video types. Classification is performed by matching a set of SIFT feature descriptors for each trademark instance against the set of SIFT features detected in each frame of the video. Localization is performed through robust clustering of matched feature points in the video frame. Experimental results are...
An approach for automatic annotation and retrieval of video content uses semantic concept classifiers and ontologies to permit queries expanded to synonyms and concept specializations.
In many application scenarios digital images play a basic role, and often it is important to assess whether their content is realistic or has been manipulated to mislead the watcher's opinion. Image forensics tools provide answers to similar questions. This paper, in particular, focuses on the problem of detecting whether a feigned image has been created by cloning an area of the image onto another zone to make a duplication or to cancel something awkward. The proposed method is based on SIFT features and makes it possible both to understand which points are involved...
Automated colonoscopy reporting holds great potential for enhancing quality control and improving the cost-effectiveness of colonoscopy procedures. A major challenge lies in the automated identification, tracking, and re-association (ReID) of polyp tracklets across full-procedure colonoscopy videos. This is essential for precise polyp counting and enables the computation of key quality metrics, such as Adenoma Detection Rate (ADR) and Polyps Per Colonoscopy (PPC). However, polyp ReID is challenging due to variations in polyp appearance, frequent disappearance from...
Recognition and classification of human actions for annotation of unconstrained video sequences has proven to be challenging because of the variations in the environment, the appearance of actors, the modalities in which the same action is performed by different persons, speed and duration, and the points of view from which the event is observed. This variability is reflected in the difficulty of defining effective descriptors and deriving appropriate codebooks for action categorization. In this paper, we propose a novel and effective solution to classify human actions in unconstrained videos. It improves on previous...
The human capability to anticipate the near future from visual observations and non-verbal cues is essential for developing intelligent systems that need to interact with people. Several research areas, such as human-robot interaction (HRI), assisted living or autonomous driving, need to foresee future events to avoid crashes or help people. Egocentric scenarios are classic examples where action anticipation is applied due to their numerous applications. Such a challenging task demands capturing and modeling the domain's hidden structure to reduce...
In this paper, we present a novel approach to incrementally learn an Abstract Model of an unknown environment, and show how an agent can reuse the learned model for tackling the Object Goal Navigation task. The Abstract Model is a finite state machine in which each state is an abstraction of the state of the environment, as perceived by the agent in a certain position and orientation. The perceptions are high-dimensional sensory data (e.g., RGB-D images), and the abstraction is reached by exploiting image segmentation and the Taskonomy model bank. The learning of the Abstract Model is accomplished by executing actions, observing the reached state, and updating the Abstract Model with...