- Advanced Vision and Imaging
- Advanced Neural Network Applications
- Advanced Image and Video Retrieval Techniques
- Visual Attention and Saliency Detection
- Anomaly Detection Techniques and Applications
- Advanced Image Processing Techniques
- Domain Adaptation and Few-Shot Learning
- COVID-19 diagnosis using AI
- Video Coding and Compression Technologies
- Data-Driven Disease Surveillance
- Face and Expression Recognition
- Network Security and Intrusion Detection
- Video Surveillance and Tracking Methods
- Multimodal Machine Learning Applications
- Complex Network Analysis Techniques
- Advanced Graph Neural Networks
- Gaze Tracking and Assistive Technology
- Robotics and Sensor-Based Localization
- Robotic Path Planning Algorithms
- Visual perception and processing mechanisms
- CCD and CMOS Imaging Sensors
- Data Management and Algorithms
- Image Enhancement Techniques
- Influenza Virus Research Studies
- Video Analysis and Summarization
Qualcomm (United Kingdom)
2021-2023
Adrian College
2023
Directorate of Medicinal and Aromatic Plants Research
2023
Qualcomm (United States)
2022
Istituto Tecnico Industriale Alessandro Volta
2021
Weatherford College
2021
University of Modena and Reggio Emilia
2015-2020
Ferrari (Italy)
2017-2019
Novelty detection is commonly referred to as the discrimination of observations that do not conform to a learned model of regularity. Despite its importance in different application settings, designing a novelty detector is utterly complex due to the unpredictable nature of novelties and their inaccessibility during the training procedure, factors which expose the unsupervised nature of the problem. In our proposal, we design a general framework where we equip a deep autoencoder with a parametric density estimator that learns the probability distribution...
Continual Learning has inspired a plethora of approaches and evaluation settings; however, the majority of them overlook the properties of a practical scenario, where the data stream cannot be shaped as a sequence of tasks and offline training is not viable. We work towards General Continual Learning (GCL), where task boundaries blur and the domain and class distributions shift either gradually or suddenly. We address it through mixing rehearsal with knowledge distillation and regularization; our simple baseline, Dark Experience Replay, matches the network's...
In this work we aim to predict the driver's focus of attention. The goal is to estimate what a person would pay attention to while driving, and which part of the scene around the vehicle is more critical for the task. To this end we propose a new computer vision model based on a multi-branch deep architecture that integrates three sources of information: raw video, motion and scene semantics. We also introduce DR(eye)VE, the largest dataset of driving scenes for which eye-tracking annotations are available. This dataset features more than 500,000 registered frames,...
Convolutional Neural Networks experience catastrophic forgetting when optimized on a sequence of learning problems: as they meet the objective of the current training examples, their performance on previous tasks drops drastically. In this work, we introduce a novel framework to tackle this problem with conditional computation. We equip each convolutional layer with task-specific gating modules, selecting which filters to apply on the given input. This way, we achieve two appealing properties. Firstly, the execution patterns...
We propose Skip-Convolutions to leverage the large amount of redundancies in video streams and save computations. Each video is represented as a series of changes across frames and network activations, denoted as residuals. We reformulate standard convolution to be efficiently computed on residual frames: each layer is coupled with a binary gate deciding whether a residual is important to the model prediction, e.g. foreground regions, or whether it can be safely skipped, e.g. background regions. These gates can either be implemented as an efficient network trained jointly...
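The reformulation on residual frames rests on the linearity of convolution: the output for the current frame equals the output for the previous frame plus the convolution of the residual. The following is a minimal 1-D sketch of that identity in plain Python, not the authors' implementation. The function names and the magnitude-threshold gate are illustrative assumptions; the abstract only states that a binary gate decides which residuals matter.

```python
def conv1d(signal, kernel):
    """Valid 1-D cross-correlation; a linear operation on `signal`."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def skip_conv1d(prev_frame, prev_output, frame, kernel, threshold=0.0):
    """Update `prev_output` from the residual between two frames.

    Hypothetical gate (an assumption, not from the source): residual
    entries whose magnitude is <= `threshold` are zeroed, i.e. skipped.
    With threshold=0.0 nothing is dropped and the result matches
    conv1d(frame, kernel) by linearity.
    """
    residual = [cur - old for cur, old in zip(frame, prev_frame)]
    gated = [r if abs(r) > threshold else 0.0 for r in residual]
    update = conv1d(gated, kernel)
    return [out + upd for out, upd in zip(prev_output, update)]


# Two toy "frames" that differ in a single position.
previous = [1, 2, 3, 4, 5]
current = [1, 2, 7, 4, 5]
edge_kernel = [1, 0, -1]

previous_output = conv1d(previous, edge_kernel)
current_output = skip_conv1d(previous, previous_output, current, edge_kernel)
```

This sketch still convolves the full gated residual, so it saves no work by itself. A real saving would require the layer to skip the positions where the gated residual is zero, and a positive `threshold` trades exactness for more skipped positions.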
Augmented user experiences in the cultural heritage domain are in increasing demand by the new digital native tourists of the 21st century. In this paper, we propose a novel solution that aims at assisting the visitor during an outdoor tour of a cultural site using the unique first person perspective of wearable cameras. In particular, the approach exploits computer vision techniques to retrieve the details by proposing a robust descriptor based on the covariance of local features. Using a lightweight wearable board, the solution can localize the user with respect to the 3D point cloud...
Diffusion-based video editing has reached impressive quality and can transform either the global style, local structure, or attributes of given video inputs, following textual edit prompts. However, such solutions typically incur heavy memory and computational costs to generate temporally-coherent frames, either in the form of diffusion inversion and/or cross-frame attention. In this paper, we conduct an analysis of such inefficiencies, and suggest simple yet effective modifications that allow significant speed-ups whilst...
Dense optical flow estimation is complex and time consuming, with state-of-the-art methods relying either on large synthetic data sets or on pipelines requiring up to a few minutes per frame pair. In this paper, we address the problem of optical flow estimation in the automotive scenario in a self-supervised manner. We argue that optical flow can be cast as a geometrical warping between two successive video frames and devise a deep architecture to estimate such transformation in two stages. First, a dense pixel-level flow is computed with a projective bootstrap on rigid...
We address unsupervised optical flow estimation for ego-centric motion. We argue that optical flow can be cast as a geometrical warping between two successive video frames and devise a deep architecture to estimate such transformation in two stages. First, a dense pixel-level flow is computed with a geometric prior imposing strong spatial constraints. Such a prior is typical of driving scenes, where the point of view is coherent with the vehicle motion. We show how such a global transformation can be approximated with a homography and how spatial transformer layers can be employed to compute the flow field implied by...
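The flow field implied by a homography can be written down directly: each pixel is mapped through the 3x3 matrix in homogeneous coordinates, and the flow is the displacement between the mapped and the original position. The sketch below illustrates only this geometric step in plain Python. It is not the paper's architecture, and the function name `homography_flow` and the nested-list matrix layout are assumptions made for illustration.

```python
def homography_flow(H, width, height):
    """Return flow[y][x] = (dx, dy) induced by a 3x3 homography H.

    H is a nested list (row-major). Pixel (x, y) is mapped to
    (x', y') = (xh / w, yh / w), and the flow is (x' - x, y' - y).
    Assumes w != 0 for every pixel in the grid.
    """
    flow = []
    for y in range(height):
        row = []
        for x in range(width):
            xh = H[0][0] * x + H[0][1] * y + H[0][2]
            yh = H[1][0] * x + H[1][1] * y + H[1][2]
            w = H[2][0] * x + H[2][1] * y + H[2][2]
            row.append((xh / w - x, yh / w - y))
        flow.append(row)
    return flow


# A pure translation by (+2, -1) pixels gives a constant flow field.
translation = [[1, 0, 2],
               [0, 1, -1],
               [0, 0, 1]]
field = homography_flow(translation, width=4, height=3)
```

A single homography can only describe one global, planar motion, which is consistent with the abstract presenting it as an approximation used in the first of two stages.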
We present a novel and hierarchical approach for supervised classification of signals spanning over a fixed graph, reflecting shared properties of the dataset. To this end, we introduce a Convolutional Cluster Pooling layer exploiting a multi-scale clustering in order to highlight, at different resolutions, locally connected regions on the input graph. Our proposal generalises well-established neural models such as Convolutional Neural Networks (CNNs) on irregular and complex domains, by means of the exploitation of the weight sharing...
Humans do not perceive all parts of a scene with the same resolution, but rather focus on a few regions of interest (ROIs). Traditional Object-Based codecs take advantage of this biological intuition, and are capable of non-uniform allocation of bits in favor of salient regions, at the expense of increased distortion in the remaining areas: such a strategy allows a boost in perceptual quality under low rate constraints. Recently, several neural codecs have been introduced for video compression, yet they operate uniformly over all spatial...
The interest in cultural cities is in constant growth, and so is the demand for new multimedia tools and applications that enrich their fruition. In this paper we propose an egocentric vision system to enhance tourists' cultural heritage experience. Exploiting a wearable board and a glass-mounted ca
Generative models have become a powerful tool for image editing tasks, including object insertion. However, these methods often lack spatial awareness, generating objects with unrealistic locations and scales, or unintentionally altering the scene background. A key challenge lies in maintaining visual coherence, which requires both a geometrically suitable location and a high-quality edit. In this paper, we focus on the former, creating a model dedicated to identifying realistic locations. Specifically,...
This paper accelerates video perception, such as semantic segmentation and human pose estimation, by leveraging cross-frame redundancies. Unlike the existing approaches, which avoid redundant computations by warping the past features using optical-flow or by performing sparse convolutions on frame differences, we approach the problem from a different perspective: low-bit quantization. We observe that residuals, as the difference in network activations between two neighboring frames, exhibit properties that make them highly...