Roberto Vezzani

ORCID: 0000-0002-1046-6870
Research Areas
  • Video Surveillance and Tracking Methods
  • Human Pose and Action Recognition
  • Face recognition and analysis
  • Hand Gesture Recognition Systems
  • Video Analysis and Summarization
  • Advanced Image and Video Retrieval Techniques
  • Advanced Vision and Imaging
  • Gait Recognition and Analysis
  • Face and Expression Recognition
  • Image Retrieval and Classification Techniques
  • Advanced Neural Network Applications
  • Robotics and Sensor-Based Localization
  • Music and Audio Processing
  • Advanced Memory and Neural Computing
  • CCD and CMOS Imaging Sensors
  • 3D Shape Modeling and Analysis
  • Generative Adversarial Networks and Image Synthesis
  • Anomaly Detection Techniques and Applications
  • Advanced Image Processing Techniques
  • Context-Aware Activity Recognition Systems
  • 3D Surveying and Cultural Heritage
  • IoT-based Smart Home Systems
  • Optical measurement and interference techniques
  • Handwritten Text Recognition Techniques
  • Multimedia Communication and Technology

University of Modena and Reggio Emilia
2015-2024

Ferrari (Italy)
2014-2024

SofTech (Italy)
2013

The interest of the research community in creating reference datasets for performance analysis is always very high. Although new datasets collecting large amounts of video footage are spreading in surveillance and forensics, few benchmarks with annotation data are available for testing specific tasks, especially 3D/multi-view analysis. In this paper we present 3DPeS, a dataset for 3D/multi-view forensic applications. It has been designed for discussing and evaluating results on people re-identification and other related...

10.1145/2072572.2072590 article EN 2011-12-01

Computer vision and ubiquitous multimedia access nowadays make feasible the development of a mostly automated system for human-behavior analysis. In this context, our proposal is to analyze human behaviors by classifying the posture of the monitored person and, consequently, detecting the corresponding events and alarm situations, like a fall. To this aim, the approach can be divided in two phases: for each frame, the projection histograms (Haritaoglu et al., 1998) are computed and compared with the probabilistic maps stored during...

10.1109/tsmca.2004.838501 article EN IEEE Transactions on Systems Man and Cybernetics - Part A Systems and Humans 2004-12-20
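
The projection histograms mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a binary silhouette mask given as a list of rows of 0/1 values; the function name `projection_histograms` and the ordering of the returned pair are hypothetical choices made for this example, not the paper's implementation.

```python
def projection_histograms(mask):
    """Count foreground pixels per column and per row of a binary mask.

    `mask` is assumed to be a list of equal-length rows holding 0 or 1.
    Returns (column_counts, row_counts).
    """
    # Projection onto the vertical axis: one count per row.
    row_counts = [sum(row) for row in mask]
    # Projection onto the horizontal axis: one count per column.
    column_counts = [sum(col) for col in zip(*mask)]
    return column_counts, row_counts


# Hypothetical 2x3 silhouette used only to show the call.
silhouette = [
    [1, 1, 0],
    [0, 1, 0],
]
columns, rows = projection_histograms(silhouette)
```

A posture classifier in this spirit would compare such histograms against stored per-posture models; that comparison step is not shown because the abstract is truncated before it is described.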

In‐house video surveillance can represent an excellent support for people with some difficulties (e.g. elderly or disabled people) living alone and with a limited autonomy. New hardware technologies, and in particular digital cameras, are now affordable and they have recently gained credit as tools for (semi‐)automatically assuring people's safety. In this paper a multi‐camera vision system for detecting, tracking and recognizing dangerous behaviours and events such as a fall is presented. In such a situation a suitable alarm can be...

10.1111/j.1468-0394.2007.00438.x article EN Expert Systems 2007-10-18

Fast and accurate upper-body and head pose estimation is a key task for automatic monitoring of driver attention, a challenging context characterized by severe illumination changes, occlusions and extreme poses. In this work, we present a new deep learning framework for head localization and pose estimation on depth images. The core of the proposal is a regressive neural network, called POSEidon, which is composed of three independent convolutional nets followed by a fusion layer, specially conceived for understanding the pose by depth. In addition, to recover...

10.1109/cvpr.2017.583 article EN 2017-07-01

Depth cameras make it possible to set up reliable solutions for people monitoring and behavior understanding, especially when unstable or poor illumination conditions make common RGB sensors unusable. Therefore, we propose a complete framework for the estimation of the head and shoulder pose based on depth images only. A detection and localization module is also included, in order to develop an end-to-end system. The core element is a Convolutional Neural Network, called POSEidon...

10.1109/tpami.2018.2885472 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2018-12-07

10.1007/s11042-009-0402-9 article EN Multimedia Tools and Applications 2009-10-09

Multi-person pose estimation is the task of detecting and regressing the keypoint coordinates of multiple people in a single image. Significant progress has been achieved in recent years, especially with the introduction of transformer-based end-to-end methods. In this paper, we present DualPose, a novel framework that enhances multi-person pose estimation by leveraging a dual-block transformer decoding architecture. Class prediction and keypoint estimation are split into parallel blocks so each sub-task can be separately improved and the risk of interference...

10.20944/preprints202504.0467.v1 preprint EN 2025-04-07

This work presents a novel people tracking approach, able to cope with frequent shape changes and large occlusions. In particular, the tracks are described by means of probabilistic masks and appearance models. Occlusions due to other tracks or to background objects and false occlusions are discriminated. The system is general enough to be applied to any motion segmentation module; it can track people interacting with each other and it maintains the pixel assignment even with large occlusions. At the same time, the update of the model is very reactive, so as to cope with sudden body silhouette's...

10.1109/icpr.2004.717 article EN 2004-08-23

The problem of labeling the connected components (CCL) of a binary image is well-defined and several proposals have been presented in the past. Since an exact solution to the problem exists and must be provided as output, algorithms mainly differ in their execution speed. In this paper, we propose and describe YACCLAB, Yet Another Connected Components Labeling Benchmark. Together with a rich and varied dataset, YACCLAB contains an open source platform to test new proposals and compare them with publicly available competitors. Textual...

10.1109/icpr.2016.7900112 article EN 2016-12-01
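
For readers unfamiliar with the task benchmarked in the entry above, here is a minimal sketch of connected components labeling. It is a plain breadth-first flood fill, assuming 8-connectivity and a binary image given as a list of rows; it is a reference-style illustration only, not one of the optimized algorithms YACCLAB compares, and the name `label_components` is hypothetical.

```python
from collections import deque


def label_components(image):
    """Label 8-connected foreground regions of a binary image.

    Returns (labels, count): `labels` has the same shape as `image`,
    with 0 for background and 1..count for each component.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if not image[r][c] or labels[r][c]:
                continue  # background or already labeled
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                # Visit the 8 neighbours (the centre is skipped by the label check).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Hypothetical 3x3 image with two separate components.
example = [
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 1],
]
example_labels, example_count = label_components(example)
```

Because every correct algorithm must return the same partition of the foreground, a benchmark of this kind can focus on execution speed, as the abstract notes.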

Transformer-based neural networks represent a successful self-attention mechanism that achieves state-of-the-art results in language understanding and sequence modeling. However, their application to visual data and, in particular, to the dynamic hand gesture recognition task has not yet been deeply investigated. In this paper, we propose a transformer-based architecture for the dynamic hand gesture recognition task. We show that the employment of a single active depth sensor, specifically the usage of depth maps and the surface normals estimated from them, achieves state-of-the-art results,...

10.1109/3dv50981.2020.00072 article EN 2020 International Conference on 3D Vision (3DV) 2020-11-01

Designing efficient neural networks for embedded devices is a critical challenge, particularly in applications requiring real-time performance, such as aerial imaging with drones and UAVs for emergency responses. In this work, we introduce TakuNet, a novel light-weight architecture which employs techniques such as depth-wise convolutions and an early downsampling stem to reduce computational complexity while maintaining high accuracy. It leverages dense connections for fast convergence during training and uses 16-bit...

10.48550/arxiv.2501.05880 preprint EN arXiv (Cornell University) 2025-01-10

The correct estimation of the head pose is a problem of great importance for many applications. For instance, it is an enabling technology in automotive driver attention monitoring. In this paper, we tackle the problem through a deep learning network working in a regression manner. Traditional methods usually rely on visual facial features, such as landmarks or the nose tip position. In contrast, we exploit a Convolutional Neural Network (CNN) to perform head pose estimation directly from depth data. We exploit a Siamese architecture and propose a novel loss...

10.5220/0006104501940201 article EN cc-by-nc-nd Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications 2017-01-01

To enhance video surveillance systems, multi-modal sensor integration can be a successful strategy. In this work, a computer vision system able to detect and track people from multiple cameras is integrated with a wireless sensor network mounting PIR (Passive InfraRed) sensors. The two subsystems are briefly described and possible cases in which the vision algorithms are likely to fail are discussed. Then, simple but reliable outputs of the sensor nodes are exploited to improve the accuracy of the vision system. In particular, two case studies are reported: the first uses...

10.1145/1099396.1099415 article EN 2005-11-11

The paper presents an approach for a robust (semi-)automatic correction of radial lens distortion in images and videos. This method, based on the Hough transform, is applicable also to videos from unknown cameras that, consequently, cannot be calibrated a priori. We approximated the lens distortion by considering only the lower-order term of the radial distortion. Thus, the method relies on the assumption that pure radial distortion transforms straight lines into curves. The computation of the best value of the distortion parameter is performed in a multi-resolution way....

10.1109/iciap.2003.1234047 article EN 2004-02-03
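
The single-term radial model described in the entry above can be sketched as follows. This is a minimal illustration under stated assumptions: the common one-parameter polynomial form, in which a point at squared radius r² from the distortion centre is scaled by (1 + k·r²), and a simple point-to-chord distance as a straightness measure. The paper itself uses the Hough transform to score candidate parameters; the names `undistort_point` and `straightness_error` are hypothetical.

```python
import math


def undistort_point(x, y, k, cx=0.0, cy=0.0):
    """Apply a one-parameter radial correction around the centre (cx, cy).

    Assumed model: corrected = centre + (p - centre) * (1 + k * r^2).
    """
    dx, dy = x - cx, y - cy
    scale = 1.0 + k * (dx * dx + dy * dy)
    return cx + dx * scale, cy + dy * scale


def straightness_error(points):
    """Largest distance of any point from the chord joining the first and last point.

    Assumes the first and last points are distinct.
    """
    (x0, y0), (x1, y1) = points[0], points[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    return max(
        abs((x1 - x0) * (y0 - py) - (x0 - px) * (y1 - y0)) / length
        for px, py in points
    )


# A point on the unit circle moves outward by the factor (1 + k).
corrected = undistort_point(1.0, 0.0, 0.1)
# A bent polyline deviates from its chord; a straight one does not.
bent = straightness_error([(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)])
flat = straightness_error([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)])
```

Under the assumption that straight scene lines should appear straight after correction, a search over k could pick the value that minimizes a measure like `straightness_error` on detected edge chains, refining the candidate range at each resolution level.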