Simon Hadfield

ORCID: 0000-0001-8637-5054
Research Areas
  • Advanced Vision and Imaging
  • Robotics and Sensor-Based Localization
  • Human Pose and Action Recognition
  • Video Surveillance and Tracking Methods
  • Advanced Image and Video Retrieval Techniques
  • Hand Gesture Recognition Systems
  • Image Enhancement Techniques
  • Advanced Neural Network Applications
  • Hearing Impairment and Communication
  • Reinforcement Learning in Robotics
  • Image Processing Techniques and Applications
  • Multimodal Machine Learning Applications
  • Advanced Image Processing Techniques
  • 3D Surveying and Cultural Heritage
  • Domain Adaptation and Few-Shot Learning
  • Optical measurement and interference techniques
  • Gait Recognition and Analysis
  • Remote Sensing and LiDAR Applications
  • CCD and CMOS Imaging Sensors
  • Industrial Vision Systems and Defect Detection
  • Robotic Path Planning Algorithms
  • Computer Graphics and Visualization Techniques
  • Neural dynamics and brain function
  • Advanced Memory and Neural Computing
  • Modular Robots and Swarm Intelligence

University of Surrey
2016-2025

Signal Processing (United States)
2019-2022

The Visual Object Tracking challenge 2015, VOT2015, aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 62 trackers are presented. The number of trackers tested makes VOT2015 the largest benchmark on short-term tracking to date. For each participating tracker, a short description is provided in an appendix. Features of VOT2015 going beyond its VOT2014 predecessor are: (i) a new dataset twice as large, with full annotation of targets by rotated bounding boxes and...

10.1109/iccvw.2015.79 preprint EN 2015-12-01
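Tracker accuracy in VOT-style benchmarks is grounded in the region overlap (intersection-over-union) between predicted and ground-truth regions. A minimal axis-aligned sketch of that measure (VOT2015 itself annotates targets with rotated bounding boxes; this is an illustration, not the benchmark toolkit):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x, y, w, h).

    Toy version of the region-overlap accuracy measure; the actual VOT
    annotations use rotated bounding boxes.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Intersection rectangle (clamped to zero when the boxes are disjoint).
    ix = max(ax, bx)
    iy = max(ay, by)
    iw = max(0.0, min(ax + aw, bx + bw) - ix)
    ih = max(0.0, min(ay + ah, by + bh) - iy)
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

Averaging this overlap over the frames of a sequence gives a per-tracker accuracy score; robustness is then measured separately, by counting tracking failures.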

Sign Language Recognition (SLR) has been an active research field for the last two decades. However, most research to date has considered SLR as a naive gesture recognition problem. SLR seeks to recognize a sequence of continuous signs but neglects the underlying rich grammatical and linguistic structures of sign language that differ from spoken language. In contrast, we introduce the Sign Language Translation (SLT) problem. Here, the objective is to generate spoken language translations from sign language videos, taking into account the different word orders and grammar. We formalize SLT in...

10.1109/cvpr.2018.00812 article EN 2018-06-01

The Visual Object Tracking challenge VOT2017 is the fifth annual tracker benchmarking activity organized by the VOT initiative. Results of 51 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The evaluation included the standard and other popular methodologies, and a new "real-time" experiment simulating a situation where a tracker processes images as if provided by a continuously running sensor. Performance of the tested trackers typically far exceeds standard baselines. The source...

10.1109/iccvw.2017.230 preprint EN 2017-10-01

Prior work on Sign Language Translation has shown that having a mid-level sign gloss representation (effectively recognizing the individual signs) improves translation performance drastically. In fact, the current state-of-the-art in translation requires gloss-level tokenization in order to work. We introduce a novel transformer-based architecture that jointly learns Continuous Sign Language Recognition and Translation while being trainable in an end-to-end manner. This is achieved by using a Connectionist Temporal Classification (CTC) loss to bind...

10.1109/cvpr42600.2020.01004 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01
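CTC lets the network emit one label (or a blank) per video frame without frame-level alignment; at inference, a common greedy decoding collapses repeated emissions and removes blanks to recover the gloss sequence. A minimal sketch of that decoding step (an illustration, not the paper's code):

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """CTC-style greedy decoding of a per-frame label sequence:
    merge consecutive repeats, then drop the blank token."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev:        # a change of label is a new emission
            if lab != blank:   # blanks separate repeats but emit nothing
                out.append(lab)
        prev = lab
    return out
```

Note that a blank between two identical labels (e.g. `1, 0, 1`) yields two emissions, which is how CTC represents genuinely repeated glosses.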

We propose a novel deep learning approach to solve simultaneous alignment and recognition problems (referred to as "Sequence-to-sequence" learning). We decompose the problem into a series of specialised expert systems referred to as SubUNets. The spatio-temporal relationships between these SubUNets are then modelled to solve the task, while remaining trainable end-to-end. The approach mimics human educational techniques, and has a number of significant advantages. SubUNets allow us to inject domain-specific expert knowledge into the system regarding suitable...

10.1109/iccv.2017.332 article EN 2017-10-01

Abstract We present a novel approach to automatic Sign Language Production using recent developments in Neural Machine Translation (NMT), Generative Adversarial Networks, and motion generation. Our system is capable of producing sign videos from spoken language sentences. Contrary to current approaches that are dependent on heavily annotated data, our approach requires minimal gloss and skeletal level annotations for training. We achieve this by breaking down the task into dedicated sub-processes. We first...

10.1007/s11263-019-01281-2 article EN cc-by International Journal of Computer Vision 2020-01-02

In this paper, we propose using 3D Convolutional Neural Networks for large scale user-independent continuous gesture recognition. We have trained an end-to-end deep network for continuous gesture recognition (jointly learning both the feature representation and the classifier). The network performs three-dimensional (i.e. space-time) convolutions to extract features related to both the appearance and motion from volumes of color frames. Space-time invariance of the extracted features is encoded via pooling layers. The earlier stages of the network are partially initialized...

10.1109/icpr.2016.7899606 article EN 2016-12-01
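The core operation described above is a convolution over a space-time volume rather than a single image: the kernel slides along time as well as the two spatial axes. A naive NumPy sketch of one such "valid" 3D convolution (a single channel and kernel, for illustration only):

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Naive 'valid'-mode 3D convolution (cross-correlation, as in CNNs)
    over a space-time volume of shape (T, H, W)."""
    T, H, W = volume.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # Dot product of the kernel with one space-time patch.
                out[t, y, x] = np.sum(volume[t:t + kt, y:y + kh, x:x + kw] * kernel)
    return out
```

A real 3D CNN layer adds input/output channels, strides, and learned kernels, but the sliding space-time window is the same idea.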

This paper summarizes the results of the first Monocular Depth Estimation Challenge (MDEC) organized at WACV2023. The challenge evaluated the progress of self-supervised monocular depth estimation on the challenging SYNS-Patches dataset. The challenge was organized on CodaLab and received submissions from 4 valid teams. Participants were provided a devkit containing updated reference implementations for 16 State-of-the-Art algorithms and novel techniques. The threshold for acceptance required novel techniques to outperform every one of the SotA baselines. All...

10.1109/wacvw58289.2023.00069 article EN 2023-01-01
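Depth estimation challenges like this one rank submissions with a standard set of error and accuracy metrics. A minimal sketch of two of the most common ones, absolute relative error and the δ < 1.25 threshold accuracy (an illustration of the standard metrics, not the challenge devkit):

```python
import numpy as np

def depth_metrics(pred, gt):
    """Two standard monocular depth metrics:
    - AbsRel: mean of |pred - gt| / gt
    - delta1: fraction of pixels with max(pred/gt, gt/pred) < 1.25
    Assumes strictly positive depths with invalid pixels already masked out."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    abs_rel = float(np.mean(np.abs(pred - gt) / gt))
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = float(np.mean(ratio < 1.25))
    return abs_rel, delta1
```

Because self-supervised methods predict depth only up to scale, evaluations typically align the prediction to ground truth (e.g. by median scaling) before computing these numbers.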

Action recognition in unconstrained situations is a difficult task, suffering from massive intra-class variations. It is made even more challenging when complex 3D actions are projected down to the image plane, losing a great deal of information. The recent emergence of 3D data, both in broadcast content and from commercial depth sensors, provides the possibility to overcome this issue. This paper presents a new dataset for benchmarking action recognition algorithms in natural environments, while making use of 3D information. The dataset contains around...

10.1109/cvpr.2013.436 article EN 2013 IEEE Conference on Computer Vision and Pattern Recognition 2013-06-01

The motion field of a scene can be used for object segmentation and to provide features for classification tasks like action recognition. Scene flow is the full 3D motion field of the scene, and is more difficult to estimate than its 2D counterpart, optical flow. Current approaches use a smoothness cost for regularisation, which tends to over-smooth at object boundaries. This paper presents a novel formulation for scene flow estimation, as a collection of moving points in 3D space, modelled using a particle filter that supports multiple hypotheses and does not oversmooth...

10.1109/iccv.2011.6126509 article EN International Conference on Computer Vision 2011-11-01

In current monocular depth research, the dominant approach is to employ unsupervised training on large datasets, driven by warped photometric consistency. Such approaches lack robustness and are unable to generalize to challenging domains such as nighttime scenes or adverse weather conditions, where assumptions about photometric consistency break down. We propose DeFeat-Net (Depth & Feature network), an approach to simultaneously learn a cross-domain dense feature representation, alongside a robust depth-estimation...

10.1109/cvpr42600.2020.01441 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01
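The photometric consistency signal mentioned above compares the target frame against a source frame warped into the target view; when lighting changes (e.g. at night), the same geometry no longer produces the same pixel values, so the loss misleads training. A toy sketch of the loss and of why brightness changes break it (illustrative only; real pipelines warp via predicted depth and pose, and mix L1 with SSIM):

```python
import numpy as np

def photometric_l1(target, warped_source):
    """Mean absolute photometric error between the target frame and a
    source frame warped into the target view -- the usual SS-MDE signal."""
    target = np.asarray(target, dtype=float)
    warped_source = np.asarray(warped_source, dtype=float)
    return float(np.mean(np.abs(target - warped_source)))

rng = np.random.default_rng(0)
img = rng.random((8, 8))          # stand-in for a correctly warped frame
same_scene_darker = img * 0.5     # identical geometry, different illumination
```

Even with a perfect warp, `photometric_l1(img, same_scene_darker)` is non-zero, which is the failure mode a learned cross-domain feature representation is meant to avoid.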

10.1109/lra.2025.3546513 article EN IEEE Robotics and Automation Letters 2025-01-01

In this paper, an algorithm is presented for estimating scene flow, which is a richer, 3D analogue of optical flow. The approach operates orders of magnitude faster than alternative techniques and is well suited to further performance gains through a parallelized implementation. The algorithm employs multiple hypotheses to deal with motion ambiguities, rather than the traditional smoothness constraints, removing oversmoothing errors and providing significant improvements on benchmark data over the previous state of the art. The algorithm is flexible...

10.1109/tpami.2013.162 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2013-08-26
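The multiple-hypothesis idea above can be illustrated with a toy particle filter: each particle carries its own motion hypothesis for a point, particles are reweighted by how well they explain noisy observations, and no smoothness prior couples neighbouring points. A deliberately simplified 1-D sketch (not the paper's algorithm, which operates on full 3D scene flow):

```python
import math
import random

def particle_filter_velocity(observations, n_particles=500, noise=0.5, seed=0):
    """Toy multi-hypothesis estimate of a point's constant 1-D velocity from
    noisy positions observed at t = 0, 1, 2, ... Each particle is one
    velocity hypothesis; there is no smoothness regularisation."""
    rng = random.Random(seed)
    particles = [rng.uniform(-5.0, 5.0) for _ in range(n_particles)]
    weights = [1.0] * n_particles
    for t, z in enumerate(observations):
        for i, v in enumerate(particles):
            err = z - v * t  # predicted position under hypothesis v
            # Gaussian observation likelihood.
            weights[i] *= math.exp(-(err * err) / (2.0 * noise * noise))
    total = sum(weights)
    if total == 0.0:
        return 0.0
    # Weighted mean over the surviving hypotheses.
    return sum(v * w for v, w in zip(particles, weights)) / total
```

Because ambiguous observations simply leave several clusters of particles alive, the filter can represent multi-modal motion instead of averaging modes away as a smoothness cost would.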

Long term tracking of an object, given only a single instance in an initial frame, remains an open problem. We propose a visual tracking algorithm, robust to many of the difficulties which often occur in real-world scenes. Correspondences of edge-based features are used, to overcome the reliance on the texture of the tracked object and to improve invariance to lighting. Furthermore, we address long-term stability, enabling the tracker to recover from drift and to provide redetection following object disappearance or occlusion. The two-module principle is similar...

10.1109/iccvw.2013.26 article EN IEEE International Conference on Computer Vision Workshops 2013-12-01

How does a person work out their location using a floorplan? It is probably safe to say that we do not explicitly measure depths to every visible surface and try to match them against different pose estimates in the floorplan. And yet, this is exactly how most robotic scan-matching algorithms operate. Similarly, we do not extrude the 2D geometry present in the floorplan into 3D and try to align it to the real world, and yet this is how most vision-based approaches localise. Humans do the exact opposite. Instead of depth, we use high level semantic cues. Instead of extruding the floorplan up into the third...

10.1109/icra.2018.8461074 article EN 2018-05-01
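The semantic-cue idea can be made concrete with a toy example: represent the floorplan as a grid of semantic labels, predict which labels a camera at each candidate pose would see, and score poses by agreement with the observed labels. A deliberately simplified sketch (hypothetical layout and label scheme, not the paper's method, which handles full poses and real semantic detections):

```python
# Toy floorplan as a grid of semantic labels: 'd' = door, 'w' = window,
# '.' = plain wall. This layout is invented for illustration.
FLOORPLAN = [
    "..d..",
    "w...w",
    "..d..",
]

def score_pose(pose, observed):
    """Count semantic cues that agree between what the floorplan predicts
    at `pose` (row, col, looking along the row) and what was observed."""
    r, c = pose
    predicted = FLOORPLAN[r][c:c + len(observed)]
    return sum(p == o for p, o in zip(predicted, observed))

def localise(observed):
    """Pick the pose whose predicted semantics best match the observation."""
    width = len(FLOORPLAN[0]) - len(observed) + 1
    candidates = [(r, c) for r in range(len(FLOORPLAN)) for c in range(width)]
    return max(candidates, key=lambda p: score_pose(p, observed))
```

Matching a handful of discrete semantic labels is far cheaper, and far more robust to appearance change, than matching dense depth against extruded geometry.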

"Like night and day" is a commonly used expression to imply that two things are completely different. Unfortunately, this tends be the case for current visual feature representations of same scene across varying seasons or times day. The aim paper provide dense representation can perform localization, sparse matching image retrieval, regardless seasonal temporal appearance. Recently, there have been several proposed methodologies deep learning representations. These methods make use ground...

10.1109/cvpr42600.2020.00649 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Reconstruction of 3D environments is a problem that has been widely addressed in the literature. While many approaches exist to perform reconstruction, few of them take an active role in deciding where the next observations should come from. Furthermore, travelling from the camera's current position to the next, known as path-planning, usually focuses on minimising path length. This approach is ill-suited for reconstruction applications, where learning about the environment is more valuable than speed of traversal. We present a novel...

10.1109/iccv.2017.501 article EN 2017-10-01
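The contrast drawn above, information gained versus distance travelled, is the heart of next-best-view selection: instead of taking the shortest path, choose the viewpoint expected to reveal the most unseen surface. A toy sketch of that criterion (illustrative only; the `views` mapping from viewpoint names to visible surface ids is an invented stand-in for a real visibility model):

```python
def next_best_view(views, seen):
    """views: mapping of viewpoint name -> iterable of surface-element ids
    visible from that viewpoint. Picks the viewpoint that reveals the most
    ids not already in `seen` -- an information-gain criterion, ignoring
    path length entirely."""
    return max(views, key=lambda name: len(set(views[name]) - seen))
```

A practical planner would trade this gain off against travel cost, but weighting gain highly is what distinguishes active reconstruction from shortest-path planning.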

Self-supervised monocular depth estimation (SS-MDE) has the potential to scale to vast quantities of data. Unfortunately, existing approaches limit themselves to the automotive domain, resulting in models incapable of generalizing to complex environments such as natural or indoor settings. To address this, we propose SlowTV, a large-scale dataset curated from YouTube, containing an order of magnitude more data than existing automotive datasets. SlowTV contains 1.7M images with a rich diversity of environments, such as worldwide seasonal hiking, scenic...

10.1109/iccv51070.2023.01445 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01