Richard Bowden

ORCID: 0000-0003-3285-8020
Research Areas
  • Human Pose and Action Recognition
  • Hand Gesture Recognition Systems
  • Hearing Impairment and Communication
  • Video Surveillance and Tracking Methods
  • Advanced Vision and Imaging
  • Advanced Image and Video Retrieval Techniques
  • Robotics and Sensor-Based Localization
  • Gait Recognition and Analysis
  • Autonomous Vehicle Technology and Safety
  • Video Analysis and Summarization
  • Multimodal Machine Learning Applications
  • Anomaly Detection Techniques and Applications
  • Advanced Neural Network Applications
  • Face recognition and analysis
  • Human Motion and Animation
  • Adversarial Robustness in Machine Learning
  • Face and Expression Recognition
  • Image Retrieval and Classification Techniques
  • Generative Adversarial Networks and Image Synthesis
  • Image Enhancement Techniques
  • Image Processing Techniques and Applications
  • Scheduling and Optimization Algorithms
  • Speech and Audio Processing
  • Advanced Manufacturing and Logistics Optimization
  • Optical measurement and interference techniques

University of Surrey
2016-2025

Signal Processing (United States)
2019-2022

Allegheny College
2020

University of Freiburg
2017

Codarts Rotterdam
2016

Council of Science Editors
2016

Iowa City Public Library
2016

University of East Anglia
2009

University of Oxford
2004

Brunel University of London
1997-2002

The Visual Object Tracking challenge 2015, VOT2015, aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 62 trackers are presented. The number of trackers tested makes VOT2015 the largest benchmark on short-term tracking to date. For each participating tracker, a short description is provided in an appendix. Features of VOT2015 that go beyond its VOT2014 predecessor are: (i) a new dataset twice as large, with full annotation of targets by rotated bounding boxes and...

10.1109/iccvw.2015.79 preprint EN 2015-12-01

Designing a controller for autonomous vehicles capable of providing adequate performance in all driving scenarios is challenging due to the highly complex environment and the inability to test the system in the wide variety of scenarios which it may encounter after deployment. However, deep learning methods have shown great promise in not only providing excellent performance for complex and non-linear control problems, but also in generalising previously learned rules to new scenarios. For these reasons, the use of deep learning for vehicle control is becoming increasingly popular. Although important...

10.1109/tits.2019.2962338 article EN IEEE Transactions on Intelligent Transportation Systems 2020-01-07

Sign Language Recognition (SLR) has been an active research field for the last two decades. However, most research to date has considered SLR as a naive gesture recognition problem. SLR seeks to recognize a sequence of continuous signs but neglects the underlying rich grammatical and linguistic structures of sign language that differ from spoken language. In contrast, we introduce Sign Language Translation (SLT). Here, the objective is to generate spoken language translations from sign language videos, taking into account the different word orders and grammar. We formalize SLT in...

10.1109/cvpr.2018.00812 article EN 2018-06-01

The Visual Object Tracking challenge VOT2017 is the fifth annual tracker benchmarking activity organized by the VOT initiative. Results of 51 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The evaluation included the standard and other popular methodologies, and a new "real-time" experiment simulating a situation where a tracker processes images as if provided by a continuously running sensor. Performance of the tested trackers typically far exceeds standard baselines. The source...

10.1109/iccvw.2017.230 preprint EN 2017-10-01

This article presents an interactive hand shape recognition user interface for American Sign Language (ASL) finger-spelling. The system makes use of a Microsoft Kinect device to collect appearance and depth images, and the OpenNI+NITE framework for hand detection and tracking. Hand-shapes corresponding to letters of the alphabet are characterized using appearance and depth images and classified using random forests. We compare classification using appearance and depth images separately, show that the combination of both leads to the best results, and validate on a dataset of four different users. The system works in real-time and is...

10.1109/iccvw.2011.6130290 article EN 2011-11-01
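As a flavour of the classification step described above, the following is a toy Python stand-in for a random-forest classifier: each "tree" is a decision stump over a randomly chosen dimension of a concatenated appearance+depth feature vector. The stump construction and all names are illustrative, not the paper's implementation.

```python
import random

def best_stump(xs, ys, f):
    # Best threshold on feature f by training classification error.
    vals = sorted({x[f] for x in xs})
    best = None
    for a, b in zip(vals, vals[1:]):
        thr = (a + b) / 2
        left = [y for x, y in zip(xs, ys) if x[f] <= thr]
        right = [y for x, y in zip(xs, ys) if x[f] > thr]
        lv = max(set(left), key=left.count)    # majority label, left branch
        rv = max(set(right), key=right.count)  # majority label, right branch
        err = sum(y != lv for y in left) + sum(y != rv for y in right)
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    return best[1:]

def train_forest(xs, ys, n_trees=15, seed=0):
    # Each tree picks a random feature dimension (appearance or depth).
    rng = random.Random(seed)
    return [(f,) + best_stump(xs, ys, f)
            for f in (rng.randrange(len(xs[0])) for _ in range(n_trees))]

def predict(forest, x):
    votes = [lv if x[f] <= thr else rv for f, thr, lv, rv in forest]
    return max(set(votes), key=votes.count)    # majority vote over trees
```

On a toy two-class set of concatenated feature vectors, the forest votes as expected.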

10.1016/j.cviu.2010.12.001 article EN Computer Vision and Image Understanding 2011-01-08

Prior work on Sign Language Translation has shown that having a mid-level sign gloss representation (effectively recognizing the individual signs) improves translation performance drastically. In fact, the current state-of-the-art in translation requires gloss level tokenization in order to work. We introduce a novel transformer based architecture that jointly learns Continuous Sign Language Recognition and Translation while being trainable in an end-to-end manner. This is achieved by using a Connectionist Temporal Classification (CTC) loss to bind...

10.1109/cvpr42600.2020.01004 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01
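The CTC loss mentioned above scores a gloss sequence against frame-wise posteriors by summing over all monotonic alignments, with blanks allowed between glosses. A minimal pure-Python sketch of the standard CTC forward pass (an illustration of the textbook algorithm, not the paper's code; inputs are log-space frame posteriors):

```python
import math

def ctc_forward(log_probs, labels, blank=0):
    """Log-probability of `labels` given per-frame log posteriors,
    summed over all CTC alignments (forward algorithm)."""
    # Extended sequence: a blank between every label and at both ends.
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    T, S = len(log_probs), len(ext)
    NEG = float("-inf")
    alpha = [NEG] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, T):
        new = [NEG] * S
        for s in range(S):
            cands = [alpha[s]]                    # stay
            if s > 0:
                cands.append(alpha[s - 1])        # advance one step
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])        # skip the blank
            m = max(cands)
            if m == NEG:
                continue
            new[s] = m + math.log(sum(math.exp(c - m) for c in cands)) \
                       + log_probs[t][ext[s]]
        alpha = new
    m = max(alpha[-1], alpha[-2])
    return m + math.log(math.exp(alpha[-1] - m) + math.exp(alpha[-2] - m))
```

With two frames, one non-blank symbol and uniform posteriors, the alignments "aa", "a-", "-a" each contribute 1/4, giving total probability 3/4.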

Visual tracking has attracted a significant attention in the last few decades. The recent surge in the number of publications on tracking-related problems has made it almost impossible to follow the developments in the field. One of the reasons is that there is a lack of commonly accepted annotated data-sets and standardized evaluation protocols that would allow objective comparison of different tracking methods. To address this issue, the Visual Object Tracking (VOT) workshop was organized in conjunction with ICCV2013. Researchers from academia as well...

10.1109/iccvw.2013.20 article EN IEEE International Conference on Computer Vision Workshops 2013-12-01

We propose a novel deep learning approach to solve simultaneous alignment and recognition problems (referred to as "Sequence-to-sequence" learning). We decompose the problem into a series of specialised expert systems referred to as SubUNets. The spatio-temporal relationships between these SubUNets are then modelled to solve the task, while remaining trainable end-to-end. The approach mimics human educational techniques, and has a number of significant advantages. SubUNets allow us to inject domain-specific knowledge into the system regarding suitable...

10.1109/iccv.2017.332 article EN 2017-10-01

In this work we present a new approach to the field of weakly supervised learning in the video domain. Our method is relevant to sequence learning problems which can be split up into sub-problems that occur in parallel. Here, we experiment with sign language data. The approach exploits constraints within each independent stream and combines them by explicitly imposing synchronisation points to make use of the parallelism that all streams share. We do this with multi-stream HMMs while adding intermediate synchronisation constraints among the streams. We embed powerful CNN-LSTM models in the HMM...

10.1109/tpami.2019.2911077 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2019-04-25

This work presents a new approach to learning a frame-based classifier on weakly labelled sequence data by embedding a CNN within an iterative EM algorithm. This allows the CNN to be trained on a vast number of example images when only loose sequence level information is available for the source videos. Although we demonstrate this in the context of hand shape recognition, the approach has wider application to any video recognition task where frame level labelling is not available. The algorithm leverages the discriminative ability of the CNN to iteratively refine the annotation...

10.1109/cvpr.2016.412 article EN 2016-06-01

This paper introduces the end-to-end embedding of a CNN into a HMM, while interpreting the outputs of the CNN in a Bayesian fashion. The hybrid CNN-HMM combines the strong discriminative abilities of CNNs with the sequence modelling capabilities of HMMs. Most current approaches in the field of gesture and sign language recognition disregard the necessity of dealing with sequence data both for training and evaluation. With our presented end-to-end embedding we are able to improve over the state-of-the-art on three challenging benchmark continuous sign language recognition tasks by between 15% and 38% relative and up...

10.5244/c.30.136 article EN 2016-01-01
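The Bayesian interpretation referred to above is the standard hybrid-model trick: the CNN outputs posteriors p(s|x), while the HMM expects emission likelihoods p(x|s); by Bayes' rule, dividing the posteriors by the class priors gives likelihoods up to a per-frame constant. A one-line sketch (illustrative, not the authors' code):

```python
def scaled_likelihoods(posteriors, priors):
    """p(x|s) = p(s|x) p(x) / p(s); dropping the per-frame constant
    p(x) leaves scaled likelihoods the HMM can consume."""
    return [post / pri for post, pri in zip(posteriors, priors)]
```

Under uniform priors the posteriors are simply rescaled; skewed priors down-weight over-represented classes.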

Abstract We present a novel approach to automatic Sign Language Production using recent developments in Neural Machine Translation (NMT), Generative Adversarial Networks, and motion generation. Our system is capable of producing sign videos from spoken language sentences. Contrary to current approaches that are dependent on heavily annotated data, our approach requires minimal gloss and skeletal level annotations for training. We achieve this by breaking down the task into dedicated sub-processes. We first...

10.1007/s11263-019-01281-2 article EN cc-by International Journal of Computer Vision 2020-01-02

This manuscript introduces the end-to-end embedding of a CNN into a HMM, while interpreting the outputs of the CNN in a Bayesian framework. The hybrid CNN-HMM combines the strong discriminative abilities of CNNs with the sequence modelling capabilities of HMMs. Most current approaches in the field of gesture and sign language recognition disregard the necessity of dealing with sequence data both for training and evaluation. With our presented end-to-end embedding we are able to improve over the state-of-the-art on three challenging benchmark continuous sign language recognition tasks by between 15% and 38%...

10.1007/s11263-018-1121-3 article EN cc-by International Journal of Computer Vision 2018-10-05

We approach instantaneous mapping, converting images to a top-down view of the world, as a translation problem. We show how a novel form of transformer network can be used to map from images and video directly to an overhead or bird's-eye-view (BEV) map in a single end-to-end network. We assume a 1-1 correspondence between a vertical scanline in the image and the rays passing through the camera location in the map. This lets us formulate map generation from an image as a set of sequence-to-sequence translations. Posing the problem as translation allows the network to use the context of the image when interpreting the role...

10.1109/icra46639.2022.9811901 article EN 2022 International Conference on Robotics and Automation (ICRA) 2022-05-23
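The assumed 1-1 scanline-to-ray correspondence follows from pinhole geometry: each image column u defines a ray whose azimuth in the overhead map depends only on the focal length and principal point. A minimal sketch under a simple pinhole model (parameter names fx, cx are illustrative):

```python
import math

def column_azimuths(width, fx, cx):
    """Azimuth (radians) of the ray through each image column: the
    1-1 scanline-to-ray correspondence assumed by the BEV translation."""
    return [math.atan2(u - cx, fx) for u in range(width)]
```

For a 3-pixel-wide image with fx = 1 and the principal point at the centre column, the rays fan out symmetrically from -45° to +45°.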

The ability to detect a person's unconstrained hand in a natural video sequence has applications in sign language, gesture recognition and HCI. This paper presents a novel, unsupervised approach to training an efficient and robust detector which is capable of not only detecting the presence of human hands within an image but classifying the hand shape. A database of images is first clustered using a k-method clustering algorithm with a distance metric based upon shape context. From this, a tree structure of boosted cascades...

10.1109/afgr.2004.1301646 article EN 2004-06-10
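Clustering with an arbitrary pairwise metric such as shape-context distance is typically done with a medoid-based variant of k-means, since a "mean shape" is not defined. Below is a generic k-medoids sketch with a pluggable distance function (deterministic first-k initialisation for illustration; this is not the paper's exact procedure):

```python
def k_medoids(items, k, dist, iters=10):
    """Cluster `items` under an arbitrary metric `dist`, representing
    each cluster by its medoid (the member minimising total distance)."""
    medoids = list(range(k))                       # init: first k items
    for _ in range(iters):
        # Assign every item to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(len(items)):
            m = min(medoids, key=lambda m: dist(items[i], items[m]))
            clusters[m].append(i)
        # Update each medoid to the member with minimal intra-cluster cost.
        new = []
        for m, members in clusters.items():
            best = min(members, key=lambda c:
                       sum(dist(items[c], items[j]) for j in members))
            new.append(best)
        if new == medoids:                         # converged
            break
        medoids = new
    return medoids
```

With scalar "shapes" and absolute difference as the metric, the two obvious groups {0, 1} and {10, 11} are recovered.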

The field of Action Recognition has seen a large increase in activity in recent years. Much of the progress has been through incorporating ideas from single-frame object recognition and adapting them for temporal-based action recognition. Inspired by the success of interest points in the 2D spatial domain, their 3D (space-time) counterparts typically form the basic components used to describe actions, and the features are often engineered to fire sparsely. This is to ensure that the problem is tractable; however, this can sacrifice...

10.1109/tpami.2010.144 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2010-08-24

Within the field of action recognition, features and descriptors are often engineered to be sparse and invariant to transformation. While sparsity makes the problem tractable, it is not necessarily optimal in terms of class separability and classification. This paper proposes a novel approach that uses very dense corner features that are spatially and temporally grouped in a hierarchical process to produce an overcomplete compound feature set. Frequently reoccurring patterns are then found through data mining, designed for use with large...

10.1109/iccv.2009.5459335 article EN 2009-09-01
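The data-mining step above looks for frequently reoccurring feature patterns across examples; in miniature, this is frequent-itemset counting. A sketch restricted to feature pairs (illustrative of the flavour of the mining step, not the paper's algorithm):

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Count co-occurring feature pairs across examples and keep
    only those appearing in at least `min_support` examples."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    return {p: c for p, c in counts.items() if c >= min_support}
```

Real frequent-pattern miners (e.g. Apriori) prune the search over larger itemsets, but the support-counting idea is the same.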

This paper deals with robust modelling of mouth shapes in the context of sign language recognition using deep convolutional neural networks. Sign language mouth shapes are difficult to annotate and thus hardly any publicly available annotations exist. As such, this work exploits related information sources as weak supervision. Humans mainly look at the face during sign language communication, where mouth shapes play an important role and constitute natural patterns with large variability. However, most scientific research on sign language recognition still disregards the face. Hardly...

10.1109/iccvw.2015.69 article EN 2015-12-01