- Face Recognition and Analysis
- Face and Expression Recognition
- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Video Surveillance and Tracking Methods
- Human Pose and Action Recognition
- Speech and Audio Processing
- Advanced Image and Video Retrieval Techniques
- Generative Adversarial Networks and Image Synthesis
- Multimodal Machine Learning Applications
- Biometric Identification and Security
- Advanced Vision and Imaging
- Advanced Image Processing Techniques
- Music and Audio Processing
- Anomaly Detection Techniques and Applications
- Emotion and Mood Recognition
- Facial Nerve Paralysis Treatment and Research
- Speech Recognition and Synthesis
- Tensor Decomposition and Applications
- Image Processing Techniques and Applications
- Machine Learning and Data Classification
- 3D Shape Modeling and Analysis
- Image and Signal Denoising Methods
- Image Retrieval and Classification Techniques
- Remote-Sensing Image Classification
Queen Mary University of London
2020-2024
Samsung (South Korea)
2020-2024
Samsung (United Kingdom)
2019-2024
University of Nottingham
2015-2020
Samsung (United States)
2019-2020
University of Leeds
2018
University of Bristol
2018
University of Oxford
2018
University of Lincoln
2012-2015
Imperial College London
2007-2014
This paper investigates how far a very deep neural network is from attaining close to saturating performance on existing 2D and 3D face alignment datasets. To this end, we make the following 5 contributions: (a) we construct, for the first time, a very strong baseline by combining a state-of-the-art architecture for landmark localization with a state-of-the-art residual block, train it on a very large yet synthetically expanded 2D facial landmark dataset, and finally evaluate it on all other 2D facial landmark datasets. (b) We create a network guided by 2D landmarks which converts 2D landmark annotations to 3D and unifies...
Automatic facial point detection plays arguably the most important role in face analysis. Several methods have been proposed which reported their results on databases of both constrained and unconstrained conditions. Most of these databases provide annotations with different mark-ups, and in some cases there are problems related to the accuracy of the fiducial points. The aforementioned issues, as well as the lack of an evaluation protocol, make it difficult to compare performance between different systems. In this paper, we present the 300 Faces...
3D face reconstruction is a fundamental Computer Vision problem of extraordinary difficulty. Current systems often assume the availability of multiple facial images (sometimes from the same subject) as input, and must address a number of methodological challenges such as establishing dense correspondences across large facial poses, expressions, and non-uniform illumination. In general, these methods require complex and inefficient pipelines for model building and fitting. In this work, we propose to address many of these limitations by training...
Developing powerful deformable face models requires massive, annotated face databases on which techniques can be trained, validated and tested. Manual annotation of each facial image in terms of landmarks requires a trained expert, and the workload is usually enormous. Fatigue is one of the reasons that in some cases annotations are inaccurate. This is why the majority of existing facial databases provide annotations for a relatively small subset of the training images. Furthermore, there is hardly any correspondence between the annotated landmarks across different databases. These...
This paper addresses 2 challenging tasks: improving the quality of low resolution facial images and accurately locating the facial landmarks on such poor resolution images. To this end, we make the following 5 contributions: (a) we propose Super-FAN: the very first end-to-end system that addresses both tasks simultaneously, i.e. both improves face resolution and detects the facial landmarks. The novelty of Super-FAN lies in incorporating structural information in a GAN-based super-resolution algorithm via integrating a sub-network for face alignment through heatmap...
In plant phenotyping, it has become important to be able to measure many features on large image sets in order to aid genetic discovery. The size of the datasets, now often captured robotically, often precludes manual inspection, hence the motivation for finding a fully automated approach. Deep learning is an emerging field that promises unparalleled results on many data analysis problems. Building on artificial neural networks, deep approaches have many more hidden layers in the network, and hence have greater discriminative...
Detection and tracking of faces in image sequences is among the most well studied problems in the intersection of statistical machine learning and computer vision. Often, tracking and detection methodologies use a rigid representation to describe the facial region, hence they can neither capture nor exploit the non-rigid facial deformations, which are crucial for countless applications (e.g., facial expression analysis, facial motion capture, high-performance face recognition etc.). Usually, the non-rigid deformations are captured by locating and tracking the position of a set...
We propose an end-to-end deep learning architecture for word-level visual speech recognition. The system is a combination of spatiotemporal convolutional, residual and bidirectional Long Short-Term Memory networks. We trained and evaluated it on the Lipreading In-The-Wild benchmark, a challenging database with a 500-size vocabulary consisting of video excerpts from BBC TV broadcasts. The proposed network attains word accuracy equal to 83.0%, yielding a 6.8% absolute improvement over the current state-of-the-art.
Several end-to-end deep learning approaches have been recently presented which extract either audio or visual features from the input images or audio signals and perform speech recognition. However, research on end-to-end audiovisual models is very limited. In this work, we present an end-to-end audiovisual model based on residual networks and Bidirectional Gated Recurrent Units (BGRUs). To the best of our knowledge, this is the first audiovisual fusion model which simultaneously learns to extract features directly from the image pixels and audio waveforms and performs within-context word recognition on a large publicly...
Cascaded regression approaches have been recently shown to achieve state-of-the-art performance for many computer vision tasks. Beyond its connection to boosting, cascaded regression has been interpreted as a learning-based approach to iterative optimization methods like Newton's method. However, in prior work, the connection to optimization theory is limited only to learning a mapping from image features to problem parameters. In this paper, we consider the problem of facial deformable model fitting using cascaded regression and make the following contributions: (a) We propose...
Arguably, Deformable Part Models (DPMs) are one of the most prominent approaches for face alignment, with impressive results being recently reported for both controlled lab and unconstrained settings. Fitting in most DPM methods is typically formulated as a two-step process during which discriminatively trained part templates are first correlated with the image to yield a filter response for each landmark, and then shape optimization is performed over these filter responses. This process, although computationally efficient, is based on...
Our goal is to design architectures that retain the groundbreaking performance of CNNs for landmark localization and at the same time are lightweight, compact and suitable for applications with limited computational resources. To this end, we make the following contributions: (a) we are the first to study the effect of neural network binarization on localization tasks, namely human pose estimation and face alignment. We exhaustively evaluate various design choices, identify performance bottlenecks, and more importantly propose multiple orthogonal ways to boost...
We describe a very simple framework for deriving the most well-known optimization problems in Active Appearance Models (AAMs), and most importantly for providing efficient solutions. Our formulation results in two optimization problems for fast and exact AAM fitting, and one new algorithm which has the important advantage of being applicable to 3D. We show that the dominant cost for both forward and inverse algorithms is a few times mN, which is the cost of projecting an image onto the appearance subspace. This makes both algorithms not only computationally realizable but also very attractive...
We introduce the notion of subspace learning from image gradient orientations for appearance-based object recognition. As image data are typically noisy and noise is substantially different from Gaussian, traditional subspace learning from pixel intensities very often fails to estimate reliably the low-dimensional subspace of a given data population. We show that replacing pixel intensities with gradient orientations and the ℓ2 norm with a cosine-based distance measure offers, to some extent, a remedy to this...
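The substitution described in this abstract (gradient orientations in place of pixel intensities, a cosine-based distance in place of the ℓ2 norm) can be illustrated with a minimal NumPy sketch. The function names `gradient_orientations` and `cosine_distance` are hypothetical, and the particular form of the distance (a sum of 1 - cos of the orientation differences) is an assumption made for illustration, not necessarily the exact measure used in the paper.

```python
import numpy as np

def gradient_orientations(image):
    """Return the per-pixel gradient orientation (radians) of a 2D image."""
    # np.gradient returns derivatives along axis 0 (rows) then axis 1 (columns).
    gy, gx = np.gradient(image.astype(float))
    return np.arctan2(gy, gx)

def cosine_distance(phi_a, phi_b):
    """Assumed cosine-based distance: sum over pixels of 1 - cos(orientation difference)."""
    return float(np.sum(1.0 - np.cos(phi_a - phi_b)))

rng = np.random.default_rng(0)
image = rng.random((32, 32))
phi = gradient_orientations(image)

# Orientations are unchanged by a positive rescaling of intensities,
# so the distance between an image and its rescaled copy stays near zero.
print(cosine_distance(phi, gradient_orientations(2.0 * image)))
```

Each pixel contributes at most 2 to this distance, so a few grossly corrupted pixels cannot dominate it the way they can dominate a squared intensity difference.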
We present a robust FFT-based approach to scale-invariant image registration. Our method relies on FFT-based correlation twice: once in the log-polar Fourier domain to estimate the scaling and rotation, and once in the spatial domain to recover the residual translation. Previous methods based on the same principles are not robust. To equip our scheme with robustness and accuracy, we introduce modifications which tailor the method to the nature of images. First, we derive efficient log-polar Fourier representations by replacing image functions with complex gray-level edge maps. We show that this...
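The second correlation step mentioned in this abstract, recovering a translation by FFT-based correlation in the spatial domain, can be sketched with standard phase correlation. This is the textbook technique, not the robust edge-map variant the paper proposes; the function name `estimate_translation` is hypothetical, and the sketch assumes integer, circular shifts.

```python
import numpy as np

def estimate_translation(reference, shifted):
    """Estimate the integer circular shift (rows, cols) mapping `reference` onto `shifted`."""
    f_ref = np.fft.fft2(reference)
    f_shift = np.fft.fft2(shifted)
    # Normalised cross-power spectrum: keeps only the phase difference.
    cross_power = f_shift * np.conj(f_ref)
    cross_power /= np.abs(cross_power) + 1e-12
    # Its inverse FFT peaks at the displacement between the two images.
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Map peak indices above half the image size to negative shifts.
    return tuple(
        int(p) if p <= size // 2 else int(p) - size
        for p, size in zip(peak, correlation.shape)
    )

rng = np.random.default_rng(0)
image = rng.random((64, 64))
moved = np.roll(image, shift=(5, -3), axis=(0, 1))
print(estimate_translation(image, moved))
```

Applying the same correlation to log-polar resampled Fourier magnitudes turns scaling and rotation into shifts, which is the principle behind the first step the abstract describes.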
Recent works in speech recognition rely either on connectionist temporal classification (CTC) or sequence-to-sequence models for character-level recognition. CTC assumes conditional independence of individual characters, whereas attention-based models can provide nonsequential alignments. Therefore, we could use a CTC loss in combination with an attention-based model in order to force monotonic alignments and at the same time get rid of the conditional independence assumption. In this paper, we use the recently proposed hybrid CTC/attention architecture for audio-visual...
This paper proposes an improved training algorithm for binary neural networks in which both weights and activations are binary numbers. A key but fairly overlooked feature of the current state-of-the-art method of XNOR-Net is the use of analytically calculated real-valued scaling factors for re-weighting the output of binary convolutions. We argue that analytic calculation of these factors is sub-optimal. Instead, in this work, we make the following contributions: (a) we propose to fuse the activation and weight scaling factors into a single one that is learned discriminatively...
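For context, the analytically calculated scaling factor that this abstract attributes to XNOR-Net can be sketched as follows for the weight side: each filter is approximated by its sign pattern multiplied by the mean absolute value of its weights. This is a minimal NumPy illustration of that baseline, not of the learned factors the paper proposes; the function names `binarize_weights` and `approximation_error` are hypothetical.

```python
import numpy as np

def binarize_weights(weights):
    """Binarize filters of shape (out_channels, in_channels, kh, kw).

    Returns the sign tensor and one analytic scaling factor per output channel,
    computed as the mean absolute weight of that filter.
    """
    flat = weights.reshape(weights.shape[0], -1)
    alpha = np.abs(flat).mean(axis=1)
    signs = np.where(weights >= 0, 1.0, -1.0)
    return signs, alpha

def approximation_error(weights, signs, alpha):
    """Squared error between real-valued weights and their scaled binary approximation."""
    approx = alpha[:, None, None, None] * signs
    return float(np.sum((weights - approx) ** 2))

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 3, 3, 3))
signs, alpha = binarize_weights(w)
print(approximation_error(w, signs, alpha))
```

The mean absolute value minimises this reconstruction error for a fixed sign pattern, but a factor that best reconstructs the weights need not be the one that best serves the final task, which is the kind of gap the abstract's "sub-optimal" argument points to.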
Lucas-Kanade and active appearance models are among the most commonly used methods for image alignment and facial fitting, respectively. They both utilize nonlinear gradient descent, which is usually applied on intensity values. In this paper, we propose the employment of highly descriptive, densely sampled image features for both problems. We show that the strategy of warping the multichannel dense feature image at each iteration is more beneficial than extracting features after warping the intensity image at each iteration. Motivated by this observation, we demonstrate robust...
We present the 2016 ChaLearn Looking at People and Faces of the World Challenge and Workshop, which ran three competitions on the common theme of face analysis from still images. The first one, Looking at People, addressed age estimation, while the second and third competitions, Faces of the World, addressed accessory classification and smile and gender classification, respectively. We present two crowd-sourcing methodologies used to collect manual annotations. A custom-built application was used to collect and label data about the apparent age of people (as opposed to the real age). For the Faces of the World data,...