- Speech and Audio Processing
- Face recognition and analysis
- Advanced Image Fusion Techniques
- Image Retrieval and Classification Techniques
- Video Analysis and Summarization
- Face and Expression Recognition
- Image and Signal Denoising Methods
- Music and Audio Processing
- Medical Image Segmentation Techniques
- Image and Video Quality Assessment
- Digital Image Processing Techniques
- Advanced Vision and Imaging
- Human Pose and Action Recognition
- Domain Adaptation and Few-Shot Learning
- Multimodal Machine Learning Applications
- Visual Attention and Saliency Detection
- Generative Adversarial Networks and Image Synthesis
- Retinal Imaging and Analysis
- Video Surveillance and Tracking Methods
- Sparse and Compressive Sensing Techniques
- Ophthalmology and Visual Impairment Studies
- Image and Object Detection Techniques
- Psychological Well-being and Life Satisfaction
- Speech and dialogue systems
- Sports Dynamics and Biomechanics
Queen Mary University of London
2024
Guiyang Medical University
2024
University of California, Los Angeles
2023
Wuhan University of Science and Technology
2021
Xidian University
2016-2019
Xi'an University of Science and Technology
2019
University of Oulu
2010-2017
Sichuan University
2009-2014
State Key Laboratory of Biotherapy
2014
University of Kent
2010
A practical lipreading system can be considered either as subject-dependent (SD) or subject-independent (SI). An SD system is user-specific, i.e., customized for some particular user, while an SI system has to cope with a large number of users. These two types of systems pose different challenges and have to be treated differently. In this paper, we propose a simple deterministic model to tackle the problem. The model first seeks a low-dimensional manifold where visual features extracted from the frames of a video can be projected onto...
Visual speech constitutes a large part of our non-rigid facial motion and contains important information that allows machines to interact with human users, for instance, through automatic visual speech recognition (VSR) and speaker verification. One of the major obstacles to research on non-rigid mouth motion analysis is the absence of suitable databases. Those available for public research either lack a sufficient number of speakers or utterances or contain constrained view points, which limits their representativeness and usefulness. This paper...
The problem of visual speech recognition involves the decoding of the video dynamics of a talking mouth in a high-dimensional space. In this paper, we propose a generative latent variable model to provide a compact representation of visual speech data. The model uses latent variables to separately represent the interspeaker variations of visual appearances and those caused by uttering within images, and incorporates the structural information of the data through placing priors of the latent variables along a curve embedded in a path graph.
Group sparsity has shown great potential in various low-level vision tasks (e.g., image denoising, deblurring and inpainting). In this paper, we propose a new prior model for image denoising via group sparsity residual constraint (GSRC). To enhance the performance of group sparse-based image denoising, the concept of group sparsity residual is proposed, and thus, the problem of image denoising is translated into one that reduces the group sparsity residual. To reduce the residual, we first obtain some good estimation of the group sparse coefficients of the original image by the first-pass estimation of the noisy image, and then centralize the group sparse coefficients of the noisy image to the estimation. Experimental...
Understanding the continuous states of objects is essential for task learning and planning in the real world. However, most existing task learning benchmarks assume discrete (e.g., binary) object goal states, which poses challenges for the learning of complex tasks and for transferring learned policy from simulated environments to the real world. Furthermore, state discretization limits a robot's ability to follow human instructions based on the grounding of actions and states. To tackle these challenges, we present ARNOLD, a benchmark that evaluates language-grounded...
An image-based visual speech animation system is presented in this paper. A video model is proposed to preserve the video dynamics of a talking face. The model represents a video sequence by a low-dimensional continuous curve embedded in a path graph and establishes a map from the curve to the image domain. When selecting video segments for synthesis, we loosen the traditional requirement of using the triphone as the unit to allow segments to contain longer natural talking motion. Dense videos are sampled from the segments, concatenated, and downsampled to train a video model that enables efficient time...
Video texture synthesis is the process of providing a continuous and infinitely varying stream of frames, which plays an important role in computer vision and graphics. However, it still remains a challenging problem to generate high-quality synthesis results. Considering the two key factors that affect the synthesis performance, frame representation and blending artifacts, we improve the synthesis performance from two aspects: 1) Effective frame representation is designed to capture both the image appearance information in the spatial domain and the longitudinal information in the temporal domain. 2) Artifacts...
Extracting full-body motion of walking people from monocular video sequences in complex, real-world environments is an important and difficult problem, going beyond simple tracking, whose satisfactory solution demands an appropriate balance between the use of prior knowledge and learning from data. We propose a consistent Bayesian framework for introducing strong prior knowledge into a system for extracting human gait. In this work, the strong prior is built from an articulated model having both time-invariant (static) and time-variant (dynamic) parameters....
The nature of the morphological skeleton representation of a binary shape is related to the composition of structuring elements through the distance function defined by a set of distance transforms in digital space. Two distance metrics, the uniform-step distance and the periodically-uniform-step distance, are introduced to provide useful spatial measures for skeleton transforms. A natural representation of ribbonlike components is accomplished through the extraction of skeletal feature primitives from the shape. The hierarchical structure of the representation makes it stable and insensitive to noise disturbance. matching...
In this paper, we propose a novel graph embedding method for the problem of lipreading. To characterize the temporal connections among video frames of the same utterance, a new distance metric is defined on a pair of frames, and graphs are constructed to represent the video dynamics based on the distances between frames. Audio information is used to assist in calculating such distances. For each utterance, a subspace of the visual feature space is learned from a well-defined intrinsic and penalty graph within a graph-embedding framework. Video dynamics are found to be well preserved along...
The inhibitor of apoptosis family member livin is expressed in several types of cancer but not in most benign tissues, and it has been considered to be a poor prognostic marker in various malignancies. However, the expression of livin and its relevance have not been evaluated in the colorectal adenoma-carcinoma sequence. In this study, we analyzed the difference in livin expression among normal colorectal mucosa, adenoma, and adenocarcinoma and investigated the relationship of livin expression in colorectal carcinomas with clinicopathological variables using immunohistochemistry and real-time reverse...
A novel image coding scheme is proposed and studied, in which fine spatial features are separated from an image and encoded using directional decomposition-based techniques. Morphological techniques are used for multiresolution feature decomposition and filtering in order to preserve the integrity of the features. The experimental results of a preliminary study show that a high compression ratio can be achieved with good reconstruction quality.
The authors present a generalized morphological skeleton transform (MST) algorithm for rotation-invariant vision applications. Several subclasses of the MST are derived from the generalized MST by using appropriate sets of structuring elements (SEs). The rotational property of an MST is determined by the circularity of its SEs. A pseudo-Euclidean MST, which uses quasi-circular SEs composed through a dilation interpolation procedure, is proposed. Experimental results show that the pseudo-Euclidean MST gives the best performance among the existing MST subclasses. These can...
Characterizing subtle facial movements from videos is one of the most intensive topics in computer vision research. It is, however, challenging, since (1) the intensity of subtle facial muscle movement is usually low, (2) the duration may be transient, and (3) datasets containing spontaneous subtle movements with reliable annotations are painful to obtain and often of small sizes. This article is targeted at addressing these problems for characterizing subtle facial movements from both the aspects of motion elucidation and description. First, we propose an efficient method for elucidating...
Two premarking methods are proposed for a new 3D object recognition system under development at the University of Toronto. In this system, an object is modeled using only a small number of 2D distinct perspective views (standard views) predefined with the help of markers placed on the object. During the recognition process, a standard view is acquired by first determining its surface normal (standard-view axis), and then aligning the camera's optical axis with it. Standard-view axes are obtained by analyzing the images of the markers. A morphological...
In this paper we describe the first version of our system for estimating 3D shape sequences from images of the frontal face. This approach is developed with Visual Speech Animation (VSA) as the target application. In particular, we focus on the usability of an existing state-of-the-art image-based VSA system and the subsequent on-line estimation of the corresponding 3D facial shape sequence from its output. This has the added advantage of a 3D visual speech, which is mainly the ability to render the face in different poses and illumination conditions. The idea is based on the detection...
The feasibility of a pre-marking scheme for three-dimensional object recognition is demonstrated. The proposed scheme is based on the assumption that an object can be modeled by a small number of its distinct two-dimensional perspective projections. Circular markers are used to identify these views by determining their surface normals passing through the centers of the markers. The surface normal of a marker is determined by analyzing the geometrical features of the acquired pseudo-ellipse image of the marker using morphological skeleton transforms. The position of the marker, on the other hand, has...
This paper presents a visually realistic animation system for synthesizing a talking mouth. Video synthesis is achieved by first learning generative models from the recorded speech videos and then using the learned models to generate videos for novel utterances. A generative model considers the whole utterance contained in a video as a continuous process and represents it by a set of trigonometric functions embedded within a path graph. The transformation that projects the values of the functions to the image space is found through graph embedding. Such a model allows us to synthesize...
Existing blind image quality assessment (BIQA) methods based on statistics attach limited attention to the relative position of pixels. Features in these BIQA methods are too flimsy to characterize quite a few distortions with strong locality or complexity. However, psychological studies have shown that, according to the relative position within the visual field, the cognitive system generates visuo-spatial serial memory used for cognitive tasks, e.g., subjective image quality assessment. Inspired by the visuo-spatial series generated by the human visual system (HVS), we propose a method based on imitation...