- Human Pose and Action Recognition
- Multimodal Machine Learning Applications
- Video Surveillance and Tracking Methods
- Advanced Vision and Imaging
- Linguistics and Cultural Studies
- Advanced Image and Video Retrieval Techniques
- Anomaly Detection Techniques and Applications
- Topic Modeling
- Hand Gesture Recognition Systems
- Gait Recognition and Analysis
- Video Analysis and Summarization
- Natural Language Processing Techniques
- Advanced Software Engineering Methodologies
- Robotics and Sensor-Based Localization
- China's Ethnic Minorities and Relations
- Embedded Systems and FPGA Design
- Model-Driven Software Engineering Techniques
- Laser Material Processing Techniques
- Chaos-based Image/Signal Encryption
- Speech and dialogue systems
- Machine Learning in Bioinformatics
- Image Retrieval and Classification Techniques
- Digital Storytelling and Education
- Linguistics and language evolution
- Advanced Image Fusion Techniques
China University of Mining and Technology
2025
Northeastern University
2014-2024
University of Ottawa
2009-2016
Beihang University
2013-2015
Shanghai Electric (China)
2014
Huazhong Agricultural University
2012
Nanchang Institute of Technology
2011
Loughborough University
2004-2005
Vanderbilt University
2004-2005
Shenyang University
2005
Local spatio-temporal features and bag-of-features representations have become popular for action recognition. A recent trend is to use dense sampling for better performance. While many methods claimed to use dense feature sets, most of them are just denser than approaches based on sparse interest point detectors. In this paper, we explore sampling with high density. We also investigate the impact of random sampling over a dense grid on computational efficiency. We present a real-time recognition system which integrates a fast sampling method with local...
For the problem of action detection, most existing methods require that the relevant portions of interest in the training videos have been manually annotated with bounding boxes. Some recent works tried to avoid the tedious manual annotation, and proposed to automatically identify the relevant portions in the videos. However, these methods only concerned the identification in either the spatial or the temporal domain, and may get irrelevant contents from another domain. These irrelevant contents are usually undesirable in the training phase, which will lead to a degradation of the detection performance. This...
Neural machine translation has shown very promising results lately. Most NMT models follow the encoder-decoder framework. To make encoder-decoder models more flexible, the attention mechanism was introduced to machine translation and also to other tasks like speech recognition and image captioning. We observe that the quality of translation by attention-based models can be significantly damaged when the alignment is incorrect. We attribute these problems to the lack of distortion and fertility models. Aiming to resolve these problems, we propose new variations of the attention-based model and compare them with other models on machine translation....
Due to occlusions and objects' non-rigid deformation in the scene, the motion trajectories obtained from common trackers may contain a number of missing or mis-associated entries. To cluster such corrupted point-based trajectories into multiple motions is still a hard problem. In this paper, we present an approach that exploits the temporal and spatial characteristics of tracked points to facilitate the segmentation of incomplete and corrupted trajectories, and thereby obtain highly robust results against severe data noise. Our method first uses...
The implementation of computer-based systems (CBSs) is commonly guided by constraints imposed by the particular domain of the CBS. Domain-specific programming is a convenient way to provide the domain expert with a language that is customized to the constraints and assumptions of the domain. Careful thought and design should precede the development of any domain-specific visual language, to restrict the programmer from illegal formalisms and to allow for rapid determination of the validity of the "program". Usually, such a language is designed and produced using a metamodel of some sort. Occasionally, similar domains can...
This paper introduces a highly efficient local spatiotemporal descriptor, called gradient boundary histograms (GBH). The proposed GBH descriptor is built on simple spatio-temporal gradients, which are fast to compute. We demonstrate that it can better represent local structure and motion than other gradient-based descriptors, and that it significantly outperforms them on large realistic datasets. A comprehensive evaluation shows that the recognition accuracy is preserved while the spatial resolution is greatly reduced, which yields...
In this paper, we present a part model for human action recognition from video. We use the 3D HOG descriptor and the bag-of-features representation to represent video. To overcome the unordered events of the bag-of-features approach, we propose a novel multiscale local part model to preserve temporal context. Our method builds upon several recent ideas including dense sampling, local spatial-temporal (ST) features, the 3D HOG descriptor, the BOF representation and non-linear SVMs. The preliminary results on the KTH dataset show a higher recognition rate than recent studies.
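The "unordered events" limitation mentioned in the abstract above can be illustrated with a minimal sketch of a bag-of-features histogram. This is not the authors' implementation: the toy 2-D descriptors, the two-word codebook, the function names, and the chi-square kernel (a common choice for non-linear SVMs on histograms, assumed here) are all illustrative.

```python
from math import dist, exp

def nearest_word(descriptor, codebook):
    """Index of the codeword closest (Euclidean distance) to the descriptor."""
    return min(range(len(codebook)), key=lambda k: dist(descriptor, codebook[k]))

def bof_histogram(descriptors, codebook):
    """L1-normalised histogram of visual-word counts for one video."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)

def chi2_kernel(h1, h2, gamma=1.0):
    """Exponential chi-square kernel between two histograms (assumed kernel)."""
    chi2 = sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)
    return exp(-gamma * chi2)

# Hypothetical codebook of two visual words and two toy "videos" that
# contain the same local descriptors in opposite temporal order.
codebook = [(0.0, 0.0), (1.0, 1.0)]
video_a = [(0.1, 0.0), (0.9, 1.0), (1.0, 0.8)]
video_b = list(reversed(video_a))

hist_a = bof_histogram(video_a, codebook)
hist_b = bof_histogram(video_b, codebook)
same_histogram = hist_a == hist_b  # temporal order is not encoded
```

Because both toy videos map to the same histogram, a classifier on plain BOF histograms cannot tell them apart, which is the gap a part model that preserves temporal context is meant to address.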
This paper presents a new mechatronic approach of using infrared thermography combined with image processing for the quality control of a laser sealing process for food containers. The suggested approach uses an online infrared system to assess the heat distribution within the container seal in order to guarantee the integrity of the process. Visual image processing is then used for quality assurance of optimum sealing. The results described in this paper show examples of the capability of the condition monitoring system to detect faults found in the sealing process and indicate that the approach could form an effective system.
Current methods to interpret deep learning models by generating saliency maps generally rely on two key assumptions. First, they use first-order approximations of the loss function, neglecting higher-order terms such as curvatures. Second, they evaluate each feature's importance in isolation, ignoring their inter-dependencies. In this work, we study the effect of relaxing these assumptions. By characterizing a closed-form formula for the Hessian matrix of a ReLU network, we prove that, for a classification problem with a large number...
Non-rigid structure-from-motion (NRSfM) is the process of recovering the time-varying 3D structures and poses of a deformable object from an uncalibrated monocular video sequence. Currently, most NRSfM algorithms utilize a non-degenerate assumption for non-rigid deformations, whereby the 3D structures can be assumed to be a linear combination of basis shapes with full rank three. Unfortunately, this assumption will produce extra degrees-of-freedom when the object has some shape bases with rank less than three. These extra degrees-of-freedom yield spurious deformations due to non-negligible noise in real...
This paper discusses the use of Bag-of-Features and a local part model approach for bare hand dynamic gesture recognition from video. We used dense sampling to extract 3D multiscale whole-part features. We adopted the three-dimensional histograms of gradient orientation (3D HOG) descriptor to represent the features. The K-means++ method was applied to cluster the visual words. Dynamic hand gesture classification was completed by using a Bag-of-Features (BOF) and non-linear support vector machine (SVM) method. A BOF does not track the order of events. To...
Traditional digital watermarking algorithms based on the DWT generally embed watermarks in the high-frequency bands; the wavelet coefficients of these bands are lower, and they are vulnerable when attacked by different kinds of picture processing, so that it is difficult to deal with some strong attacks against the algorithms, such as damage from compression, filtering and so on. As a result, the robustness of these algorithms often cannot satisfy the requirements of practical application. The low-frequency part after using the wavelet transformation embeds watermark information...
To acquire seamless visualization of environments from different viewing positions and orientations, it is desirable to generate virtual images for an arbitrary position given a set of reference views. In this paper, a simple interpolation method based on ray-tracing is proposed for viewpoint synthesis from panoramas taken with multi-sensor cameras. Instead of attempting to recover a dense 3D reconstruction of the scene, the method estimates the pose between each panorama and then backward projects a point along the ray that exhibits the best...
This paper describes a mechatronic approach to the design and implementation of a non-contact sealing system for multilayer lidding films, as used in aseptic containers for food and medical packaging. The method proposed uses a beam-steered laser to seal the product, thereby enabling virtually instant changeover from one product line to another while reducing machine tooling costs and downtime. Results are presented which show that the laser process may produce seals of higher strength than the conventional thermal/mechanical process.
The conventional area-based stereo matching algorithm suffers from two problems, the windowing problem and computational cost. Multiple scale analysis has long been adopted in vision research. Investigation of the wavelet transform suggests that dilated basis functions provide changeable window areas associated with the signal frequency components and hierarchically represent signals with a multiresolution structure. This paper discusses the advantages of applying wavelet transforms to stereo matching and the weakness of Mallat’s multiresolution analysis....
Visual modeling languages are often used today in engineering domains, Mathworks' Simulink/Stateflow for simulation, signal processing and controls being the prime example. However, they are also becoming suitable for implementing other computational tasks, like model transformations. In this paper we briefly introduce GReAT: a visual language with simple, yet powerful semantics for implementing transformations on attributed, typed hypergraphs with the help of explicitly sequenced graph transformation rules. The main...