- Human Pose and Action Recognition
- Advanced Image and Video Retrieval Techniques
- Multimodal Machine Learning Applications
- Hand Gesture Recognition Systems
- Advanced Neural Network Applications
- Machine Learning and Data Classification
- Video Surveillance and Tracking Methods
- Video Analysis and Summarization
- Machine Learning in Materials Science
- Forecasting Techniques and Applications
- Advanced Vision and Imaging
- Gait Recognition and Analysis
- Neural Networks and Applications
- Time Series Analysis and Forecasting
- Stock Market Forecasting Methods
- Handwritten Text Recognition Techniques
- Hearing Impairment and Communication
- Human Motion and Animation
- Algorithms and Data Compression
- Image Retrieval and Classification Techniques
Tsinghua University
2015-2021
Beijing Academy of Artificial Intelligence
2019
This work develops a continuous sign language (SL) recognition framework with deep neural networks, which directly transcribes videos of SL sentences to sequences of ordered gloss labels. Previous methods dealing with this task usually employ hidden Markov models, which have limited capacity to capture temporal information. In contrast, our proposed architecture adopts convolutional networks with stacked fusion layers as the feature extraction module, and bidirectional recurrent networks as the sequence learning module. We propose an iterative...
This work presents a weakly supervised framework with deep neural networks for vision-based continuous sign language recognition, where the ordered gloss labels but no exact temporal locations are available for the video of a sentence, and the amount of labeled sentences for training is limited. Our approach addresses the mapping of video segments to glosses by introducing a recurrent convolutional network for spatio-temporal feature extraction and sequence learning. We design a three-stage optimization process for our architecture. First,...
Recent progress on automatic generation of image captions has shown that it is possible to describe the most salient information conveyed by images with accurate and meaningful sentences. In this paper, we propose an image captioning system that exploits the parallel structures between images and sentences. In our model, the process of generating the next word, given the previously generated ones, is aligned with the visual perception experience where attention shifts among regions; such transitions impose a thread of ordering in visual perception. This alignment...
Recent progress on automatic generation of image captions has shown that it is possible to describe the most salient information conveyed by images with accurate and meaningful sentences. In this paper, we propose an image caption system that exploits the parallel structures between images and sentences. In our model, the process of generating the next word, given the previously generated ones, is aligned with the visual perception experience, where attention shifting among regions imposes a thread of ordering. This alignment characterizes the flow of "abstract...
Models applied to real-time response tasks, such as click-through rate (CTR) prediction, require high accuracy and a strict response time. Therefore, top-performing deep models of high depth and complexity are not well suited for these applications, given the limitations on inference time. In order to get neural networks with better performance given these time limitations, we propose a universal framework that exploits a booster net to help train the lightweight net for prediction. We dub the whole process rocket launching, where the booster net is used to guide the learning...
Models applied to real-time response tasks, such as click-through rate (CTR) prediction, require high accuracy and a strict response time. Therefore, top-performing deep models of high depth and complexity are not well suited for these applications, given the limitations on inference time. In order to further improve neural networks' performance given time and computational limitations, we propose an approach that exploits a cumbersome net to help train the lightweight net for prediction. We dub the whole process rocket launching, where the booster...
Generating videos with semantic meaning, such as gestures in sign language, is a challenging problem. The model should not only learn to generate videos with realistic appearance, but also take notice of the crucial details in frames that convey precise information. In this paper, we focus on the problem of generating long-term gesture videos containing precise and complete meanings. We develop a novel architecture to learn the temporal and spatial transforms in regions of interest, i.e., gesticulating hands or face in our case. We adopt a hierarchical approach for...
In this paper, we propose a real-time system for hand pose estimation that infers the hand pose and shows the finger positions in each frame. The system is designed through a data-driven methodology. For the system to perform in real time, we employ a retrieval method based on an inverted-file index with edge-based descriptors. To strengthen discriminability, we combine the descriptors with a robust orientation assignment. A novel alignment method transfers information from the retrieval results to the recognized image. To refine the measurement of similarity, a mixed criterion considers both outline...
Multi-horizon time series forecasting plays an important role in many industrial and business decision processes. Capturing the complex and varied patterns across different time series is a crucial step toward achieving promising performance. However, most deep learning-based approaches simply take series-specific static (i.e., time-invariant) covariates as input features, which can fail to capture the pattern variation for each possible time series. In this paper, we propose a novel multiplicative attention-based...
In most model-free tracking algorithms, the context of the target is usually taken as the source of negative examples for training the appearance model and is thus not fully utilized. In fact, the target is embedded in its context, and there are spatial constraints between them, with potential motion correlations and relative locations. This paper presents a part-based model to describe the object and its context. Auxiliary objects are selected as parts of the part-based model to represent context information. These auxiliary objects should have some properties: (i) co-occurrence with the target,...
Alignment is an important preprocessing step for image information retrieval. With template information, images should be aligned precisely to retrieve the information inside. These images may contain repeated characters or radicals, and their textures are not appropriate for keypoint extraction. In this paper, a novel robust alignment algorithm is proposed for images with text, using clustering and subspace learning. Experiments show that it can perform well even in extreme cases.