- Handwritten Text Recognition Techniques
- Image Processing and 3D Reconstruction
- Natural Language Processing Techniques
- Face Recognition and Analysis
- Advanced Neural Network Applications
- Multimodal Machine Learning Applications
- Vehicle License Plate Recognition
- Domain Adaptation and Few-Shot Learning
- Face and Expression Recognition
- Topic Modeling
- Machine Learning in Materials Science
- Human Pose and Action Recognition
- Video Analysis and Summarization
- Robotics and Sensor-Based Localization
- Advanced SAR Imaging Techniques
- Computational Drug Discovery Methods
- Biomedical Text Mining and Ontologies
- Video Surveillance and Tracking Methods
- Text and Document Classification Technologies
- Advanced Vision and Imaging
- Image Retrieval and Classification Techniques
- Visual Attention and Saliency Detection
- Advanced Chemical Sensor Technologies
- Advanced Image and Video Retrieval Techniques
- Mass Spectrometry Techniques and Applications
University of Science and Technology of China
2014-2023
LiDAR and Radar are two complementary sensing approaches in that LiDAR specializes in capturing an object's 3D shape while Radar provides longer detection ranges as well as velocity hints. Though seemingly natural, how to efficiently combine them for improved feature representation is still unclear. The main challenge arises from Radar data being extremely sparse and lacking height information. Therefore, directly integrating Radar features into LiDAR-centric networks is not optimal. In this work, we introduce a bi-directional...
Online Handwritten Text Recognition (OLHTR) has gained considerable attention for its diverse range of applications. Current approaches usually treat OLHTR as a sequence recognition task, employing either a single trajectory or image encoder, or multi-stream encoders, combined with a CTC or attention-based decoder. However, these approaches face several drawbacks: 1) the encoders typically focus on local trajectories or visual regions, lacking the ability to dynamically capture relevant global features in challenging...
The primary objective of Optical Chemical Structure Recognition is to convert chemical structure images into corresponding markup sequences. However, the complex two-dimensional structures of molecules, particularly those with rings and multiple branches, present significant challenges for current end-to-end methods, which must learn one-dimensional markup directly. To overcome this limitation, we propose a novel Ring-Free Language (RFL), which utilizes a divide-and-conquer strategy to describe chemical structures in a hierarchical form....
Recently, visual-language learning has shown great potential in enhancing visual-based person re-identification (ReID). Existing learning-based ReID methods often focus on whole-body scale image-text feature alignment, while neglecting supervision of fine-grained part features. This choice simplifies the learning process but cannot guarantee within-part semantic consistency, thus hindering final performance. Therefore, we propose to enhance visual features with part-informed language supervision for...
This paper presents a study of designing compact classifiers using deep neural networks for recognition of online handwritten Chinese characters. Two schemes are investigated based on practical considerations. First, a deep neural network is adopted purely as a classifier with a state-of-the-art feature extractor. Second, the so-called bottleneck features extracted from a bottleneck layer are fed to a prototype-based classifier. The experiments on an in-house developed handwriting corpus with a vocabulary of 15,167 characters show that compared with the widely...
Recently, an effective segmentation-free approach via a deep neural network based hidden Markov model (DNN-HMM) was proposed and successfully applied to offline handwritten Chinese text recognition. In this study, to further improve the modeling capability, we adopt deep convolutional neural networks (DCNN) to calculate the HMM state posteriors. First, on a frame basis, DCNN-HMM can automatically learn features from the raw image of a text line through its convolutional architecture, rather than using the handcrafted gradient features of DNN-HMM. Second, we examine...
Satisfactory recognition performance has been achieved for simple and controllable printed molecular images. However, recognizing handwritten chemical structure images remains unresolved due to the inherent ambiguities in atoms and bonds, as well as the significant challenge of converting projected 2D layouts into markup strings. To address these problems, this paper proposes an end-to-end framework for handwritten chemical structure recognition, with a novel structure-specific markup language (SSML) and a random conditional guided decoder (RCGD)....
Recently, recognition of handwritten mathematical expressions has been greatly improved by employing sequence modeling methods such as encoder-decoder based methods. Existing models use string decoders or tree decoders to generate markup for recognition. String decoders directly generate LaTeX strings, and tree decoders decode expressions into tree structures. The string decoder generalizes poorly on expressions with complex hierarchical structures, but its language model is better. The tree decoder can deal with complex structures, but its language model is weakened. In order to take advantage of the above two decoders, we propose a...
Recently, heatmap regression methods based on 1D landmark representations have shown prominent performance in locating facial landmarks. However, previous methods did not deeply explore the potential of 1D landmark representations for sequential and structural modeling of multiple landmarks in order to track facial landmarks. To address this limitation, we propose a Transformer architecture, namely 1DFormer, which learns informative 1D landmark representations by capturing the dynamic and geometric patterns of landmarks via token communications in both temporal and spatial dimensions for facial landmark tracking. For...
Recently, Handwritten Mathematical Expression Recognition (HMER) has gained considerable attention in pattern recognition for its diverse applications in document understanding. Current methods typically approach HMER as an image-to-sequence generation task within an autoregressive (AR) encoder-decoder framework. However, these approaches suffer from several drawbacks: 1) a lack of overall language context, limiting information utilization beyond the current decoding step; 2) error accumulation...
Bird's-Eye-View (BEV) based 3D visual perception, which formulates a unified space for multi-view representation, has received wide attention in autonomous driving due to its scalability to downstream tasks. However, the view transform in transformer-based BEV methods is agnostic of occlusion relationships, resulting in model degradation. To construct a higher-quality BEV space, this paper analyzes the mutual occlusion problems in the view transform process and proposes a new method named OccluBEV. OccluBEV alleviates the occlusion issue via point cloud...
Recently, many studies have proposed employing attention based encoder-decoder models to convert a sequence of trajectory points into a LaTeX string for online handwritten mathematical expression recognition (OHMER), and the performance of these models critically relies on the accuracy of the attention. In this paper, unlike previous methods, which basically employ a soft attention model, we propose a posterior attention model that modifies the attention probabilities after observing the output generated by the soft attention model. In order to further improve the posterior attention mechanism, we propose a stroke average pooling layer to aggregate...
Recent works have shown the huge success of deep learning models for common in-vocabulary (IV) scene text recognition. However, in real-world scenarios, out-of-vocabulary (OOV) words are of great importance, and SOTA recognition models usually perform poorly in OOV settings. Inspired by the intuition that the learned language prior limits OOV performance, we design a framework named Vision Language Adaptive Mutual Decoder (VLAMD) to partly tackle OOV problems. VLAMD consists of three main components. Firstly, we build an...