Jinshui Hu

ORCID: 0009-0001-3017-973X
Research Areas
  • Handwritten Text Recognition Techniques
  • Image Processing and 3D Reconstruction
  • Natural Language Processing Techniques
  • Face recognition and analysis
  • Advanced Neural Network Applications
  • Multimodal Machine Learning Applications
  • Vehicle License Plate Recognition
  • Domain Adaptation and Few-Shot Learning
  • Face and Expression Recognition
  • Topic Modeling
  • Machine Learning in Materials Science
  • Human Pose and Action Recognition
  • Video Analysis and Summarization
  • Robotics and Sensor-Based Localization
  • Advanced SAR Imaging Techniques
  • Computational Drug Discovery Methods
  • Biomedical Text Mining and Ontologies
  • Video Surveillance and Tracking Methods
  • Text and Document Classification Technologies
  • Advanced Vision and Imaging
  • Image Retrieval and Classification Techniques
  • Visual Attention and Saliency Detection
  • Advanced Chemical Sensor Technologies
  • Advanced Image and Video Retrieval Techniques
  • Mass Spectrometry Techniques and Applications

University of Science and Technology of China
2014-2023

LiDAR and Radar are two complementary sensing approaches in that LiDAR specializes in capturing an object's 3D shape while Radar provides longer detection ranges as well as velocity hints. Though seemingly natural, how to efficiently combine them for improved feature representation is still unclear. The main challenge arises from Radar data being extremely sparse and lacking height information. Therefore, directly integrating Radar features into LiDAR-centric networks is not optimal. In this work, we introduce a bi-directional...

10.1109/cvpr52729.2023.01287 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Online Handwritten Text Recognition (OLHTR) has gained considerable attention for its diverse range of applications. Current approaches usually treat OLHTR as a sequence recognition task, employing either a single trajectory or image encoder, or multi-stream encoders, combined with a CTC or attention-based decoder. However, these approaches face several drawbacks: 1) single encoders typically focus on either local trajectories or visual regions, lacking the ability to dynamically capture relevant global features in challenging...

10.48550/arxiv.2502.06100 preprint EN arXiv (Cornell University) 2025-02-09

10.1109/icassp49660.2025.10887682 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

10.1109/icassp49660.2025.10888044 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

The primary objective of Optical Chemical Structure Recognition is to convert chemical structure images into corresponding markup sequences. However, the complex two-dimensional structures of molecules, particularly those with rings and multiple branches, present significant challenges for current end-to-end methods to learn one-dimensional markup directly. To overcome this limitation, we propose a novel Ring-Free Language (RFL), which utilizes a divide-and-conquer strategy to describe chemical structures in a hierarchical form....

10.1609/aaai.v39i2.32197 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11

Recently, visual-language learning has shown great potential in enhancing visual-based person re-identification (ReID). Existing visual-language learning-based ReID methods often focus on whole-body scale image-text feature alignment, while neglecting supervision on fine-grained part features. This choice simplifies the learning process but cannot guarantee within-part semantic consistency, thus hindering the final performance. Therefore, we propose to enhance fine-grained visual features with part-informed language supervision for...

10.48550/arxiv.2308.02738 preprint EN other-oa arXiv (Cornell University) 2023-01-01

This paper presents a study of designing compact classifiers using deep neural networks for recognition of online handwritten Chinese characters. Two schemes are investigated based on practical considerations. First, deep neural networks are adopted purely as a classifier with a state-of-the-art feature extractor. Second, the so-called bottleneck features extracted from a bottleneck layer are fed to a prototype-based classifier. The experiments on an in-house developed handwriting corpus with a vocabulary of 15,167 characters show that compared with widely...

10.1109/icpr.2014.508 article EN 2014-08-01

Recently, an effective segmentation-free approach via deep neural network based hidden Markov model (DNN-HMM) was proposed and successfully applied to offline handwritten Chinese text recognition. In this study, to further improve the modeling capability, we adopt deep convolutional neural networks (DCNN) to calculate the HMM state posteriors. First, on a frame basis, DCNN-HMM can automatically learn features from the raw image of a text line via the DCNN architecture rather than the handcrafted gradient features used in DNN-HMM. Second, we examine...

10.1109/acpr.2017.65 article EN 2017-11-01

Satisfactory recognition performance has been achieved for simple and controllable printed molecular images. However, recognizing handwritten chemical structure images remains unresolved due to the inherent ambiguities in handwritten atoms and bonds, as well as the significant challenge of converting projected 2D molecular layouts into markup strings. To address these problems, this paper proposes an end-to-end framework for handwritten chemical structure image recognition, with a novel structure-specific markup language (SSML) and a random conditional guided decoder (RCGD)....

10.1145/3581783.3612573 article EN 2023-10-26

Recently, recognition of handwritten mathematical expressions has been greatly improved by employing sequence modeling methods such as encoder-decoder based methods. Existing encoder-decoder models use string decoders or tree decoders to generate markup for recognition. String decoders directly generate LaTeX strings, and tree decoders decode expressions into tree structures. The generalization of string decoders is poor on expressions with complex hierarchical structures, but their language model is better. Tree decoders can deal with complex structures, but their language model is weakened. In order to take advantage of the above two decoders, we propose a...

10.1109/icpr56361.2022.9956105 article EN 2022 26th International Conference on Pattern Recognition (ICPR) 2022-08-21
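As a rough illustration of the two output formats the abstract above contrasts, the sketch below writes one expression both as a flat LaTeX token sequence (a string-decoder target) and as a small tree (a tree-decoder target). The `Node` class, the relation names `sup`, `sub`, and `right`, and the `to_latex` helper are hypothetical names chosen for this example; they are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One symbol plus its (relation, child) edges. Hypothetical structure."""
    symbol: str
    children: list = field(default_factory=list)


def to_latex(node: Node) -> str:
    """Linearize a symbol tree into a LaTeX string."""
    parts = [node.symbol]
    for relation, child in node.children:
        sub = to_latex(child)
        if relation == "sup":
            parts.append("^{" + sub + "}")
        elif relation == "sub":
            parts.append("_{" + sub + "}")
        elif relation == "right":
            parts.append(" " + sub)
        else:
            raise ValueError(f"unknown relation: {relation}")
    return "".join(parts)


# String-decoder view: a flat token sequence for x^{2} + 1.
string_target = ["x", "^", "{", "2", "}", "+", "1"]

# Tree-decoder view: the same expression with explicit spatial relations.
tree_target = Node("x", [
    ("sup", Node("2")),
    ("right", Node("+", [("right", Node("1"))])),
])

latex = to_latex(tree_target)
```

The tree makes the superscript relation explicit, whereas the flat sequence encodes it only through the `^`, `{`, and `}` tokens.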

Recently, heatmap regression methods based on 1D landmark representations have shown prominent performance on locating facial landmarks. However, previous methods have not deeply explored the potential of 1D landmark representations for sequential and structural modeling of multiple landmarks to track facial landmarks. To address this limitation, we propose a Transformer architecture, namely 1DFormer, which learns informative 1D landmark representations by capturing the dynamic and geometric patterns of landmarks via token communications in both temporal and spatial dimensions for facial landmark tracking. For...

10.24963/ijcai.2024/176 article EN 2024-07-26

Recently, Handwritten Mathematical Expression Recognition (HMER) has gained considerable attention in pattern recognition for its diverse applications in document understanding. Current methods typically approach HMER as an image-to-sequence generation task within an autoregressive (AR) encoder-decoder framework. However, these approaches suffer from several drawbacks: 1) a lack of overall language context, limiting information utilization beyond the current decoding step; 2) error accumulation...

10.48550/arxiv.2407.11380 preprint EN arXiv (Cornell University) 2024-07-16

The primary objective of Optical Chemical Structure Recognition is to convert chemical structure images into corresponding markup sequences. However, the complex two-dimensional structures of molecules, particularly those with rings and multiple branches, present significant challenges for current end-to-end methods to learn one-dimensional markup directly. To overcome this limitation, we propose a novel Ring-Free Language (RFL), which utilizes a divide-and-conquer strategy to describe chemical structures in a hierarchical form....

10.48550/arxiv.2412.07594 preprint EN arXiv (Cornell University) 2024-12-10

LiDAR and Radar are two complementary sensing approaches in that LiDAR specializes in capturing an object's 3D shape while Radar provides longer detection ranges as well as velocity hints. Though seemingly natural, how to efficiently combine them for improved feature representation is still unclear. The main challenge arises from Radar data being extremely sparse and lacking height information. Therefore, directly integrating Radar features into LiDAR-centric networks is not optimal. In this work, we introduce a bi-directional...

10.48550/arxiv.2306.01438 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Bird's-Eye-View (BEV) based 3D visual perception, which formulates a unified space for multi-view representation, has received wide attention in autonomous driving due to its scalability for downstream tasks. However, the view transform in transformer-based BEV methods is agnostic of occlusion relationships, resulting in model degradation. To construct a higher-quality BEV space, this paper analyzes the mutual occlusion problems in the view transform process and proposes a new method named OccluBEV. OccluBEV alleviates the occlusion issue via point cloud...

10.1145/3581783.3613798 article EN 2023-10-26

Recently, many studies propose to employ attention based encoder-decoder models to convert a sequence of trajectory points into a LaTeX string for online handwritten mathematical expression recognition (OHMER), and the performance of these models critically relies on the accuracy of the attention. In this paper, unlike previous methods which basically employ a soft attention model, we propose a posterior attention model, which modifies the attention probabilities after observing the output probabilities generated by the soft attention model. In order to further improve the posterior attention mechanism, we propose a stroke average pooling layer to aggregate...

10.1109/icpr48806.2021.9412790 article EN 2020 25th International Conference on Pattern Recognition (ICPR) 2021-01-10
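The abstract above describes adjusting attention probabilities after observing output probabilities. One common way to read that idea is a Bayes-rule reweighting, sketched below under stated assumptions: the function name `posterior_attention`, the per-position likelihood input, and the numbers are all hypothetical, and the paper's exact formulation may differ since the abstract is truncated.

```python
def posterior_attention(prior, likelihood):
    """Reweight soft attention by how well each position explains the output.

    prior:      soft attention weights over input positions (sums to 1).
    likelihood: assumed probability of the observed output symbol when
                attending to each position alone.
    Returns weights proportional to prior[i] * likelihood[i].
    """
    if len(prior) != len(likelihood):
        raise ValueError("prior and likelihood must have the same length")
    joint = [a * p for a, p in zip(prior, likelihood)]
    total = sum(joint)
    if total == 0:
        # No position explains the output; fall back to the prior weights.
        return list(prior)
    return [j / total for j in joint]


# Soft attention favours position 0, but position 1 explains the output best.
prior = [0.5, 0.3, 0.2]
likelihood = [0.1, 0.8, 0.1]
posterior = posterior_attention(prior, likelihood)
```

In this toy case the weight on position 1 rises from 0.3 to 0.24 / 0.31, which is the kind of correction a posterior over attention positions is meant to provide.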

Recently, heatmap regression methods based on 1D landmark representations have shown prominent performance on locating facial landmarks. However, previous methods have not deeply explored the potential of 1D landmark representations for sequential and structural modeling of multiple landmarks to track facial landmarks. To address this limitation, we propose a Transformer architecture, namely 1DFormer, which learns informative 1D landmark representations by capturing the dynamic and geometric patterns of landmarks via token communications in both temporal and spatial dimensions for facial landmark tracking. For...

10.48550/arxiv.2311.00241 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Recent works have shown huge success of deep learning models for common in-vocabulary (IV) scene text recognition. However, in real-world scenarios, out-of-vocabulary (OOV) words are of great importance and SOTA recognition models usually perform poorly on OOV settings. Inspired by the intuition that a learned language prior has limited OOV performance, we design a framework named Vision Language Adaptive Mutual Decoder (VLAMD) to tackle OOV problems partly. VLAMD consists of three main components. Firstly, we build an...

10.48550/arxiv.2209.00859 preprint EN other-oa arXiv (Cornell University) 2022-01-01