- Handwritten Text Recognition Techniques
- Advanced Image and Video Retrieval Techniques
- Human Pose and Action Recognition
- Multimodal Machine Learning Applications
- Gait Recognition and Analysis
- Image Retrieval and Classification Techniques
- Radiomics and Machine Learning in Medical Imaging
- Lung Cancer Diagnosis and Treatment
- Anomaly Detection Techniques and Applications
- Natural Language Processing Techniques
- Advanced X-ray and CT Imaging
- Video Analysis and Summarization
- Speech and Dialogue Systems
- Context-Aware Activity Recognition Systems
- Advanced Neural Network Applications
- Hand Gesture Recognition Systems
- Image Processing and 3D Reconstruction
- Topic Modeling
- Vehicle License Plate Recognition
- Image and Object Detection Techniques
Sichuan University of Science and Engineering
2024-2025
Yibin University
2024
South China University of Technology
2019-2021
Video-based human action recognition is one of the most important and challenging areas of research in the field of computer vision. It has found many pragmatic applications, such as video surveillance, human-computer interaction, entertainment, and autonomous driving. Owing to the recent development of deep learning methods for recognition, performance on benchmark datasets has been significantly enhanced. Deep learning techniques are mainly used for recognizing actions in images and videos comprising Euclidean data. An extension of these techniques to non-Euclidean data...
Visual Information Extraction (VIE) has attracted considerable attention recently owing to its various advanced applications such as document understanding, automatic marking, and intelligent education. Most existing works decouple this problem into several independent sub-tasks of text spotting (text detection and recognition) and information extraction, which completely ignores the high correlation among them during optimization. In this paper, we propose a robust visual information extraction system (VIES) towards real-world...
The Visual Information Extraction (VIE) task aims to extract key information from multifarious document images (e.g., invoices and purchase receipts). Most previous methods treat VIE simply as a sequence labeling problem or classification problem, which requires models to carefully identify each kind of semantics by introducing multimodal features such as font, color, and layout. But these features cannot work well when faced with numeric semantic categories or some ambiguous texts. To address this issue, in...
The human skeleton contains significant information about actions; therefore, it is quite intuitive to incorporate skeletons in human action recognition. A skeleton resembles a graph, where body joints and bones mimic nodes and edges. This resemblance of structure is the main motivation to apply graph convolutional networks for action recognition. Results show that the discriminant contribution of different joints is not equal across actions. Therefore, we propose to use attention-joints, which correspond to joints significantly contributing to specific actions. Features corresponding only...
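The attention-joints idea can be sketched as one graph-convolution step over the skeleton graph followed by attention pooling over joints. This is a minimal illustration with random weights, not the paper's architecture; the joint indices, edge list, and dimensions are made up:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy skeleton: 5 joints, edges mimic bones (hypothetical topology)
num_joints, feat_dim = 5, 3
edges = [(0, 1), (1, 2), (1, 3), (3, 4)]

# Adjacency with self-loops, symmetrically normalized (standard GCN propagation)
A = np.eye(num_joints)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
A_hat = A / np.sqrt(np.outer(d, d))

rng = np.random.default_rng(0)
X = rng.normal(size=(num_joints, feat_dim))  # per-joint features (e.g. x, y, score)
W = rng.normal(size=(feat_dim, feat_dim))    # projection weights (random stand-in)

H = np.tanh(A_hat @ X @ W)                   # one graph-convolution layer

# Attention over joints: higher-scoring joints contribute more to the action feature
scores = H @ rng.normal(size=(feat_dim,))    # per-joint scalar score
alpha = softmax(scores)                      # attention weights over joints
action_feature = alpha @ H                   # attention-pooled representation

print("attention weights:", np.round(alpha, 3))
```

In a trained model the weights `W` and the scoring vector would be learned, so `alpha` concentrates on the joints that discriminate a given action (e.g. wrists for waving).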
Background: The accurate classification of lung nodules is critical to achieving personalized cancer treatment and prognosis prediction. Treatment options for patients are closely related to the type of nodule, but there are many types, and the distinctions between certain types are subtle, making classification based on traditional medical imaging technology and doctor experience challenging. Purpose: In this study, a novel method was used to analyze quantitative features in CT images using radiomics to reveal the characteristics of pulmonary nodules, then...
Recently, deep learning has greatly promoted the performance of license plate recognition (LPR) by learning robust features from numerous labeled data. However, the large variation of wild plates across complicated environments and perspectives is still a huge challenge to LPR. To solve this problem, we propose an effective and efficient shared adversarial training network (SATN) in this paper, which can learn environment-independent and perspective-free semantic features with prior knowledge from standard stencil-rendered plates, as...
Visual information extraction (VIE) has attracted considerable attention recently owing to its various advanced applications such as document understanding, automatic marking, and intelligent education. Most existing works decouple this problem into several independent sub-tasks of text spotting (text detection and recognition) and information extraction, which completely ignores the high correlation among them during optimization. In this paper, we propose a robust visual information extraction system (VIES) towards real-world scenarios,...
Visual information extraction (VIE) has attracted increasing attention in recent years. Existing methods usually first organize optical character recognition (OCR) results into plain texts and then utilize token-level category annotations as supervision to train a sequence tagging model. However, this expends great annotation costs and may be exposed to label confusion, and the OCR errors will also significantly affect the final performance. In this paper, we propose a unified weakly-supervised learning...
Action recognition has achieved great progress in recent years because of better feature representation learning and classification technology, such as convolutional neural networks (CNNs). However, most current deep approaches treat the action as a black box, ignoring specific domain knowledge of the action itself. In this paper, by analyzing the characteristics of different actions, we propose a new framework that involves a residual-attention module and a joint path-signature (JPSF) module. The path signature...
Scoring the Optical Character Recognition (OCR) capabilities of Large Multimodal Models (LMMs) has witnessed growing interest recently. Existing benchmarks have highlighted the impressive performance of LMMs in text recognition; however, their abilities on certain challenging tasks, such as text localization, handwritten content extraction, and logical reasoning, remain underexplored. To bridge this gap, we introduce OCRBench v2, a large-scale bilingual text-centric benchmark with currently the most...
The comprehension of text-rich visual scenes has become a focal point for evaluating Multi-modal Large Language Models (MLLMs) due to their widespread applications. Current benchmarks tailored to this scenario emphasize perceptual capabilities while overlooking the assessment of cognitive abilities. To address this limitation, we introduce a Multimodal benchmark towards Text-rich scenes, which evaluates the Cognitive capabilities of MLLMs through reasoning and content-creation tasks (MCTBench). To mitigate potential...
The accurate classification of lung nodules is critical to achieving personalized cancer treatment and prognosis prediction. Treatment options for patients are closely related to the type of nodule, but there are many types, and the distinctions between certain types are subtle, making classification based on traditional medical imaging technology and doctor experience challenging. This study adopts a novel approach, using computed tomography (CT) radiomics to analyze quantitative features in CT images and reveal nodule characteristics, then employs...
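The radiomics step amounts to turning a nodule's intensity region into quantitative descriptors. The sketch below computes a few first-order features from a synthetic CT region of interest; it is a minimal illustration of the feature-extraction idea (real radiomics pipelines also compute shape and texture features, often hundreds in total), and the synthetic nodule values are made up:

```python
import numpy as np

def first_order_features(roi):
    """First-order radiomic features of a nodule ROI (intensities, e.g. in HU)."""
    x = np.asarray(roi, dtype=float).ravel()
    mean = x.mean()
    std = x.std()
    # Skewness of the intensity distribution
    skew = ((x - mean) ** 3).mean() / (std ** 3) if std > 0 else 0.0
    # Entropy of the discretized intensity histogram (32 bins)
    hist, _ = np.histogram(x, bins=32)
    p = hist[hist > 0] / hist.sum()
    entropy = -(p * np.log2(p)).sum()
    return {"mean": mean, "std": std, "skewness": skew, "entropy": entropy}

rng = np.random.default_rng(1)
nodule = rng.normal(loc=-300, scale=80, size=(16, 16, 8))  # synthetic HU values
feats = first_order_features(nodule)
print({k: round(v, 2) for k, v in feats.items()})
```

Feature vectors like this, computed per nodule, are what the downstream classifier is trained on.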
The Visual Information Extraction (VIE) task aims to extract key information from multifarious document images (e.g., invoices and purchase receipts). Most previous methods treat VIE simply as a sequence labeling problem or classification problem, which requires models to carefully identify each kind of semantics by introducing multimodal features such as font, color, and layout. But these features cannot work well when faced with numeric semantic categories or some ambiguous texts. To address this issue,...
Visual information extraction (VIE) has attracted increasing attention in recent years. Existing methods usually first organize optical character recognition (OCR) results into plain texts and then utilize token-level entity annotations as supervision to train a sequence tagging model. However, this expends great annotation costs and may be exposed to label confusion, and the OCR errors will also significantly affect the final performance. In this paper, we propose a unified weakly-supervised learning...
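One common way to avoid token-level annotation costs is to supervise with only the entity-value strings and derive token tags automatically. The helper below is a hypothetical sketch of that weak-supervision idea (not the paper's method): it matches annotated value strings against OCR tokens in reading order and emits BIO pseudo-labels; the field names and example tokens are made up.

```python
def pseudo_labels(tokens, entities):
    """Derive BIO token tags from entity-value strings (weak supervision sketch).

    tokens:   OCR tokens in reading order.
    entities: mapping of field name -> annotated value string.
    """
    labels = ["O"] * len(tokens)
    for field, value in entities.items():
        words = value.split()
        n = len(words)
        # First exact match of the value's word sequence wins
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == words:
                labels[i] = f"B-{field}"
                for j in range(i + 1, i + n):
                    labels[j] = f"I-{field}"
                break
    return labels

tokens = "Invoice Total : 42.50 USD Date : 2021-03-01".split()
ents = {"TOTAL": "42.50 USD", "DATE": "2021-03-01"}
labels = pseudo_labels(tokens, ents)
print(list(zip(tokens, labels)))
```

A sequence tagging model can then be trained on these pseudo-labels; in practice fuzzy matching would be needed to tolerate OCR errors, which exact matching like this cannot handle.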