Guozhi Tang

ORCID: 0000-0003-0859-5195
Research Areas
  • Handwritten Text Recognition Techniques
  • Advanced Image and Video Retrieval Techniques
  • Human Pose and Action Recognition
  • Multimodal Machine Learning Applications
  • Gait Recognition and Analysis
  • Image Retrieval and Classification Techniques
  • Radiomics and Machine Learning in Medical Imaging
  • Lung Cancer Diagnosis and Treatment
  • Anomaly Detection Techniques and Applications
  • Natural Language Processing Techniques
  • Advanced X-ray and CT Imaging
  • Video Analysis and Summarization
  • Speech and Dialogue Systems
  • Context-Aware Activity Recognition Systems
  • Advanced Neural Network Applications
  • Hand Gesture Recognition Systems
  • Image Processing and 3D Reconstruction
  • Topic Modeling
  • Vehicle License Plate Recognition
  • Image and Object Detection Techniques

Sichuan University of Science and Engineering
2024-2025

Yibin University
2024

South China University of Technology
2019-2021

Video-based human action recognition is one of the most important and challenging research areas in the field of computer vision. It has found many pragmatic applications in video surveillance, human-computer interaction, entertainment, autonomous driving, etc. Owing to the recent development of deep learning methods for action recognition, performance has been significantly enhanced on benchmark datasets. Deep learning techniques are mainly used for recognizing actions in images and videos comprising Euclidean data. An extension of these techniques to non-Euclidean data...

10.1109/tai.2021.3076974 article EN IEEE Transactions on Artificial Intelligence 2021-04-01

Visual Information Extraction (VIE) has attracted considerable attention recently owing to its various advanced applications such as document understanding, automatic marking, and intelligent education. Most existing works decouple this problem into several independent sub-tasks of text spotting (text detection and recognition) and information extraction, which completely ignores the high correlation among them during optimization. In this paper, we propose a robust visual information extraction system (VIES) towards real-world...

10.1609/aaai.v35i4.16378 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

The Visual Information Extraction (VIE) task aims to extract key information from multifarious document images (e.g., invoices and purchase receipts). Most previous methods treat VIE simply as a sequence labeling problem or a classification problem, which requires models to carefully identify each kind of semantics by introducing multimodal features such as font, color, and layout. But such features cannot work well when faced with numeric semantic categories or some ambiguous texts. To address this issue, in...

10.24963/ijcai.2021/144 article EN 2021-08-01

The human skeleton contains significant information about actions; therefore, it is quite intuitive to incorporate skeletons in human action recognition. A skeleton resembles a graph, where body joints and bones mimic nodes and edges. This resemblance of structure is the main motivation to apply graph convolutional networks for action recognition. Results show that the discriminant contribution of different joints is not equal across actions. Therefore, we propose to use attention-joints, which correspond to the joints contributing significantly to a specific action. Features corresponding only...

10.1109/access.2019.2961770 article EN cc-by IEEE Access 2019-12-23
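The skeleton-as-graph idea in the abstract above can be sketched as a single graph-convolution layer over body joints. Everything below is an illustrative assumption, not the paper's architecture: a toy 5-joint chain skeleton, random features, and one symmetrically normalized propagation step.

```python
import numpy as np

def gcn_layer(X, A, W):
    """One graph-convolution layer: X (J, F) joint features,
    A (J, J) skeleton adjacency, W (F, F_out) learnable weights."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 (A+I) D^-1/2
    return np.maximum(A_norm @ X @ W, 0.0)         # propagate, then ReLU

# Hypothetical 5-joint skeleton: bones as undirected edges of a chain.
J, F = 5, 3
A = np.zeros((J, J))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1.0

rng = np.random.default_rng(0)
X = rng.normal(size=(J, F))   # per-joint input features
W = rng.normal(size=(F, 4))   # random weights stand in for trained ones
H = gcn_layer(X, A, W)        # (5, 4) updated joint features
```

Each joint's output mixes its own features with those of adjacent joints, which is why joints on the same limb end up with correlated representations; an attention mechanism over joints would then reweight the rows of `H`.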

Abstract Background The accurate classification of lung nodules is critical to achieving personalized cancer treatment and prognosis prediction. Treatment options for patients are closely related to the type of nodules, but there are many types, and the distinctions between certain subtypes are subtle, making classification based on traditional medical imaging technology and doctor experience challenging. Purpose In this study, a novel method was used to analyze quantitative features in CT images using radiomics to reveal the characteristics of pulmonary nodules, and then...

10.1002/mp.17901 article EN Medical Physics 2025-05-20

Recently, deep learning has greatly promoted the performance of license plate recognition (LPR) by learning robust features from numerous labeled data. However, the large variation of wild plates across complicated environments and perspectives is still a huge challenge to LPR. To solve this problem, we propose an effective and efficient shared adversarial training network (SATN) in this paper, which can learn environment-independent and perspective-free semantic features with prior knowledge from standard stencil-rendered plates, as...

10.1109/access.2019.2961744 article EN cc-by IEEE Access 2019-12-23

Visual information extraction (VIE) has attracted considerable attention recently owing to its various advanced applications such as document understanding, automatic marking, and intelligent education. Most existing works decouple this problem into several independent sub-tasks of text spotting (text detection and recognition) and information extraction, which completely ignores the high correlation among them during optimization. In this paper, we propose a robust visual information extraction system (VIES) towards real-world scenarios,...

10.48550/arxiv.2102.06732 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Visual information extraction (VIE) has attracted increasing attention in recent years. Existing methods usually first organize optical character recognition (OCR) results into plain texts and then utilize token-level category annotations as supervision to train a sequence tagging model. However, this expends great annotation costs, may be exposed to label confusion, and the OCR errors will also significantly affect the final performance. In this paper, we propose a unified weakly-supervised learning...

10.24963/ijcai.2021/150 article EN 2021-08-01
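The token-level supervision that the abstract above argues against can be made concrete with a small sketch: projecting character-span entity annotations onto OCR tokens as BIO tags for a sequence tagger. The token spans, entity label, and helper name here are hypothetical, for illustration only.

```python
def bio_tags(tokens, entities):
    """tokens: list of (start, end) character spans in OCR reading order.
    entities: list of (start, end, label) annotated character spans.
    Returns one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for e_start, e_end, label in entities:
        inside = False
        for i, (t_start, t_end) in enumerate(tokens):
            if t_start >= e_start and t_end <= e_end:  # token inside entity span
                tags[i] = ("I-" if inside else "B-") + label
                inside = True
    return tags

# Hypothetical receipt line "TOTL 2021 AUG.." tokenized into 3 spans,
# with one annotated DATE entity covering the last two tokens.
tokens = [(0, 4), (5, 9), (10, 14)]
entities = [(5, 14, "DATE")]
tags = bio_tags(tokens, entities)   # ['O', 'B-DATE', 'I-DATE']
```

This also makes the failure mode visible: if OCR merges or splits tokens, the character spans no longer nest inside the annotation, and the tags silently degrade, which is the label-confusion cost the weakly-supervised formulation tries to avoid.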

Action recognition has achieved great progress in recent years because of better feature representation learning and classification technology like convolutional neural networks (CNNs). However, most current deep approaches treat action recognition as a black box, ignoring specific domain knowledge of the action itself. In this paper, by analyzing the characteristics of different actions, we propose a new framework that involves a residual-attention module and joint path-signature features (JPSF). The path signature...

10.1109/access.2019.2937344 article EN cc-by IEEE Access 2019-01-01
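The path signature mentioned above summarizes a joint trajectory through iterated integrals. As a minimal sketch (not the paper's JPSF pipeline), the level-1 and level-2 signature terms of a piecewise-linear path can be accumulated segment by segment via Chen's identity:

```python
import numpy as np

def signature_level2(path):
    """Level-1 and level-2 signature of a piecewise-linear path.
    path: (T, d) array of points, e.g. a joint's (x, y) trajectory."""
    d = path.shape[1]
    s1 = np.zeros(d)        # level 1: total displacement
    s2 = np.zeros((d, d))   # level 2: iterated integrals (encodes signed area)
    for delta in np.diff(path, axis=0):
        # Chen's identity: appending a linear segment with increment delta
        # contributes s1 (x) delta, plus the segment's own term delta (x) delta / 2.
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    return s1, s2

# Straight-line trajectory: level 2 collapses to outer(s1, s1) / 2,
# i.e. its antisymmetric (area) part is zero.
path = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
s1, s2 = signature_level2(path)   # s1 = [2., 2.]
```

Flattening `s1` and `s2` gives a fixed-length, reparameterization-robust descriptor per joint trajectory, which is the kind of feature a signature-based action framework would feed into the network.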

Scoring the Optical Character Recognition (OCR) capabilities of Large Multimodal Models (LMMs) has witnessed growing interest recently. Existing benchmarks have highlighted the impressive performance of LMMs in text recognition; however, their abilities on certain challenging tasks, such as text localization, handwritten content extraction, and logical reasoning, remain underexplored. To bridge this gap, we introduce OCRBench v2, a large-scale bilingual text-centric benchmark with currently the most...

10.48550/arxiv.2501.00321 preprint EN arXiv (Cornell University) 2024-12-31

The comprehension of text-rich visual scenes has become a focal point for evaluating Multi-modal Large Language Models (MLLMs) due to their widespread applications. Current benchmarks tailored to this scenario emphasize perceptual capabilities while overlooking the assessment of cognitive abilities. To address this limitation, we introduce a Multimodal benchmark towards Text-rich scenes, to evaluate the Cognitive capabilities of MLLMs through reasoning and content-creation tasks (MCTBench). To mitigate potential...

10.48550/arxiv.2410.11538 preprint EN arXiv (Cornell University) 2024-10-15

The accurate classification of lung nodules is critical to achieving personalized cancer treatment and prognosis prediction. Treatment options for patients are closely related to the type of nodules, but there are many types, and the distinctions between certain subtypes are subtle, making classification based on traditional medical imaging technology and doctor experience challenging. This study adopts a novel approach, using computed tomography (CT) radiomics to analyze quantitative features in CT images to reveal nodule characteristics, and then employs...

10.21037/qims-24-1315 article EN Quantitative Imaging in Medicine and Surgery 2024-12-01
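Radiomics, as used in the study above, reduces a region of interest to quantitative features. The sketch below computes a few common first-order intensity features from a nodule ROI with plain NumPy; the feature set, bin count, and input are illustrative assumptions, not the study's actual feature pipeline.

```python
import numpy as np

def intensity_features(roi):
    """First-order intensity features of a CT region of interest.
    roi: 2-D array of voxel values (e.g. Hounsfield units) inside a nodule."""
    x = roi.ravel().astype(float)
    hist, _ = np.histogram(x, bins=32)
    p = hist / hist.sum()
    p = p[p > 0]                                   # drop empty bins before log
    return {
        "mean": x.mean(),
        "std": x.std(),
        "skewness": ((x - x.mean()) ** 3).mean() / (x.std() ** 3 + 1e-12),
        "entropy": float(-(p * np.log2(p)).sum()),  # histogram entropy
    }

# Synthetic ROI with a uniform ramp of values: 64 voxels spread
# evenly over 32 bins gives entropy log2(32) = 5 bits.
roi = np.arange(64, dtype=float).reshape(8, 8)
feats = intensity_features(roi)   # feats["mean"] == 31.5
```

In a full radiomics workflow these scalars (together with shape and texture features) would form the feature vector handed to a downstream classifier for nodule-type prediction.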

The Visual Information Extraction (VIE) task aims to extract key information from multifarious document images (e.g., invoices and purchase receipts). Most previous methods treat VIE simply as a sequence labeling problem or a classification problem, which requires models to carefully identify each kind of semantics by introducing multimodal features such as font, color, and layout. But such features cannot work well when faced with numeric semantic categories or some ambiguous texts. To address this issue,...

10.48550/arxiv.2106.12940 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Visual information extraction (VIE) has attracted increasing attention in recent years. Existing methods usually first organize optical character recognition (OCR) results into plain texts and then utilize token-level entity annotations as supervision to train a sequence tagging model. However, this expends great annotation costs, may be exposed to label confusion, and the OCR errors will also significantly affect the final performance. In this paper, we propose a unified weakly-supervised learning...

10.48550/arxiv.2106.10681 preprint EN other-oa arXiv (Cornell University) 2021-01-01