- Handwritten Text Recognition Techniques
- Advanced Image and Video Retrieval Techniques
- Human Pose and Action Recognition
- Multimodal Machine Learning Applications
- Gait Recognition and Analysis
- Image Retrieval and Classification Techniques
- Radiomics and Machine Learning in Medical Imaging
- Lung Cancer Diagnosis and Treatment
- Anomaly Detection Techniques and Applications
- Natural Language Processing Techniques
- Advanced X-ray and CT Imaging
- Video Analysis and Summarization
- Speech and Dialogue Systems
- Context-Aware Activity Recognition Systems
- Advanced Neural Network Applications
- Hand Gesture Recognition Systems
- Image Processing and 3D Reconstruction
- Topic Modeling
- Vehicle License Plate Recognition
- Image and Object Detection Techniques
Sichuan University of Science and Engineering
2024-2025
Yibin University
2024
South China University of Technology
2019-2021
Video-based human action recognition is one of the most important and challenging areas of research in the field of computer vision. It has found many pragmatic applications, such as video surveillance, human-computer interaction, entertainment, and autonomous driving. Owing to the recent development of deep learning methods for recognition, performance on benchmark datasets has been significantly enhanced. Deep learning techniques are mainly used for recognizing actions in images and videos comprising Euclidean data. An extension of these techniques to non-Euclidean data...
Visual Information Extraction (VIE) has attracted considerable attention recently owing to its various advanced applications such as document understanding, automatic marking, and intelligent education. Most existing works decouple this problem into several independent sub-tasks of text spotting (text detection and recognition) and information extraction, which completely ignores the high correlation among them during optimization. In this paper, we propose a robust visual information extraction system (VIES) towards real-world...
The Visual Information Extraction (VIE) task aims to extract key information from multifarious document images (e.g., invoices and purchase receipts). Most previous methods treat VIE simply as a sequence labeling problem or classification problem, which requires models to carefully identify each kind of semantics by introducing multimodal features such as font, color, and layout. But these features cannot work well when faced with numeric semantic categories or some ambiguous texts. To address this issue, in...
The human skeleton contains significant information about actions; therefore, it is quite intuitive to incorporate skeletons in human action recognition. A skeleton resembles a graph, where body joints and bones mimic nodes and edges. This resemblance of structure is the main motivation to apply graph convolutional networks for action recognition. Results show that the discriminant contribution of different joints is not equal across actions. Therefore, we propose to use attention-joints, which correspond to joints significantly contributing to specific actions. Features corresponding only...
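The attention-joints idea can be sketched as one graph-convolution step over the skeleton graph followed by attention pooling over joints. This is a minimal illustration with random weights, not the paper's architecture; the joint indices, edge list, and dimensions are made up:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy skeleton: 5 joints, edges mimic bones (hypothetical topology)
num_joints, feat_dim = 5, 3
edges = [(0, 1), (1, 2), (1, 3), (3, 4)]

# Adjacency with self-loops, symmetrically normalized (standard GCN propagation)
A = np.eye(num_joints)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
A_hat = A / np.sqrt(np.outer(d, d))

rng = np.random.default_rng(0)
X = rng.normal(size=(num_joints, feat_dim))  # per-joint features (e.g. x, y, score)
W = rng.normal(size=(feat_dim, feat_dim))    # projection weights (random stand-in)

H = np.tanh(A_hat @ X @ W)                   # one graph-convolution layer

# Attention over joints: higher-scoring joints contribute more to the action feature
scores = H @ rng.normal(size=(feat_dim,))    # per-joint scalar score
alpha = softmax(scores)                      # attention weights over joints
action_feature = alpha @ H                   # attention-pooled representation

print("attention weights:", np.round(alpha, 3))
```

In a trained model the weights `W` and the scoring vector would be learned, so `alpha` concentrates on the joints that discriminate a given action (e.g. wrists for waving).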
Background: The accurate classification of lung nodules is critical to achieving personalized cancer treatment and prognosis prediction. Treatment options for patients are closely related to the type of nodule, but there are many types, and the distinctions between certain types are subtle, making classification based on traditional medical imaging technology and doctor experience challenging. Purpose: In this study, a novel method was used to analyze quantitative features in CT images using radiomics to reveal the characteristics of pulmonary nodules, then...
Recently, deep learning has greatly promoted the performance of license plate recognition (LPR) by learning robust features from numerous labeled data. However, the large variation of wild plates across complicated environments and perspectives is still a huge challenge to LPR. To solve this problem, we propose an effective and efficient shared adversarial training network (SATN) in this paper, which can learn environment-independent and perspective-free semantic features with prior knowledge from standard stencil-rendered plates, as...
Visual information extraction (VIE) has attracted considerable attention recently owing to its various advanced applications such as document understanding, automatic marking, and intelligent education. Most existing works decouple this problem into several independent sub-tasks of text spotting (text detection and recognition) and information extraction, which completely ignores the high correlation among them during optimization. In this paper, we propose a robust visual information extraction system (VIES) towards real-world scenarios,...
Visual information extraction (VIE) has attracted increasing attention in recent years. Existing methods usually first organize optical character recognition (OCR) results into plain texts and then utilize token-level category annotations as supervision to train a sequence tagging model. However, this expends great annotation costs and may be exposed to label confusion, and the OCR errors will also significantly affect the final performance. In this paper, we propose a unified weakly-supervised learning...
Action recognition has achieved great progress in recent years because of better feature representation learning and classification technology, such as convolutional neural networks (CNNs). However, most current deep approaches treat the action as a black box, ignoring specific domain knowledge of the action itself. In this paper, by analyzing the characteristics of different actions, we propose a new framework that involves a residual-attention module and a joint path-signature (JPSF) module. The path signature...
Scoring the Optical Character Recognition (OCR) capabilities of Large Multimodal Models (LMMs) has witnessed growing interest recently. Existing benchmarks have highlighted the impressive performance of LMMs in text recognition; however, their abilities on certain challenging tasks, such as text localization, handwritten content extraction, and logical reasoning, remain underexplored. To bridge this gap, we introduce OCRBench v2, a large-scale bilingual text-centric benchmark with currently the most...
The comprehension of text-rich visual scenes has become a focal point for evaluating Multi-modal Large Language Models (MLLMs) due to their widespread applications. Current benchmarks tailored to this scenario emphasize perceptual capabilities while overlooking the assessment of cognitive abilities. To address this limitation, we introduce a Multimodal benchmark towards Text-rich scenes, which evaluates the Cognitive capabilities of MLLMs through reasoning and content-creation tasks (MCTBench). To mitigate potential...
The accurate classification of lung nodules is critical to achieving personalized cancer treatment and prognosis prediction. Treatment options for patients are closely related to the type of nodule, but there are many types, and the distinctions between certain types are subtle, making classification based on traditional medical imaging technology and doctor experience challenging. This study adopts a novel approach, using computed tomography (CT) radiomics to analyze quantitative features in CT images and reveal nodule characteristics, then employs...
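The radiomics step amounts to turning a nodule's intensity region into quantitative descriptors. The sketch below computes a few first-order features from a synthetic CT region of interest; it is a minimal illustration of the feature-extraction idea (real radiomics pipelines also compute shape and texture features, often hundreds in total), and the synthetic nodule values are made up:

```python
import numpy as np

def first_order_features(roi):
    """First-order radiomic features of a nodule ROI (intensities, e.g. in HU)."""
    x = np.asarray(roi, dtype=float).ravel()
    mean = x.mean()
    std = x.std()
    # Skewness of the intensity distribution
    skew = ((x - mean) ** 3).mean() / (std ** 3) if std > 0 else 0.0
    # Entropy of the discretized intensity histogram (32 bins)
    hist, _ = np.histogram(x, bins=32)
    p = hist[hist > 0] / hist.sum()
    entropy = -(p * np.log2(p)).sum()
    return {"mean": mean, "std": std, "skewness": skew, "entropy": entropy}

rng = np.random.default_rng(1)
nodule = rng.normal(loc=-300, scale=80, size=(16, 16, 8))  # synthetic HU values
feats = first_order_features(nodule)
print({k: round(v, 2) for k, v in feats.items()})
```

Feature vectors like this, computed per nodule, are what the downstream classifier is trained on.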
The Visual Information Extraction (VIE) task aims to extract key information from multifarious document images (e.g., invoices and purchase receipts). Most previous methods treat VIE simply as a sequence labeling problem or classification problem, which requires models to carefully identify each kind of semantics by introducing multimodal features such as font, color, and layout. But these features cannot work well when faced with numeric semantic categories or some ambiguous texts. To address this issue,...
Visual information extraction (VIE) has attracted increasing attention in recent years. Existing methods usually first organize optical character recognition (OCR) results into plain texts and then utilize token-level entity annotations as supervision to train a sequence tagging model. However, this expends great annotation costs and may be exposed to label confusion, and the OCR errors will also significantly affect the final performance. In this paper, we propose a unified weakly-supervised learning...
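One common way to avoid token-level annotation costs is to supervise with only the entity-value strings and derive token tags automatically. The helper below is a hypothetical sketch of that weak-supervision idea (not the paper's method): it matches annotated value strings against OCR tokens in reading order and emits BIO pseudo-labels; the field names and example tokens are made up.

```python
def pseudo_labels(tokens, entities):
    """Derive BIO token tags from entity-value strings (weak supervision sketch).

    tokens:   OCR tokens in reading order.
    entities: mapping of field name -> annotated value string.
    """
    labels = ["O"] * len(tokens)
    for field, value in entities.items():
        words = value.split()
        n = len(words)
        # First exact match of the value's word sequence wins
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == words:
                labels[i] = f"B-{field}"
                for j in range(i + 1, i + n):
                    labels[j] = f"I-{field}"
                break
    return labels

tokens = "Invoice Total : 42.50 USD Date : 2021-03-01".split()
ents = {"TOTAL": "42.50 USD", "DATE": "2021-03-01"}
labels = pseudo_labels(tokens, ents)
print(list(zip(tokens, labels)))
```

A sequence tagging model can then be trained on these pseudo-labels; in practice fuzzy matching would be needed to tolerate OCR errors, which exact matching like this cannot handle.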