NFDI4DS | UHH-SEMS - Publication Details

Parsing Table Structures in the Wild

OPENALEX - Publications

Rujiao Long, Wen Wang, Nan Xue, Feiyu Gao, Zhibo Yang and 2 more

This paper tackles the problem of table structure parsing (TSP) from images in the wild. In contrast to existing studies that mainly focus on parsing well-aligned tabular images with simple layouts from scanned PDF documents, we aim to establish a practical system for real-world scenarios where the input images are taken or scanned with severe deformation, bending, or occlusions. For designing such a system, we propose an approach named Cycle-CenterNet on top of CenterNet, with a novel cycle-pairing module to simultaneously detect and group cells into structured...

10.1109/iccv48922.2021.00098 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Revisiting Document Image Dewarping by Grid Regularization

OPENALEX - Publications

Xiangwei Jiang, Rujiao Long, Nan Xue, Zhibo Yang, Cong Yao and 1 more

This paper addresses the problem of document image dewarping, which aims at eliminating geometric distortion in document images for digitization. Instead of designing a better neural network to approximate the optical flow fields between the inputs and outputs, we pursue the best readability by taking text lines and boundaries into account from a constrained optimization perspective. Specifically, our proposed method first learns the boundary points and the pixels of text lines, then follows the simplest observation that both horizontal and vertical...

10.1109/cvpr52688.2022.00450 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models

OPENALEX - Publications

Y. Wang, Dehong Gao, Bin Li, Rujiao Long, Yi Lei and 5 more

10.1109/icassp49660.2025.10889170 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

LORE: Logical Location Regression Network for Table Structure Recognition

OPENALEX - Publications

Hangdi Xing, Feiyu Gao, Rujiao Long, Jiajun Bu, Zheng Qi and 3 more

Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats. Recent methods solve this problem by predicting the adjacency relations of detected cell boxes, or by learning to generate the corresponding markup sequences from table images. However, they either count on additional heuristic rules to recover the structures, or require a huge amount of training data and time-consuming sequential decoders. In this paper, we propose an alternative paradigm. We model TSR as...

10.1609/aaai.v37i3.25402 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

Modeling Entities as Semantic Points for Visual Information Extraction in the Wild

OPENALEX - Publications

Zhibo Yang, Rujiao Long, Pengfei Wang, Sibo Song, Humen Zhong and 3 more

Recently, Visual Information Extraction (VIE) has become increasingly important in both academia and industry, due to its wide range of real-world applications. Previously, numerous works have been proposed to tackle this problem. However, the benchmarks used to assess these methods are relatively plain, i.e., scenarios with real-world complexity are not fully represented in these benchmarks. As the first contribution of this work, we curate and release a new dataset for VIE, in which the document images are much more challenging in that they are taken...

10.1109/cvpr52729.2023.01474 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

LORE++: Logical Location Regression Network for Table Structure Recognition with Pre-training

OPENALEX - Publications

Rujiao Long, Hangdi Xing, Zhibo Yang, Qi Zheng, Zhi Yu and 2 more

10.1016/j.patcog.2024.110816 article EN Pattern Recognition 2024-07-23

End-to-end semantic-aware object retrieval based on region-wise attention

OPENALEX - Publications

Xiu Li, Kun Jin, Rujiao Long

10.1016/j.neucom.2019.06.008 article EN Neurocomputing 2019-06-06

LORE++: Logical Location Regression Network for Table Structure Recognition with Pre-training

OPENALEX - Publications

Rujiao Long, Hangdi Xing, Zhibo Yang, Qi Zheng, Zhi Yu and 2 more

Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats. Recent methods solve this problem by predicting the adjacency relations of detected cell boxes or by learning to directly generate the corresponding markup sequences from table images. However, existing approaches either count on additional heuristic rules to recover the structures, or face challenges in capturing long-range dependencies within tables, resulting in increased complexity. In this paper, we propose...

10.48550/arxiv.2401.01522 preprint EN other-oa arXiv (Cornell University) 2024-01-01

HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction

OPENALEX - Publications

Rujiao Long, Pengfei Wang, Zhibo Yang, Cong Yao

End-to-end visual information extraction (VIE) aims at integrating the hierarchical subtasks of VIE, including text spotting, word grouping, and entity labeling, into a unified framework. Dealing with the gaps among the three subtasks plays a pivotal role in designing an effective VIE model. OCR-dependent methods heavily rely on offline OCR engines and inevitably suffer from OCR errors, while OCR-free methods, particularly those employing a black-box model, might produce outputs that lack interpretability or contain...

10.48550/arxiv.2411.01139 preprint EN arXiv (Cornell University) 2024-11-02

CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models

OPENALEX - Publications

Y. Wang, Dehong Gao, Bin Li, Rujiao Long, Yi Lei and 5 more

The impressive performance of Large Language Models (LLMs) has prompted researchers to develop Multi-modal LLMs (MLLMs), which have shown great potential for various multi-modal tasks. However, current MLLMs often struggle to effectively address fine-grained challenges. We argue that this limitation is closely linked to the models' visual grounding capabilities. The restricted spatial awareness and perceptual acuity of visual encoders frequently lead to interference from irrelevant background information in images, causing...

10.48550/arxiv.2412.16869 preprint EN arXiv (Cornell University) 2024-12-22

TANet: A Tiny Plankton Classification Network for Mobile Devices

OPENALEX - Publications

Xiu Li, Rujiao Long, Jiangpeng Yan, Kun Jin, Jihae Lee

This paper is devoted to a lightweight convolutional neural network based on the attention mechanism, called the tiny attention network (TANet). The TANet consists of three main parts, termed the reduction module, the self-attention operation, and group convolution. The reduction module alleviates the information loss caused by the pooling operation. The new parameter-free self-attention operation makes the model focus on learning the important parts of images. The group convolution achieves model compression and multibranch fusion. Using these parts, the proposed network enables efficient plankton classification...

10.1155/2019/6536925 article EN Mobile Information Systems 2019-04-03

Accumulation Knowledge Distillation for Conditional GAN Compression

OPENALEX - Publications

Tingwei Gao, Rujiao Long

This paper focuses on an efficient and high-performance compression method for conditional generative adversarial networks (cGANs) from the perspective of knowledge distillation. Previous cGAN compression approaches using distillation typically transfer knowledge in a one-to-one manner, where a specific student generator layer only receives knowledge from the teacher generator stage at the same depth. Obviously, this approach fails to sufficiently explore the valuable dark knowledge embedded in the intermediate layers. To address this issue, a novel method based...

10.1109/iccvw60793.2023.00140 article EN 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) 2023-10-02

LORE: Logical Location Regression Network for Table Structure Recognition

OPENALEX - Publications

Hangdi Xing, Feiyu Gao, Rujiao Long, Jiajun Bu, Zheng Qi and 3 more

Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats. Recent methods solve this problem by predicting the adjacency relations of detected cell boxes, or by learning to generate the corresponding markup sequences from table images. However, they either count on additional heuristic rules to recover the structures, or require a huge amount of training data and time-consuming sequential decoders. In this paper, we propose an alternative paradigm. We model TSR as...

10.48550/arxiv.2303.03730 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Modeling Entities as Semantic Points for Visual Information Extraction in the Wild

OPENALEX - Publications

Zhibo Yang, Rujiao Long, Pengfei Wang, Sibo Song, Humen Zhong and 3 more

Recently, Visual Information Extraction (VIE) has become increasingly important in both academia and industry, due to its wide range of real-world applications. Previously, numerous works have been proposed to tackle this problem. However, the benchmarks used to assess these methods are relatively plain, i.e., scenarios with real-world complexity are not fully represented in these benchmarks. As the first contribution of this work, we curate and release a new dataset for VIE, in which the document images are much more challenging in that they are taken...

10.48550/arxiv.2303.13095 preprint EN other-oa arXiv (Cornell University) 2023-01-01

DeepFuse neural networks

OPENALEX - Publications

Xiu Li, Rujiao Long, Kun Jin

Generally, pre-trained backbone convolutional models for image classification are directly used as the default model for other tasks, including detection and segmentation, so as to avoid training a new model from scratch. However, segmentation frameworks need to combine low-level and high-level features to boost performance. In this paper, we point out and prove that simple fusion of these features is insufficient, and define an operation named DeepFuse, a generic family of building blocks for feature fusion. This block can be plugged into any...

10.1145/3321408.3322634 article EN Proceedings of the ACM Turing Celebration Conference - China 2019-05-17

Parsing Table Structures in the Wild

OPENALEX - Publications

Rujiao Long, Wen Wang, Nan Xue, Feiyu Gao, Zhibo Yang and 2 more

This paper tackles the problem of table structure parsing (TSP) from images in the wild. In contrast to existing studies that mainly focus on parsing well-aligned tabular images with simple layouts from scanned PDF documents, we aim to establish a practical system for real-world scenarios where the input images are taken or scanned with severe deformation, bending, or occlusions. For designing such a system, we propose an approach named Cycle-CenterNet on top of CenterNet, with a novel cycle-pairing module to simultaneously detect and group cells into structured...

10.48550/arxiv.2109.02199 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Revisiting Document Image Dewarping by Grid Regularization

OPENALEX - Publications

Xiangwei Jiang, Rujiao Long, Nan Xue, Zhibo Yang, Cong Yao and 1 more

This paper addresses the problem of document image dewarping, which aims at eliminating geometric distortion in document images for digitization. Instead of designing a better neural network to approximate the optical flow fields between the inputs and outputs, we pursue the best readability by taking text lines and boundaries into account from a constrained optimization perspective. Specifically, our proposed method first learns the boundary points and the pixels of text lines, then follows the simplest observation that both horizontal and vertical...

10.48550/arxiv.2203.16850 preprint EN other-oa arXiv (Cornell University) 2022-01-01