Rujiao Long

ORCID: 0000-0003-1330-3193
Research Areas
  • Handwritten Text Recognition Techniques
  • Image Retrieval and Classification Techniques
  • Multimodal Machine Learning Applications
  • Currency Recognition and Detection
  • Advanced Image and Video Retrieval Techniques
  • Digital Media Forensic Detection
  • Advanced Neural Network Applications
  • Natural Language Processing Techniques
  • Topic Modeling
  • Water Quality Monitoring Technologies
  • Coral and Marine Ecosystems Studies
  • Advanced Vision and Imaging
  • Generative Adversarial Networks and Image Synthesis
  • COVID-19 diagnosis using AI
  • Image and Signal Denoising Methods
  • Remote-Sensing Image Classification
  • Robotics and Sensor-Based Localization
  • Image Processing and 3D Reconstruction
  • Video Surveillance and Tracking Methods
  • Image and Video Stabilization
  • Speech and Audio Processing

Alibaba Group (China)
2021-2025

Alibaba Group (United States)
2023

University Town of Shenzhen
2019

Tsinghua University
2019

This paper tackles the problem of table structure parsing (TSP) from images in the wild. In contrast to existing studies that mainly focus on parsing well-aligned tabular images with simple layouts from scanned PDF documents, we aim to establish a practical table structure parsing system for real-world scenarios where tabular input images are taken or scanned with severe deformation, bending, or occlusions. For designing such a system, we propose an approach named Cycle-CenterNet on top of CenterNet with a novel cycle-pairing module to simultaneously detect and group tabular cells into structured...

10.1109/iccv48922.2021.00098 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

This paper addresses the problem of document image dewarping, which aims at eliminating the geometric distortion in document images for document digitization. Instead of designing a better neural network to approximate the optical flow fields between the inputs and outputs, we pursue the best readability by taking the text lines and the document boundaries into account from a constrained optimization perspective. Specifically, our proposed method first learns the boundary points and the pixels in the text lines and then follows the most simple observation that both horizontal and vertical...

10.1109/cvpr52688.2022.00450 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

10.1109/icassp49660.2025.10889170 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats. Recent methods solve this problem by predicting the adjacency relations of detected cell boxes, or learning to generate the corresponding markup sequences from the table images. However, they either count on additional heuristic rules to recover the table structures, or require a huge amount of training data and time-consuming sequential decoders. In this paper, we propose an alternative paradigm. We model TSR as...

10.1609/aaai.v37i3.25402 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

Recently, Visual Information Extraction (VIE) has been becoming increasingly important in both academia and industry, due to the wide range of real-world applications. Previously, numerous works have been proposed to tackle this problem. However, the benchmarks used to assess these methods are relatively plain, i.e., scenarios with real-world complexity are not fully represented in these benchmarks. As the first contribution of this work, we curate and release a new dataset for VIE, in which the document images are much more challenging in that they are taken...

10.1109/cvpr52729.2023.01474 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats. Recent methods solve this problem by predicting the adjacency relations of detected cell boxes or learning to directly generate the corresponding markup sequences from the table images. However, existing approaches either count on additional heuristic rules to recover the table structures, or face challenges in capturing long-range dependencies within tables, resulting in increased complexity. In this paper, we propose...

10.48550/arxiv.2401.01522 preprint EN other-oa arXiv (Cornell University) 2024-01-01

End-to-end visual information extraction (VIE) aims at integrating the hierarchical subtasks of VIE, including text spotting, word grouping, and entity labeling, into a unified framework. Dealing with the gaps among the three subtasks plays a pivotal role in designing an effective VIE model. OCR-dependent methods heavily rely on offline OCR engines and inevitably suffer from OCR errors, while OCR-free methods, particularly those employing a black-box model, might produce outputs that lack interpretability or contain...

10.48550/arxiv.2411.01139 preprint EN arXiv (Cornell University) 2024-11-02

The impressive performance of Large Language Model (LLM) has prompted researchers to develop Multi-modal LLM (MLLM), which has shown great potential for various multi-modal tasks. However, current MLLM often struggles to effectively address fine-grained challenges. We argue that this limitation is closely linked to the models' visual grounding capabilities. The restricted spatial awareness and perceptual acuity of visual encoders frequently lead to interference from irrelevant background information in images, causing...

10.48550/arxiv.2412.16869 preprint EN arXiv (Cornell University) 2024-12-22

This paper is devoted to a lightweight convolutional neural network based on the attention mechanism called the tiny attention network (TANet). The TANet consists of three main parts termed as a reduction module, a self-attention operation, and a group convolution. The reduction module alleviates information loss caused by the pooling operation. The new parameter-free self-attention operation makes the model focus on learning important parts of images. The group convolution achieves model compression and multibranch fusion. Using the main parts, the proposed network enables efficient plankton classification...

10.1155/2019/6536925 article EN Mobile Information Systems 2019-04-03

This paper focuses on an efficient and high-performance compression method for conditional generative adversarial networks (cGANs) from the perspective of knowledge distillation. Previous cGANs compression approaches using knowledge distillation typically transfer knowledge in a one-to-one manner, where a specific student generator layer only receives knowledge from the same depth stage of the teacher generator. Obviously, this approach fails to sufficiently explore the valuable dark knowledge embedded in the intermediate layers. To address this issue, novel based...

10.1109/iccvw60793.2023.00140 article EN 2023-10-02

Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats. Recent methods solve this problem by predicting the adjacency relations of detected cell boxes, or learning to generate the corresponding markup sequences from the table images. However, they either count on additional heuristic rules to recover the table structures, or require a huge amount of training data and time-consuming sequential decoders. In this paper, we propose an alternative paradigm. We model TSR as...

10.48550/arxiv.2303.03730 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Recently, Visual Information Extraction (VIE) has been becoming increasingly important in both academia and industry, due to the wide range of real-world applications. Previously, numerous works have been proposed to tackle this problem. However, the benchmarks used to assess these methods are relatively plain, i.e., scenarios with real-world complexity are not fully represented in these benchmarks. As the first contribution of this work, we curate and release a new dataset for VIE, in which the document images are much more challenging in that they are taken...

10.48550/arxiv.2303.13095 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Generally, pre-trained backbone convolutional models for image classification are directly used as the default model for other tasks including detection and segmentation, so as to avoid training a new model from scratch. However, detection and segmentation frameworks need combining low-level and high-level features to boost performance. In this paper, we point out and prove that simple fusion of features is insufficient, and define an operation named DeepFuse as a generic family of building blocks for feature fusion. This block can be plugged into any...

10.1145/3321408.3322634 article EN Proceedings of the ACM Turing Celebration Conference - China 2019-05-17

This paper tackles the problem of table structure parsing (TSP) from images in the wild. In contrast to existing studies that mainly focus on parsing well-aligned tabular images with simple layouts from scanned PDF documents, we aim to establish a practical table structure parsing system for real-world scenarios where tabular input images are taken or scanned with severe deformation, bending, or occlusions. For designing such a system, we propose an approach named Cycle-CenterNet on top of CenterNet with a novel cycle-pairing module to simultaneously detect and group tabular cells into structured...

10.48550/arxiv.2109.02199 preprint EN cc-by arXiv (Cornell University) 2021-01-01

This paper addresses the problem of document image dewarping, which aims at eliminating the geometric distortion in document images for document digitization. Instead of designing a better neural network to approximate the optical flow fields between the inputs and outputs, we pursue the best readability by taking the text lines and the document boundaries into account from a constrained optimization perspective. Specifically, our proposed method first learns the boundary points and the pixels in the text lines and then follows the most simple observation that both horizontal and vertical...

10.48550/arxiv.2203.16850 preprint EN other-oa arXiv (Cornell University) 2022-01-01