- Handwritten Text Recognition Techniques
- Image Retrieval and Classification Techniques
- Multimodal Machine Learning Applications
- Currency Recognition and Detection
- Advanced Image and Video Retrieval Techniques
- Digital Media Forensic Detection
- Advanced Neural Network Applications
- Natural Language Processing Techniques
- Topic Modeling
- Water Quality Monitoring Technologies
- Coral and Marine Ecosystems Studies
- Advanced Vision and Imaging
- Generative Adversarial Networks and Image Synthesis
- COVID-19 diagnosis using AI
- Image and Signal Denoising Methods
- Remote-Sensing Image Classification
- Robotics and Sensor-Based Localization
- Image Processing and 3D Reconstruction
- Video Surveillance and Tracking Methods
- Image and Video Stabilization
- Speech and Audio Processing
Alibaba Group (China)
2021-2025
Alibaba Group (United States)
2023
University Town of Shenzhen
2019
Tsinghua University
2019
This paper tackles the problem of table structure parsing (TSP) from images in the wild. In contrast to existing studies that mainly focus on well-aligned tabular images with simple layouts from scanned PDF documents, we aim to establish a practical system for real-world scenarios where the input images are taken or scanned with severe deformation, bending, or occlusions. For designing such a system, we propose an approach named Cycle-CenterNet, built on top of CenterNet with a novel cycle-pairing module, to simultaneously detect and group cells into structured...
This paper addresses the problem of document image dewarping, which aims at eliminating geometric distortion in document images for digitization. Instead of designing a better neural network to approximate the optical flow fields between inputs and outputs, we pursue the best readability by taking text lines and boundaries into account from a constrained optimization perspective. Specifically, our proposed method first learns the boundary points and the pixels in the text lines, and then follows the most simple observation that both horizontal and vertical...
Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats. Recent methods solve this problem by predicting the adjacency relations of detected cell boxes, or by learning to generate the corresponding markup sequences from table images. However, they either count on additional heuristic rules to recover the table structures, or require a huge amount of training data and time-consuming sequential decoders. In this paper, we propose an alternative paradigm. We model TSR as...
Recently, Visual Information Extraction (VIE) has become increasingly important in both academia and industry, due to its wide range of real-world applications. Previously, numerous works have been proposed to tackle this problem. However, the benchmarks used to assess these methods are relatively plain, i.e., scenarios with real-world complexity are not fully represented in these benchmarks. As the first contribution of this work, we curate and release a new dataset for VIE, in which the document images are much more challenging in that they are taken...
Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats. Recent methods solve this problem by predicting the adjacency relations of detected cell boxes or by learning to directly generate the corresponding markup sequences from table images. However, existing approaches either count on additional heuristic rules to recover the table structures, or face challenges in capturing long-range dependencies within tables, resulting in increased complexity. In this paper, we propose...
End-to-end visual information extraction (VIE) aims at integrating the hierarchical subtasks of VIE, including text spotting, word grouping, and entity labeling, into a unified framework. Dealing with the gaps among the three subtasks plays a pivotal role in designing an effective VIE model. OCR-dependent methods rely heavily on offline OCR engines and inevitably suffer from OCR errors, while OCR-free methods, particularly those employing a black-box model, might produce outputs that lack interpretability or contain...
The impressive performance of Large Language Models (LLMs) has prompted researchers to develop Multi-modal LLMs (MLLMs), which have shown great potential for various multi-modal tasks. However, current MLLMs often struggle to effectively address fine-grained multi-modal challenges. We argue that this limitation is closely linked to the models' visual grounding capabilities. The restricted spatial awareness and perceptual acuity of the visual encoders frequently lead to interference from irrelevant background information in images, causing...
This paper is devoted to a lightweight convolutional neural network based on the attention mechanism, called the tiny attention network (TANet). The TANet consists of three main parts, termed the reduction module, the self-attention operation, and the group convolution. The reduction module alleviates the information loss caused by the pooling operation. The new parameter-free self-attention operation makes the model focus on learning the important parts of images. The group convolution achieves model compression and multibranch fusion. Using these three parts, the proposed network enables efficient plankton classification...
This paper focuses on an efficient and high-performance compression method for conditional generative adversarial networks (cGANs) from the perspective of knowledge distillation. Previous cGAN compression approaches using knowledge distillation typically transfer knowledge in a one-to-one manner, where a specific student generator layer only receives knowledge from the same depth stage of the teacher generator. Obviously, this approach fails to sufficiently explore the valuable dark knowledge embedded in the intermediate layers. To address this issue, a novel method based...
Generally, pre-trained backbone convolutional models for image classification are directly used as the default backbone model for other tasks, including detection and segmentation, so as to avoid training a new model from scratch. However, segmentation frameworks need to combine low-level and high-level features to boost performance. In this paper, we point out and prove that simple fusion of features is insufficient, and we define an operation named DeepFuse as a generic family of building blocks for feature fusion. This block can be plugged into any...