- Handwritten Text Recognition Techniques
- Hydrocarbon exploration and reservoir analysis
- Advanced Image and Video Retrieval Techniques
- Vehicle License Plate Recognition
- Methane Hydrates and Related Phenomena
- Image Retrieval and Classification Techniques
- Natural Language Processing Techniques
- Image Processing and 3D Reconstruction
- Coal Properties and Utilization
- Atmospheric and Environmental Gas Dynamics
- Video Surveillance and Tracking Methods
- Generative Adversarial Networks and Image Synthesis
- Digital Media Forensic Detection
- Human Pose and Action Recognition
- Multimodal Machine Learning Applications
- Music and Audio Processing
- Hand Gesture Recognition Systems
- Face recognition and analysis
- Geological Studies and Exploration
- Advanced Image Processing Techniques
- Geology and Paleoclimatology Research
- Face and Expression Recognition
- Cancer Research and Treatments
- Domain Adaptation and Few-Shot Learning
- Plant Ecology and Soil Science
Chinese Academy of Sciences
2008-2022
Northwest Institute of Eco-Environment and Resources
2021-2022
Huazhong University of Science and Technology
2014-2021
Microsoft (United States)
2019-2021
Institute of Geology and Geophysics
2005-2017
Lanzhou University
2007
China Coal Research Institute (China)
2007
Image-based sequence recognition has been a long-standing research topic in computer vision. In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition. A novel neural network architecture, which integrates feature extraction, sequence modeling and transcription into a unified framework, is proposed. Compared with previous systems for scene text recognition, the proposed architecture possesses four distinctive properties: (1) It is end-to-end trainable,...
Aerial scene classification, which aims to automatically label an aerial image with a specific semantic category, is a fundamental problem for understanding high-resolution remote sensing imagery. In recent years, it has become an active task in the remote sensing area, and numerous algorithms have been proposed for this task, including many machine learning and data-driven approaches. However, the existing data sets for aerial scene classification, such as the UC-Merced data set and WHU-RS19, are relatively small, and the results on them are already saturated. This...
A challenging aspect of scene text recognition is handling text with distortions or irregular layout. In particular, perspective text and curved text are common in natural scenes and are difficult to recognize. In this work, we introduce ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network. The rectification network adaptively transforms an input image into a new one, rectifying the text in it. It is powered by a flexible Thin-Plate Spline transformation which handles a variety of text irregularities and is trained without human annotations....
This paper presents an end-to-end trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, involving no post-processing except for standard non-maximum suppression. TextBoxes outperforms competing methods in terms of text localization accuracy and is much faster, taking only 0.09s per image in a fast implementation. Furthermore, combined with a text recognizer, TextBoxes significantly outperforms state-of-the-art approaches on word spotting and end-to-end text recognition tasks.
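
The "standard non-maximum suppression" mentioned in the TextBoxes abstract can be illustrated with a minimal sketch. This is a generic greedy NMS over axis-aligned boxes, not the authors' implementation; the function names `iou` and `nms`, the `(x1, y1, x2, y2)` box layout, and the 0.5 threshold are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap any already-kept box by more than iou_threshold.
    Returns the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # the second box overlaps the first and is dropped
```

This quadratic-time version favors readability; practical detectors typically use a vectorized or library-provided NMS.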
Most state-of-the-art text detection methods are specific to horizontal Latin text and are not fast enough for real-time applications. We introduce Segment Linking (SegLink), an oriented text detection method. The main idea is to decompose text into two locally detectable elements, namely segments and links. A segment is an oriented box covering a part of a word or text line; a link connects two adjacent segments, indicating that they belong to the same word or text line. Both elements are detected densely at multiple scales by an end-to-end trained, fully-convolutional neural...
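
To make the segment-and-link decomposition concrete, the sketch below groups segments that are connected by links into words or lines. The abstract is truncated before it describes how the detected elements are combined, so this is only an assumed illustration using a union-find structure; `group_segments` and the integer segment ids are hypothetical names, not part of SegLink itself.

```python
from typing import Dict, List, Tuple


def group_segments(num_segments: int, links: List[Tuple[int, int]]) -> List[List[int]]:
    """Group segment ids 0..num_segments-1 into connected components,
    where each link (a, b) says segments a and b belong to the same word or line."""
    parent = list(range(num_segments))

    def find(x: int) -> int:
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups: Dict[int, List[int]] = {}
    for seg in range(num_segments):
        groups.setdefault(find(seg), []).append(seg)
    return sorted(groups.values())


if __name__ == "__main__":
    # Five segments; links join 0-1-2 into one group and 3-4 into another.
    print(group_segments(5, [(0, 1), (1, 2), (3, 4)]))
```

Each resulting group would then be turned into a single oriented box; that step is omitted here because the source does not describe it.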
Recognizing text in natural images is a challenging task with many unsolved problems. Different from those in documents, words in natural images often possess irregular shapes, which are caused by perspective distortion, curved character placement, etc. We propose RARE (Robust text recognizer with Automatic REctification), a recognition model that is robust to irregular text. RARE is a specially designed deep neural network, which consists of a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN). In testing, an image is firstly rectified via...
Text in natural images is of arbitrary orientations, requiring detection in terms of oriented bounding boxes. Normally, a multi-oriented text detector involves two key tasks: 1) text presence detection, which is a classification problem disregarding text orientation; 2) oriented bounding box regression, which concerns text orientation. Previous methods rely on shared features for both tasks, resulting in degraded performance due to the incompatibility of the two tasks. To address this issue, we propose to perform classification and regression on different...
Scene text detection is an important step of a scene text recognition system and also a challenging problem. Different from general object detection, the main challenges of scene text detection lie in the arbitrary orientations, small sizes, and significantly variant aspect ratios of text in natural images. In this paper, we present an end-to-end trainable fast scene text detector, named TextBoxes++, which detects arbitrary-oriented scene text with both high accuracy and efficiency in a single network forward pass. No post-processing other than efficient non-maximum...
This letter introduces a robust representation of 3-D shapes, named DeepPano, learned with deep convolutional neural networks (CNN). Firstly, each 3-D shape is converted into a panoramic view, namely a cylinder projection around its principal axis. Then, a variant of CNN is specifically designed for learning the deep representations directly from such views. Different from a typical CNN, a row-wise max-pooling layer is inserted between the convolution and fully-connected layers, making the learned representations invariant to rotation. Our approach...
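
The rotation invariance attributed to row-wise max-pooling can be checked on a toy example. The sketch assumes that rotating a shape around the projection axis circularly shifts the columns of its panoramic feature map; `row_wise_max_pool` and `rotate_columns` are hypothetical helper names, and the 2 x 4 feature map is made-up data, not taken from the paper.

```python
from typing import List

FeatureMap = List[List[float]]  # rows x columns; columns index the angle around the axis


def row_wise_max_pool(feature_map: FeatureMap) -> List[float]:
    """Reduce each row to its maximum, discarding the column (angle) position."""
    return [max(row) for row in feature_map]


def rotate_columns(feature_map: FeatureMap, shift: int) -> FeatureMap:
    """Circularly shift every row to the right by `shift` columns,
    standing in for a rotation of the shape around the projection axis."""
    width = len(feature_map[0])
    shift %= width
    if shift == 0:
        return [list(row) for row in feature_map]
    return [row[-shift:] + row[:-shift] for row in feature_map]


if __name__ == "__main__":
    fm = [[1.0, 5.0, 2.0, 0.0],
          [3.0, 3.0, 9.0, 4.0]]
    # The pooled vector is the same for every circular shift of the columns.
    for s in range(4):
        assert row_wise_max_pool(rotate_columns(fm, s)) == row_wise_max_pool(fm)
    print(row_wise_max_pool(fm))
```

Because the maximum over a row ignores where in the row the value occurs, any circular shift of the columns leaves the pooled vector unchanged.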
This paper proposes a new generative adversarial network for the problem of pose transfer, i.e., transferring the pose of a given person to a target one. The generator of the network comprises a sequence of Pose-Attentional Transfer Blocks that each transfers certain regions it attends to, generating the person image progressively. Compared with those in previous works, our generated person images possess better appearance consistency and shape consistency with the input images, and are thus significantly more realistic-looking. The efficacy and efficiency of the proposed network are validated both...
Driven by the wide range of applications, scene text detection and recognition have become active research topics in computer vision. Though extensively studied, localizing and reading text in uncontrolled environments remain extremely challenging, due to various interference factors. In this paper, we propose a novel multi-scale representation for scene text recognition. This representation consists of a set of detectable primitives, termed strokelets, which capture the essential substructures of characters at different granularities....
Chinese is the most widely used language in the world. Algorithms that read Chinese text in natural images facilitate applications of various kinds. Despite the large potential value, datasets and competitions in the past primarily focus on English, which bears very different characteristics from Chinese. This report introduces RCTW, a new competition that focuses on Chinese text reading. The competition features a large-scale dataset with over 12,000 annotated images. Two tasks, namely text localization and end-to-end recognition, are set up. The competition took place from...
Chinese scene text reading is one of the most challenging problems in computer vision and has attracted great interest. Different from English text, Chinese has more than 6000 commonly used characters, and Chinese characters can be arranged in various layouts with numerous fonts. The Chinese signboards in street view are a good choice for Chinese scene text images since they have different backgrounds, fonts and layouts. We organized a competition called ICDAR2019-ReCTS, which mainly focuses on reading Chinese text on signboards. This report presents the final results of the competition. A...
This report presents the final results of the ICDAR 2017 Robust Reading Challenge on COCO-Text, a challenge on scene text detection and recognition based on the largest real scene text dataset currently available: the COCO-Text dataset. The competition is structured around three tasks: Text Localization, Cropped Word Recognition and End-To-End Recognition. The competition received a total of 27 submissions over the different opened tasks. This report describes the datasets and the ground truth, and details the performance evaluation protocols used, along with a brief summary...
Despite the great success of convolutional neural networks (CNNs) for the image classification task on data sets such as Cifar and ImageNet, CNN's representation power is still somewhat limited in dealing with images that have a large variation in size and clutter, where Fisher vector (FV) has been shown to be an effective encoding strategy. FV encodes an image by aggregating local descriptors with a universal generative Gaussian mixture model (GMM). FV, however, has limited learning capability, and its parameters are mostly fixed after...
With the rapid increase of transnational communication and cooperation, people frequently encounter multilingual scenarios in various situations. In this paper, we are concerned with a relatively new problem: script identification at word or line levels in natural scenes. A large-scale dataset with a great quantity of natural images and 10 types of widely-used languages is constructed and released. In response to the challenges of script identification in real-world scenarios, a deep learning based algorithm is proposed. The experiments on the proposed dataset demonstrate...