- Handwritten Text Recognition Techniques
- Natural Language Processing Techniques
- Image Processing and 3D Reconstruction
- Advanced Neural Network Applications
- Multimodal Machine Learning Applications
- Digital Media Forensic Detection
- Advanced Image and Video Retrieval Techniques
- Image Retrieval and Classification Techniques
- Text and Document Classification Technologies
- Mathematics, Computing, and Information Processing
- Vehicle License Plate Recognition
- Face recognition and analysis
- Music and Audio Processing
- Hand Gesture Recognition Systems
- Topic Modeling
- Speech Recognition and Synthesis
- Livestock and Poultry Management
- Image Processing Techniques and Applications
- Generative Adversarial Networks and Image Synthesis
- Genetic and phenotypic traits in livestock
- Plant Virus Research Studies
- Video Surveillance and Tracking Methods
- Advanced Steganography and Watermarking Techniques
- Advanced Image Processing Techniques
- Genetic diversity and population structure
South China University of Technology
2018-2025
Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou)
2025
Sun Yat-sen University
2019-2025
China Agricultural University
2011-2024
Ministry of Agriculture and Rural Affairs
2023
University of Minnesota
2013
Automatic font generation is an imitation task, which aims to create a font library that mimics the style of reference images while preserving the content from source images. Although existing methods have achieved satisfactory performance, they still struggle with complex characters and large style variations. To address these issues, we propose FontDiffuser, a diffusion-based image-to-image one-shot method that innovatively models the task as a noise-to-denoise paradigm. In our method, we introduce Multi-scale Content...
Existing scene text spotting (i.e., end-to-end text detection and recognition) methods rely on costly bounding box annotations (e.g., text-line, word-level, or character-level boxes). For the first time, we demonstrate that training scene text spotting models can be achieved with an extremely low-cost annotation of a single point for each instance. We propose a method that tackles scene text spotting as a sequence prediction task. Given an image as input, we formulate the desired recognition results as discrete tokens and use an auto-regressive Transformer to predict...
This paper presents a method that can accurately detect heads, especially small ones, in indoor scenes. To achieve this, we propose a novel method, Feature Refine Net (FRN), and a cascaded multi-scale architecture. FRN exploits the hierarchical features created by deep convolutional neural networks. The proposed channel weighting enables FRN to make use of these features alternatively and effectively. To improve the performance of head detection, we propose an architecture which has two detectors. One, called the global detector, is responsible for...
Online and offline handwritten Chinese text recognition (HCTR) has been studied for decades. Early methods adopted oversegmentation-based strategies but suffered from low speed, insufficient accuracy, and the high cost of character segmentation annotations. Recently, segmentation-free methods based on connectionist temporal classification (CTC) and the attention mechanism have dominated the field of HCTR. However, people actually read text character by character, especially for ideograms such as Chinese. This raises the question: are...
End-to-end scene text spotting has made significant progress due to its intrinsic synergy between text detection and recognition. Previous methods commonly regard manual annotations such as horizontal rectangles, rotated rectangles, quadrangles, and polygons as a prerequisite, which are much more expensive than using a single point. Our new framework, SPTS v2, allows us to train high-performing text-spotting models using a single-point annotation. SPTS v2 reserves the advantage of the auto-regressive Transformer with an Instance...
This paper aims to re-assess scene text recognition (STR) from a data-oriented perspective. We begin by revisiting the six commonly used benchmarks in STR and observe a trend of performance saturation, whereby only 2.91% of the benchmark images cannot be accurately recognized by an ensemble of 13 representative models. While these results are impressive and suggest that STR could be considered solved, we argue that this is primarily due to the less challenging nature of the common benchmarks, thus concealing the underlying issues...
Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their effectiveness in text-related visual tasks remains relatively unexplored. In this paper, we conducted a comprehensive evaluation of Large Multimodal Models, such as GPT4V and Gemini, in various text-related visual tasks, including Text Recognition, Scene Text-Centric Visual Question Answering (VQA), Document-Oriented VQA, Key Information Extraction (KIE), Handwritten Mathematical Expression...
In recent years, end-to-end scene text spotting approaches have been evolving toward the Transformer-based framework. While previous studies have shown the crucial importance of the intrinsic synergy between text detection and recognition, recent advances in Transformer-based methods usually adopt an implicit synergy strategy with a shared query, which cannot fully realize the potential of these two interactive tasks. In this paper, we argue that an explicit synergy considering the distinct characteristics of text detection and recognition can significantly improve the performance of text spotting. To this end,...
The eukaryotic mRNA surveillance pathway, a pivotal guardian of mRNA fidelity, stands at the nexus of diverse biological processes, including antiviral immunity. Despite the recognized function of splicing factors on mRNA fate, the intricate interplay shaping the surveillance pathway remains elusive. We illustrate that the conserved splicing factor U2 snRNP auxiliary factor large subunit B (U2AF65B) modulates the mRNA surveillance complex, contributing to transcriptomic homeostasis in maize. This functionality requires ZmU2AF65B-mediated normal upstream frameshift 3 (ZmUPF3)...
Recently, tampered text detection in document images has attracted increasing attention due to its essential role in information security. However, detecting visually consistent tampered text in photographed document images is still a main challenge. In this paper, we propose a novel framework to capture more fine-grained clues in complex scenarios for tampered text detection, termed Document Tampering Detector (DTD), which consists of a Frequency Perception Head (FPH) to compensate for the deficiencies caused by inconspicuous visual features,...
Apoptotic protease activating factor 1 (Apaf-1) was traditionally defined as a scaffold protein in mammalian cells for assembling a caspase activation platform known as the ‘apoptosome’ after its binding to cytochrome c. Although Apaf-1 structurally resembles animal NOD-like receptor (NLR) and plant resistance (R) proteins, whether it is directly involved in innate immunity is still largely unknown. Here, we found that Apaf-1-like molecules from lancelets, fruit flies, mice, and humans have...
The development of Chinese civilization has produced a vast collection of historical documents. Recognizing and analyzing these documents holds significant value for research on ancient culture. Recently, researchers have tried to utilize deep-learning techniques to automate recognition and analysis. However, existing document datasets, which are heavily relied upon by deep-learning models, suffer from limited data scale, insufficient character categories, and a lack of book-level annotation. To fill this gap, we introduce...
Multimodal Large Language Models (MLLMs) are typically based on decoder-only or cross-attention architectures. While decoder-only MLLMs outperform their cross-attention counterparts, they require significantly higher computational resources due to extensive self-attention and FFN operations on visual tokens. This raises the question: can we eliminate these expensive operations while maintaining performance? To this end, we present a novel analysis framework to investigate the necessity of these costly operations in MLLMs. Our framework introduces two key innovations:...
Historical documents encompass a wealth of cultural treasures but suffer from severe damage, including missing characters, paper damage, and ink erosion over time. However, existing document processing methods primarily focus on binarization, enhancement, etc., neglecting the repair of these damages. To this end, we present a new task, termed Historical Document Repair (HDR), which aims to predict the original appearance of damaged historical documents. To fill the gap in this field, we propose a large-scale dataset HDR28K...
Large amounts of labeled data are urgently required for the training of robust text recognizers. However, collecting handwriting data of diverse styles, along with an immense lexicon, is considerably expensive. Although data synthesis is a promising way to relieve data hunger, two key issues of handwriting synthesis, namely, style representation and content embedding, remain unsolved. To this end, we propose a novel method that can synthesize parameterized and controllable handwriting Styles for arbitrary-Length and Out-of-vocabulary text based on a G...
Chinese indigenous chickens (CICs) constitute world-renowned genetic resources due to their excellent traits, including early puberty, good meat quality and strong resistance to disease. Unfortunately, the introduction of a large number of commercial chickens in the past two decades has had an adverse effect on CICs. Using the chicken 60 K single nucleotide polymorphism chip, we assessed the genetic diversity and population structure of 1,187 chickens, representing eight breeds, hybrid ancestral populations additional red jungle...
DExD/H-box helicases play essential roles in RNA metabolism, and emerging data suggest that they have additional functions in antiviral immunity across species. However, little is known about this evolutionarily conserved family in antiviral responses in lower species. Here, by isolation of poly(I:C)-binding proteins in amphioxus, an extant basal chordate, we found DHX9, DHX15 and DDX23 to be responsible for cytoplasmic dsRNA detection in amphioxus. Since the not been characterized mammals, performed further poly(I:C) pull down...
Handwritten Chinese Text Recognition (HCTR) is a challenging problem due to its high complexity. Previous methods based on over-segmentation, the hidden Markov model (HMM) or the long short-term memory recurrent neural network (LSTM-RNN) have achieved great success in recognition results. However, all of them, including over-segmentation methods, are incapable of accurately segmenting single characters. To solve this problem, we propose a fast and fully convolutional network for end-to-end handwritten text....
End-to-end scene text spotting, which aims to read the text in natural images, has garnered significant attention in recent years. However, state-of-the-art methods usually incorporate detection and recognition simply by sharing the backbone, which does not directly take advantage of the feature interaction between the two tasks. In this paper, we propose a new end-to-end scene text spotting framework termed SwinTextSpotter v2, which seeks to find a better synergy between text detection and recognition. Specifically, we enhance the relationship between the two tasks using novel...
Scene text removal (STR) aims at replacing text strokes in natural scenes with visually coherent backgrounds. Recent STR approaches rely on iterative refinements or explicit text masks, resulting in high complexity and sensitivity to the accuracy of text localization. Moreover, most existing STR methods adopt convolutional architectures, while the potential of vision Transformers (ViTs) remains largely unexplored. In this paper, we propose a simple-yet-effective ViT-based text eraser, dubbed ViTEraser. Following a concise...