TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model

DOI: 10.48550/arxiv.2403.10047
Publication Date: 2024-03-15
ABSTRACT
Existing scene text spotters are designed to locate and transcribe texts from images. However, it is challenging for a spotter to achieve precise detection and recognition of scene texts simultaneously. Inspired by the glimpse-focus spotting pipeline of human beings and the impressive performance of Pre-trained Language Models (PLMs) on visual tasks, we ask: 1) "Can machines spot texts without precise detection, just like human beings?", and if yes, 2) "Is the text block another alternative spotting unit other than the word or character?" To this end, our proposed spotter leverages advanced PLMs to enhance performance without fine-grained detection. Specifically, we first use a simple detector for block-level text detection to obtain rough positional information. Then, we finetune a PLM using a large-scale OCR dataset to achieve accurate recognition. Benefiting from the comprehensive language knowledge gained during the pre-training phase, the PLM-based recognition module effectively handles complex scenarios, including multi-line, reversed, occluded, and incomplete-detection texts. Taking advantage of the fine-tuned language model and the paradigm of block-level detection, our method demonstrates superior performance across multiple public benchmarks in extensive experiments. Additionally, we attempt to spot texts directly from an entire image to explore the potential of PLMs, even Large Language Models (LLMs).
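The two-stage "glimpse-focus" pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `detect_blocks` and `plm_recognize` are hypothetical stand-ins for the simple block-level detector and the fine-tuned PLM recognizer, respectively.

```python
# Hypothetical sketch of the glimpse-focus spotting pipeline:
# (1) glimpse: a simple detector yields rough block-level boxes;
# (2) focus: a fine-tuned PLM transcribes each block.
# The detector and recognizer below are stubs, not the paper's models.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TextBlock:
    bbox: Tuple[int, int, int, int]  # coarse block box (x1, y1, x2, y2)
    crop: str                        # stand-in for the cropped image region

def detect_blocks(image: str) -> List[TextBlock]:
    """Glimpse step: return coarse block-level regions, with no precise
    word- or character-level localization."""
    # Stub: treat the whole input as a single text block.
    return [TextBlock(bbox=(0, 0, 100, 40), crop=image)]

def plm_recognize(block: TextBlock) -> str:
    """Focus step: a PLM fine-tuned on large-scale OCR data transcribes
    the block, tolerating multi-line, reversed, occluded, or loosely
    cropped text."""
    # Stub: normalize whitespace; a real system would run the PLM here.
    return " ".join(block.crop.split())

def spot(image: str) -> List[str]:
    """End-to-end spotting without fine-grained detection."""
    return [plm_recognize(b) for b in detect_blocks(image)]

print(spot("  HELLO\n  WORLD  "))  # → ['HELLO WORLD']
```

The key design point mirrored here is that recognition quality does not hinge on precise box geometry: the recognizer is expected to absorb detection imprecision, which is what the language prior of the PLM is claimed to enable.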