MarkupLM: Pre-training of Text and Markup Language for Visually Rich Document Understanding

DOI: 10.18653/v1/2022.acl-long.420 Publication Date: 2022-06-03T01:34:53Z
ABSTRACT
Multimodal pre-training with text, layout, and image has made significant progress for Visually Rich Document Understanding (VRDU), especially for fixed-layout documents such as scanned document images. However, there are still a large number of digital documents where the layout information is not fixed and needs to be interactively and dynamically rendered for visualization, making existing layout-based pre-training approaches hard to apply. In this paper, we propose MarkupLM for document understanding tasks that use markup languages as the backbone, such as HTML/XML-based documents, where text and markup information are jointly pre-trained. Experiment results show that the pre-trained MarkupLM significantly outperforms strong baseline models on several document understanding tasks. The pre-trained model and code will be publicly available at https://aka.ms/markuplm.
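As a concrete illustration of feeding an HTML document jointly as text and markup, the sketch below uses the Hugging Face transformers port of MarkupLM (MarkupLMProcessor and MarkupLMModel with the microsoft/markuplm-base checkpoint); this is a minimal usage example of that port under the assumption it is installed, not a description of the paper's own training code.

```python
# Minimal sketch: encode an HTML page with MarkupLM via the transformers port.
# The processor extracts text nodes and their XPaths from raw HTML, so the
# model receives both textual and markup (DOM structure) signals.
from transformers import MarkupLMProcessor, MarkupLMModel

processor = MarkupLMProcessor.from_pretrained("microsoft/markuplm-base")
model = MarkupLMModel.from_pretrained("microsoft/markuplm-base")

html_string = """
<html>
  <head><title>Example Page</title></head>
  <body><h1>Welcome</h1><p>MarkupLM reads text together with its markup.</p></body>
</html>
"""

# Produces input_ids plus xpath_tags_seq / xpath_subs_seq encoding the DOM path
# of each text token, which MarkupLM embeds alongside the text.
encoding = processor(html_string, return_tensors="pt")
outputs = model(**encoding)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```

The same processor output can be passed to task heads such as MarkupLMForQuestionAnswering or MarkupLMForTokenClassification for the downstream document understanding tasks evaluated in the paper.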