NFDI4DS | UHH-SEMS - Publication Details

Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction

OPENALEX - Publications

Chong Zhang, Ya Guo, Yi Tu, Huan Chen, Jinyang Tang, and 3 more

Recent advances in multimodal pre-trained models have significantly improved information extraction from visually-rich documents (VrDs), in which named entity recognition (NER) is treated as a sequence-labeling task of predicting the BIO tags for tokens, following the typical setting of NLP. However, the BIO-tagging scheme relies on the correct order of model inputs, which is not guaranteed in real-world NER on scanned VrDs, where text is recognized and arranged by OCR systems. Such a reading order issue hinders the accurate marking...

10.18653/v1/2023.emnlp-main.846 article EN cc-by Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023-01-01
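
The order dependence described in the abstract above can be illustrated with a minimal sketch. This is not the paper's method; it is a plain BIO decoder, and the function name `decode_bio`, the sample tokens, and the scrambled OCR order are all hypothetical. It only shows that the same token-level labels yield a different entity when the tokens arrive in the wrong order.

```python
def decode_bio(tokens, tags):
    """Collect (entity_type, text) pairs from BIO tags, reading tokens in input order."""
    entities, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that was still open.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" tag, or an "I-" tag with no matching open entity.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


# Tokens in the correct reading order.
ordered_tokens = ["Name", ":", "Jane", "Doe"]
ordered_tags = ["O", "O", "B-PER", "I-PER"]

# Hypothetical OCR output in which "Doe" is emitted first.
# Each token keeps the same label, but the sequence no longer decodes correctly.
scrambled_tokens = ["Doe", "Name", ":", "Jane"]
scrambled_tags = ["I-PER", "O", "O", "B-PER"]

ordered_entities = decode_bio(ordered_tokens, ordered_tags)
scrambled_entities = decode_bio(scrambled_tokens, scrambled_tags)
```

With the correct order the decoder recovers the full name; with the scrambled order the orphaned `I-PER` token is dropped and only part of the name is recovered.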

UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents

OPENALEX - Publications

Yi Tu, Chong Zhang, Ya Guo, Huan Chen, Jinyang Tang, and 2 more

10.1145/3664647.3681473 article EN 2024-10-26

Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding

OPENALEX - Publications

Chong Zhang, Yi Tu, Yixi Zhao, Chenshu Yuan, Huan Chen, and 6 more

10.18653/v1/2024.emnlp-main.540 article EN Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing 2024-01-01

Rethinking the Evaluation of Pre-trained Text-and-Layout Models from an Entity-Centric Perspective

OPENALEX - Publications

Chong Zhang, Yixi Zhao, Chenshu Yuan, Yi Tu, Ya Guo, and 1 more

Recently developed pre-trained text-and-layout models (PTLMs) have shown remarkable success in multiple information extraction tasks on visually-rich documents. However, the prevailing evaluation pipeline may not be sufficiently robust for assessing the ability of PTLMs, due to inadequate annotations within benchmarks. Therefore, we claim the necessary standards for an ideal benchmark to evaluate PTLMs. We then introduce EC-FUNSD, an entity-centric benchmark designed for semantic entity recognition and linking...

10.48550/arxiv.2402.02379 preprint EN arXiv (Cornell University) 2024-02-04

Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding

OPENALEX - Publications

Chong Zhang, Yi Tu, Yixi Zhao, Chenshu Yuan, Huan Chen, and 6 more

Modeling and leveraging layout reading order in visually-rich documents (VrDs) is critical in document intelligence, as it captures the rich structure semantics within documents. Previous works typically formulated layout reading order as a permutation of layout elements, i.e. a sequence containing all the layout elements. However, we argue that this formulation does not adequately convey the complete reading order information in the layout, which may potentially lead to performance decline in downstream VrD tasks. To address this issue, we propose to model the layout reading order as ordering relations...

10.48550/arxiv.2409.19672 preprint EN arXiv (Cornell University) 2024-09-29
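
The contrast in the abstract above, between a permutation of layout elements and ordering relations over them, can be sketched as follows. This is one possible reading under stated assumptions, not the paper's formulation: relations are represented here as a set of hypothetical `(predecessor, successor)` pairs, and the helper names and the three-element layout are invented for illustration.

```python
def permutation_to_relations(order):
    """Turn a permutation (a list of elements) into immediate-successor pairs."""
    return {(a, b) for a, b in zip(order, order[1:])}


def has_no_branching(relations):
    """True if no element has more than one successor or more than one predecessor."""
    sources = [a for a, _ in relations]
    targets = [b for _, b in relations]
    return len(set(sources)) == len(sources) and len(set(targets)) == len(targets)


# A permutation always induces a simple chain of successor pairs.
chain = permutation_to_relations(["title", "left_column", "right_column"])

# Hypothetical layout where the title is followed by either column.
# A set of relations can express this directly; a single permutation cannot.
branching = {("title", "left_column"), ("title", "right_column")}

chain_is_linear = has_no_branching(chain)
branching_is_linear = has_no_branching(branching)
```

Under this representation, any permutation maps to a non-branching relation set, while a relation set may also encode layouts in which one element has several successors.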

UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents

OPENALEX - Publications

Yi Tu, Chong Zhang, Ya Guo, Huan Chen, Jinyang Tang, and 2 more

The recognition of named entities in visually-rich documents (VrD-NER) plays a critical role in various real-world scenarios and applications. However, the research on VrD-NER faces three major challenges: complex document layouts, incorrect reading orders, and unsuitable task formulations. To address these challenges, we propose a query-aware entity extraction head, namely UNER, to collaborate with existing multi-modal transformers to develop more robust models. The UNER head considers the VrD-NER task as a combination of sequence...

10.48550/arxiv.2408.01038 preprint EN arXiv (Cornell University) 2024-08-02

Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction

OPENALEX - Publications

Chong Zhang, Ya Guo, Yi Tu, Huan Chen, Jinyang Tang, and 3 more

Recent advances in multimodal pre-trained models have significantly improved information extraction from visually-rich documents (VrDs), in which named entity recognition (NER) is treated as a sequence-labeling task of predicting the BIO tags for tokens, following the typical setting of NLP. However, the BIO-tagging scheme relies on the correct order of model inputs, which is not guaranteed in real-world NER on scanned VrDs, where text is recognized and arranged by OCR systems. Such a reading order issue hinders the accurate marking...

10.48550/arxiv.2310.11016 preprint EN other-oa arXiv (Cornell University) 2023-01-01