Yi Tu

ORCID: 0000-0002-2184-4443
Research Areas
  • Handwritten Text Recognition Techniques
  • Natural Language Processing Techniques
  • Video Analysis and Summarization
  • Multimodal Machine Learning Applications
  • Topic Modeling
  • Web Data Mining and Analysis
  • Digital Humanities and Scholarship
  • Data Quality and Management
  • Semantic Web and Ontologies
  • Advanced Database Systems and Queries
  • French Urban and Social Studies
  • Diverse Cultural and Historical Studies

Henan Tianguan Group (China)
2024

Recent advances in multimodal pre-trained models have significantly improved information extraction from visually-rich documents (VrDs), in which named entity recognition (NER) is treated as a sequence-labeling task of predicting the BIO tags for tokens, following the typical setting in NLP. However, the BIO-tagging scheme relies on the correct order of model inputs, which is not guaranteed in real-world NER on scanned VrDs, where text is recognized and arranged by OCR systems. Such a reading-order issue hinders accurate marking...

10.18653/v1/2023.emnlp-main.846 article EN cc-by Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023-01-01
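
The dependence of BIO tagging on input order, as described in the abstract above, can be illustrated with a small sketch. This is not the authors' method: the `decode_bio` helper, the tokens, and the tags are all hypothetical and chosen only to show how an OCR-induced ordering can split an entity that BIO tags would otherwise mark as one contiguous span.

```python
def decode_bio(tokens, tags):
    """Collect entity strings from parallel token and BIO-tag lists.

    Hypothetical helper for illustration; tags are plain "B", "I", "O".
    """
    entities, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new entity starts; close any entity still open.
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            # Continue the open entity.
            current.append(token)
        else:
            # "O", or a stray "I" with no open entity: close and reset.
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


# Assumed correct reading order: the entity tokens are contiguous.
tokens_ok = ["Seller:", "Acme", "Trading", "Ltd", "Date:", "2024"]
tags_ok = ["O", "B", "I", "I", "O", "O"]

# Assumed OCR order: a token from another column lands inside the entity.
# The per-token labels are unchanged, but the span is no longer contiguous.
tokens_ocr = ["Seller:", "Acme", "Date:", "Trading", "Ltd", "2024"]
tags_ocr = ["O", "B", "O", "I", "I", "O"]

print(decode_bio(tokens_ok, tags_ok))
print(decode_bio(tokens_ocr, tags_ocr))
```

In the second call the interleaved `"Date:"` token closes the entity early, so only a fragment of the name is recovered even though every token carries its intended label.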

10.18653/v1/2024.emnlp-main.540 article EN Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing 2024-01-01

Recently developed pre-trained text-and-layout models (PTLMs) have shown remarkable success in multiple information extraction tasks on visually-rich documents. However, the prevailing evaluation pipeline may not be sufficiently robust for assessing the ability of PTLMs, due to inadequate annotations within benchmarks. Therefore, we claim the necessary standards for an ideal benchmark to evaluate PTLMs. We then introduce EC-FUNSD, an entity-centric benchmark designed for semantic entity recognition and linking...

10.48550/arxiv.2402.02379 preprint EN arXiv (Cornell University) 2024-02-04

Modeling and leveraging layout reading order in visually-rich documents (VrDs) is critical in document intelligence, as it captures the rich structural semantics within documents. Previous works typically formulated layout reading order as a permutation of layout elements, i.e. a sequence containing all the elements. However, we argue that this formulation does not adequately convey the complete reading order information in the layout, which may potentially lead to a performance decline in downstream VrD tasks. To address this issue, we propose to model ordering relations...

10.48550/arxiv.2409.19672 preprint EN arXiv (Cornell University) 2024-09-29

The recognition of named entities in visually-rich documents (VrD-NER) plays a critical role in various real-world scenarios and applications. However, research on VrD-NER faces three major challenges: complex document layouts, incorrect reading orders, and unsuitable task formulations. To address these challenges, we propose a query-aware entity extraction head, namely UNER, to collaborate with existing multi-modal transformers to develop more robust models. The UNER head considers the task as a combination of sequence...

10.48550/arxiv.2408.01038 preprint EN arXiv (Cornell University) 2024-08-02

Recent advances in multimodal pre-trained models have significantly improved information extraction from visually-rich documents (VrDs), in which named entity recognition (NER) is treated as a sequence-labeling task of predicting the BIO tags for tokens, following the typical setting in NLP. However, the BIO-tagging scheme relies on the correct order of model inputs, which is not guaranteed in real-world NER on scanned VrDs, where text is recognized and arranged by OCR systems. Such a reading-order issue hinders accurate marking...

10.48550/arxiv.2310.11016 preprint EN other-oa arXiv (Cornell University) 2023-01-01