- Handwritten Text Recognition Techniques
- Natural Language Processing Techniques
- Video Analysis and Summarization
- Multimodal Machine Learning Applications
- Topic Modeling
- Web Data Mining and Analysis
- Digital Humanities and Scholarship
- Data Quality and Management
- Semantic Web and Ontologies
- Advanced Database Systems and Queries
- French Urban and Social Studies
- Diverse Cultural and Historical Studies
Henan Tianguan Group (China)
2024
Recent advances in multimodal pre-trained models have significantly improved information extraction from visually-rich documents (VrDs), in which named entity recognition (NER) is treated as a sequence-labeling task of predicting the BIO tags for tokens, following the typical setting in NLP. However, the BIO-tagging scheme relies on the correct order of model inputs, which is not guaranteed in real-world NER on scanned VrDs, where text is recognized and arranged by OCR systems. Such a reading order issue hinders accurate marking...
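
The dependence of BIO tagging on input order can be illustrated with a small sketch. This is not the authors' code: the `decode_bio` helper, the token lists, and the `ORG` label are hypothetical, chosen only to show how one misplaced token from an OCR system can split an entity.

```python
def decode_bio(tokens, tags):
    """Collect (entity_type, text) spans from parallel token and BIO-tag lists."""
    entities, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; flush any entity in progress.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation is only valid directly after a matching B-/I- tag.
            current_tokens.append(token)
        else:
            # "O" tag or an orphaned I- tag: close the entity in progress.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


# Hypothetical example: tokens in the correct reading order.
tokens_ok = ["Invoice", "to", "Acme", "Trading", "Ltd", "Total", "120"]
tags_ok = ["O", "O", "B-ORG", "I-ORG", "I-ORG", "O", "O"]

# Same tokens, but an assumed OCR ordering places "Total" inside the entity.
tokens_bad = ["Invoice", "to", "Acme", "Total", "Trading", "Ltd", "120"]
tags_bad = ["O", "O", "B-ORG", "O", "I-ORG", "I-ORG", "O"]

entities_ok = decode_bio(tokens_ok, tags_ok)
entities_bad = decode_bio(tokens_bad, tags_bad)
```

With the correct order the decoder recovers the full organization name; with the disordered input, even perfectly predicted per-token labels yield only a fragment, because the entity's tokens are no longer contiguous.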
Recently developed pre-trained text-and-layout models (PTLMs) have shown remarkable success in multiple information extraction tasks on visually-rich documents. However, the prevailing evaluation pipeline may not be sufficiently robust for assessing the ability of PTLMs, due to inadequate annotations within the benchmarks. Therefore, we claim the necessary standards for an ideal benchmark to evaluate PTLMs. We then introduce EC-FUNSD, an entity-centric benchmark designed for semantic entity recognition and linking...
Modeling and leveraging layout reading order in visually-rich documents (VrDs) is critical in document intelligence, as it captures the rich structure semantics within documents. Previous works typically formulated it as a permutation of layout elements, i.e. a sequence containing all elements. However, we argue that this formulation does not adequately convey the complete information in the layout, which may potentially lead to a performance decline in downstream VrD tasks. To address this issue, we propose to model ordering relations...
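
The contrast between a permutation and a set of ordering relations can be sketched as follows. This is an illustrative assumption, not the formulation from the paper, whose details are truncated above: reading order is represented here as a set of hypothetical `(predecessor, successor)` pairs, and the element names are invented.

```python
def chain_relations(permutation):
    """Express a permutation as immediate-successor pairs (a single chain)."""
    return set(zip(permutation, permutation[1:]))


def is_single_chain(relations, elements):
    """Return True if the relations form one linear sequence over all elements."""
    successor, has_predecessor = {}, set()
    for a, b in relations:
        if a in successor or b in has_predecessor:
            return False  # branching or merging cannot be a permutation
        successor[a] = b
        has_predecessor.add(b)
    starts = [e for e in elements if e not in has_predecessor]
    if len(starts) != 1:
        return False
    # Walk from the unique start and count how many elements are reached.
    node, seen = starts[0], 1
    while node in successor:
        node = successor[node]
        seen += 1
    return seen == len(elements)


# Hypothetical layout elements of a form-like document.
elements = ["title", "question", "answer_a", "answer_b"]

# A permutation always yields a single chain of relations.
as_permutation = chain_relations(elements)

# A relation set can also state that both answers follow the question,
# without forcing an order between the two answers.
as_relations = {
    ("title", "question"),
    ("question", "answer_a"),
    ("question", "answer_b"),
}

permutation_is_chain = is_single_chain(as_permutation, elements)
relations_is_chain = is_single_chain(as_relations, elements)
```

Under these assumptions, the branching relation set has no equivalent single permutation, which is one way a sequence-only formulation could lose layout information.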
The recognition of named entities in visually-rich documents (VrD-NER) plays a critical role in various real-world scenarios and applications. However, research on VrD-NER faces three major challenges: complex document layouts, incorrect reading orders, and unsuitable task formulations. To address these challenges, we propose a query-aware entity extraction head, namely UNER, to collaborate with existing multi-modal transformers to develop more robust models. The UNER head considers the task as a combination of sequence...