NFDI4DS | UHH-SEMS - Publication Details

Comparison of Feature Learning Methods for Metadata Extraction from PDF Scholarly Documents

Feature (linguistics)

DOI: 10.48550/arxiv.2501.05082
Publication Date: 2025-01-09

AUTHORS (2)

ABSTRACT

The availability of metadata for scientific documents is pivotal in propelling scientific knowledge forward and in adhering to the FAIR principles (i.e. Findability, Accessibility, Interoperability, and Reusability) of research findings. However, the lack of sufficient metadata in published documents, particularly those from smaller and mid-sized publishers, hinders their accessibility. This issue is widespread in some disciplines, such as the German Social Sciences, where publications often employ diverse templates. To address this challenge, our study evaluates various feature learning and prediction methods, including natural language processing (NLP), computer vision (CV), and multimodal approaches, for extracting metadata from documents with high template variance. We aim to improve the accessibility of scientific documents and facilitate their wider use. To support our comparison of these methods, we provide comprehensive experimental results, analyzing their accuracy and efficiency in extracting metadata. Additionally, we provide valuable insights into the strengths and weaknesses of the various feature learning and prediction methods, which can guide future research in this field.
