Comparison of Feature Learning Methods for Metadata Extraction from PDF Scholarly Documents

DOI: 10.48550/arxiv.2501.05082 Publication Date: 2025-01-09
ABSTRACT
The availability of metadata for scientific documents is pivotal in propelling knowledge forward and in adhering to the FAIR principles (i.e. Findability, Accessibility, Interoperability, and Reusability) of research findings. However, the lack of sufficient metadata in published documents, particularly those from smaller and mid-sized publishers, hinders their accessibility. This issue is widespread in some disciplines, such as the German Social Sciences, where publications often employ diverse templates. To address this challenge, our study evaluates various feature learning and prediction methods, including natural language processing (NLP), computer vision (CV), and multimodal approaches, for extracting metadata from documents with high template variance. We aim to improve the accessibility of scientific documents and facilitate their wider use. To support our comparison of these methods, we provide comprehensive experimental results, analyzing their accuracy and efficiency in extracting metadata. Additionally, we provide valuable insights into the strengths and weaknesses of the various feature learning and prediction methods, which can guide future research in this field.