Deep Pre-Training Transformers for Scientific Paper Representation

Keywords: Representation; Encode; Feature learning; Feature vector; Natural language understanding
DOI: 10.3390/electronics13112123
Publication Date: 2024-05-30
ABSTRACT
In the age of scholarly big data, efficiently navigating and analyzing the vast corpus of scientific literature is a significant challenge. This paper introduces a specialized pre-trained BERT-based language model, termed SPBERT, which enhances natural language processing tasks specifically tailored to scientific domain analysis. Our method employs a novel neural network embedding technique that leverages textual components, such as keywords, titles, abstracts, and full texts, to represent papers in a vector space. By integrating recent advancements in text representation and unsupervised feature aggregation, SPBERT offers a sophisticated approach to encoding essential information implicitly, thereby enhancing classification and retrieval tasks. We applied our method to several real-world academic datasets, demonstrating notable improvements over existing methods. The findings suggest that SPBERT not only provides more effective paper representations but also facilitates a deeper understanding of large-scale scientific literature, paving the way for more informed and accurate scholarly analysis.
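The abstract describes embedding a paper's textual components and aggregating them into a single vector used for classification and retrieval. The exact SPBERT checkpoint and aggregation scheme are not available from this page, so the sketch below only illustrates the general idea with the off-the-shelf bert-base-uncased model from Hugging Face Transformers; the mean pooling over tokens and the unweighted averaging of field vectors are assumptions made for illustration, not the authors' method.

```python
# Minimal sketch: embed a paper's title, abstract, and keywords with a generic
# BERT encoder, then average the per-field vectors into one paper representation.
# "bert-base-uncased", mean pooling, and unweighted field averaging are
# illustrative assumptions, not the SPBERT procedure described in the paper.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

def embed_text(text: str) -> torch.Tensor:
    """Mean-pool BERT's last hidden states over non-padding tokens."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state        # (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)            # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)      # (1, 768)

def embed_paper(title: str, abstract: str, keywords: list[str]) -> torch.Tensor:
    """Aggregate per-field embeddings into a single paper vector."""
    fields = [title, abstract, ", ".join(keywords)]
    vectors = torch.cat([embed_text(f) for f in fields], dim=0)  # (3, 768)
    return vectors.mean(dim=0)                                   # (768,)

# Example retrieval step: score a paper against a free-text query by cosine similarity.
paper_vec = embed_paper(
    "Deep Pre-Training Transformers for Scientific Paper Representation",
    "A BERT-based model for embedding scientific papers into a vector space ...",
    ["representation", "feature learning", "natural language understanding"],
)
query_vec = embed_text("pre-trained language models for document retrieval").squeeze(0)
similarity = torch.nn.functional.cosine_similarity(paper_vec, query_vec, dim=0)
print(f"cosine similarity: {similarity.item():.3f}")
```

In a retrieval setting, vectors like these would be precomputed for the whole corpus and ranked by similarity to the query; for classification, the same paper vector would feed a downstream classifier head.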