- Biomedical Text Mining and Ontologies
- Data Quality and Management
- Topic Modeling
- Semantic Web and Ontologies
- Advanced Text Analysis Techniques
- Natural Language Processing Techniques
- Web Data Mining and Analysis
- Research Data Management Practices
- Algorithms and Data Compression
- Computational Physics and Python Applications
- Computational and Text Analysis Methods
- Scientific Computing and Data Management
- Time Series Analysis and Forecasting
GESIS - Leibniz-Institute for the Social Sciences
2016-2020
Leibniz Association
2016
University of Bonn
2016
This demo paper presents a generic toolchain to extract, segment and match literature references from full-text PDF files in the project EXCITE. The aim of EXCITE is extracting and matching citations from social science publications and making more citation data available to researchers. Each step of the pipeline and the open source tools used to accomplish the tasks are explained. A public system which integrates all components under a user-friendly interface is put forward and illustrated. As a final step, a special component...
Scientific full-text papers are usually stored in separate places from their underlying research datasets. Authors typically make references to datasets by mentioning them, for example using titles and the year of publication. However, in most cases explicit links that would provide readers with direct access to the referenced datasets are missing. Manually detecting references to datasets in papers is time-consuming and requires an expert in the domain of the paper. In order to make explicit all links to datasets in papers that have been published already, we suggest and evaluate a semi-automatic approach for finding...
Today, full texts of scientific articles are often stored in different locations than the datasets they use. Dataset registries aim at a closer integration by making datasets citable, but authors typically refer to datasets using inconsistent abbreviations and heterogeneous metadata (e.g. title, publication year). It is thus hard to reproduce research results, to access datasets for further analysis, and to determine the impact of a dataset. Manually detecting references to datasets in scientific articles is time-consuming and requires expert knowledge of the underlying domain. We propose...
Citation matching is a challenging task due to different problems such as the variety of citation styles, mistakes in reference strings and the quality of identified reference segments. The classic citation matching configuration used in this paper is the combination of a blocking technique and a binary classifier. Three possible inputs (reference strings, reference segments, and a combination of reference strings and segments) were tested to find the most efficient strategy for citation matching. In the classification step, we describe the effect which the probabilities of reference segments can have in citation matching. Our evaluation on a manually curated gold...
In this article, we describe highly cited publications in a PLOS ONE full-text corpus. For these publications, we analyse the citation contexts concerning their position in the text and their age at the time of citing. By selecting the perspective of highly cited papers, we can distinguish them based on the context during citation even if we do not have any other information source or metrics. We describe the top cited references based on how, when and in which context they are cited. The focus of this study is to explain the nature of the reception of highly cited papers. We found that they are distinguishable by IMRaD sections...
A variety of schemas and ontologies are currently used for the machine-readable description of bibliographic entities and citations. This diversity, and the reuse of the same ontology terms with different nuances, generates inconsistencies in data. Adoption of a single data model would facilitate data integration tasks regardless of the data supplier or context of application. In this paper we present the OpenCitations Data Model (OCDM), a generic data model for describing bibliographic entities and citations, developed using Semantic Web technologies. We also evaluate the effective...