Jack H. Culbert

ORCID: 0009-0000-1581-4021
Research Areas
  • Research Data Management Practices
  • Topic Modeling
  • Scientometrics and Bibliometrics Research
  • Diverse Approaches in Healthcare and Education Studies
  • Data Quality and Management
  • Biomedical Text Mining and Ontologies
  • Natural Language Processing Techniques
  • Text and Document Classification Technologies
  • Advanced Text Analysis Techniques
  • Social Media and Politics

In this research-in-progress paper, we apply a computational measure correlating with originality from creativity science, Divergent Semantic Integration (DSI), to a selection of 99,557 scientific abstracts and titles selected from the Web of Science. We observe statistically significant differences in DSI between subject and field of research, and a slight rise in DSI over time. We model the base 10 logarithm of the citation count after 5 years with DSI and find a positive correlation in all fields of research, with an adjusted $R^2$ of 0.13.

10.48550/arxiv.2502.01417 preprint EN arXiv (Cornell University) 2025-02-03

OpenAlex is a promising open source of scholarly metadata, and a competitor to established proprietary sources, such as the Web of Science and Scopus. As OpenAlex provides its data freely and openly, it permits researchers to perform bibliometric studies that can be reproduced in the community without licensing barriers. However, as OpenAlex is rapidly evolving and the data contained within is expanding and also quickly changing, the question naturally arises as to the trustworthiness of its data. In this report, we will study the reference coverage and selected metadata...

10.1007/s11192-025-05293-3 article EN cc-by Scientometrics 2025-04-10

OpenAlex is a promising open source of scholarly metadata, and a competitor to the established proprietary sources, the Web of Science and Scopus. As OpenAlex provides its data freely and openly, it permits researchers to perform bibliometric studies that can be reproduced in the community without licensing barriers. However, as OpenAlex is rapidly evolving and the data contained within is expanding and also quickly changing, the question naturally arises as to the trustworthiness of its data. In this empirical paper, we will study the reference and metadata coverage in each database...

10.48550/arxiv.2401.16359 preprint EN arXiv (Cornell University) 2024-01-29

This study compares and analyses publication and document types in the following bibliographic databases: OpenAlex, Scopus, Web of Science, Semantic Scholar and PubMed. The results demonstrate that typologies can differ considerably between individual database providers. Moreover, the distinction between research and non-research texts, which is required to identify relevant documents for bibliometric analysis, can vary depending on the data source because publications are classified differently in the respective databases. The focus...

10.48550/arxiv.2406.15154 preprint EN arXiv (Cornell University) 2024-06-21

Objective: To explore and compare the performance of ChatGPT and other state-of-the-art LLMs on domain-specific NER tasks covering different entity types and domains in TCM against COVID-19 literature. Methods: We established a dataset of 389 articles on TCM against COVID-19, and manually annotated 48 of them with 6 types of entities belonging to 3 domains as the ground truth, against which NER performance can be assessed. We then performed NER tasks for the 6 entity types using ChatGPT (GPT-3.5 and GPT-4) and 4 BERT-based question-answering (QA) models (RoBERTa, MiniLM, PubMedBERT and SciBERT) without prior training...

10.48550/arxiv.2408.13501 preprint EN arXiv (Cornell University) 2024-08-24

4chan is a popular online imageboard which has been widely studied due to an observed concentration of far-right, antisemitic, racist, misogynistic, and otherwise hateful material being posted to the site, as well as the emergence of political movements and the evolution of memes which are posted there, discussed in Section 1.1. We have created a tool developed in Python which utilises the 4chan API to collect data from a selection of boards. This paper accompanies the release of the code via the GitHub repository: https://github.com/jhculb/4TCT. We believe this tool will be of use...

10.48550/arxiv.2307.03556 preprint EN cc-by-sa arXiv (Cornell University) 2023-01-01

Background: Recent advances in large language models (LLMs) have shown remarkable performance on various downstream tasks in zero- and few-shot scenarios, shedding light on named entity recognition (NER) in low-resource domains. Traditional Chinese medicine (TCM) against COVID-19 has been a new research topic and has led to a niche literature. NER techniques are crucial for extracting and utilizing the rich knowledge in such literature. Objective: To explore and compare the performance of ChatGPT and other...

10.2196/preprints.54346 preprint EN 2023-11-07