NFDI4DS | UHH-SEMS - Publication Details

Delving into ChatGPT usage in academic writing through excess vocabulary

FOS: Computer and information sciences; Social and Information Networks (cs.SI); Computers and Society (cs.CY); Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Digital Libraries (cs.DL)

DOI: 10.48550/arxiv.2406.07016
Publication Date: 2024-06-11


AUTHORS (4)

Dmitry Kobak

Rita González Már...

Emőke-Ágnes Horvát

Jan Lause

ABSTRACT

Recent large language models (LLMs) can generate and revise text with human-level performance, and have been widely commercialized in systems like ChatGPT. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet, many scientists have been using them to assist their scholarly writing. How widespread is LLM usage in the academic literature currently? To answer this question, we use an unbiased, large-scale approach, free from any assumptions on LLM usage. We study vocabulary changes in 14 million PubMed abstracts from 2010-2024, and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. Our analysis based on excess words suggests that at least 10% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, and was as high as 30% for some sub-corpora. LLM-based writing assistants have had an unprecedented impact on the scientific literature, surpassing the effect of major world events such as the Covid pandemic.
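The excess-vocabulary idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the linear extrapolation from two pre-LLM baseline years, the function names, the choice of 2021 and 2022 as baselines, and the toy frequencies are all assumptions made for illustration. The sketch compares a word's observed frequency in a target year with the frequency projected from the baseline trend, and reports the difference (gap) and the ratio.

```python
def expected_frequency(freq_by_year, baseline_years, target_year):
    """Project a word's frequency to target_year by linear extrapolation
    from two baseline years (an assumed counterfactual model)."""
    y0, y1 = baseline_years
    slope = (freq_by_year[y1] - freq_by_year[y0]) / (y1 - y0)
    projected = freq_by_year[y1] + slope * (target_year - y1)
    return max(projected, 0.0)  # a frequency cannot be negative


def excess_usage(freq_by_year, baseline_years=(2021, 2022), target_year=2024):
    """Return (gap, ratio) of observed vs. projected frequency."""
    observed = freq_by_year[target_year]
    expected = expected_frequency(freq_by_year, baseline_years, target_year)
    gap = observed - expected
    ratio = observed / expected if expected > 0 else float("inf")
    return gap, ratio


# Hypothetical numbers: fraction of abstracts per year containing each word.
# These values are invented for illustration and are not taken from the paper.
toy_frequencies = {
    "delves": {2021: 0.0004, 2022: 0.0005, 2024: 0.0100},
    "patients": {2021: 0.3000, 2022: 0.3010, 2024: 0.3030},
}

for word, freqs in toy_frequencies.items():
    gap, ratio = excess_usage(freqs)
    print(f"{word}: excess gap = {gap:.4f}, frequency ratio = {ratio:.2f}")
```

In this toy setup, a style word whose frequency jumps far above its projected trend shows a large gap and ratio, while a content word that follows its trend shows a gap near zero. Under the assumption that the excess is due to LLM-assisted writing, the gap for a word can be read as a rough lower bound on the share of abstracts affected.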



