Delving into ChatGPT usage in academic writing through excess vocabulary

Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Digital Libraries (cs.DL); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI); FOS: Computer and information sciences
DOI: 10.48550/arxiv.2406.07016 Publication Date: 2024-06-11
ABSTRACT
Recent large language models (LLMs) can generate and revise text with human-level performance and have been widely commercialized in systems like ChatGPT. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet many scientists have been using them to assist their scholarly writing. How widespread is LLM usage in the academic literature currently? To answer this question, we use an unbiased, large-scale approach, free from any assumptions about LLM usage. We study vocabulary changes in 14 million PubMed abstracts from 2010 to 2024 and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. Our analysis based on excess word usage suggests that at least 10% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, and was as high as 30% for some sub-corpora. LLM-based writing assistants have had an unprecedented impact on the scientific literature, surpassing the effect of major world events such as the Covid pandemic.
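The core arithmetic behind an excess-word analysis compares a word's observed frequency with a counterfactual expected frequency, then converts the excess into a lower bound on the share of LLM-processed abstracts. The sketch below illustrates that idea under stated assumptions. The linear extrapolation from 2021 and 2022, the bound (p - q) / (1 - q), the function names, and the toy frequencies are all hypothetical illustrations rather than details taken from this page. The authors' exact procedure may differ.

```python
"""Hypothetical sketch of an excess-word-frequency analysis.

Assumptions (not from the source): the expected frequency in the target
year is a linear extrapolation from the 2021 and 2022 frequencies, and
all numbers used below are toy values, not real PubMed data.
"""


def expected_frequency(freq_2021: float, freq_2022: float, target_year: int = 2024) -> float:
    """Linearly extrapolate a word's yearly frequency to target_year."""
    slope = freq_2022 - freq_2021  # change in frequency per year
    projected = freq_2022 + slope * (target_year - 2022)
    # A frequency is a proportion of abstracts, so clip it to [0, 1].
    return min(max(projected, 0.0), 1.0)


def excess_gap(observed: float, expected: float) -> float:
    """Excess frequency: observed minus counterfactual expected frequency."""
    return observed - expected


def llm_fraction_lower_bound(observed: float, expected: float) -> float:
    """Lower bound on the fraction of abstracts processed by an LLM.

    Suppose a fraction D of abstracts is LLM-processed and the rest keep
    the expected frequency q. Then the observed frequency is
    p = (1 - D) * q + D * s, where s <= 1 is the word's frequency among
    LLM-processed abstracts. Because s <= 1, p <= (1 - D) * q + D,
    which rearranges to D >= (p - q) / (1 - q).
    """
    if expected >= 1.0:
        raise ValueError("expected frequency must be below 1")
    # A negative excess gives no evidence of LLM usage, so floor at 0.
    return max(0.0, (observed - expected) / (1.0 - expected))


if __name__ == "__main__":
    # Toy frequencies (fraction of abstracts containing a word); made up.
    q = expected_frequency(0.0010, 0.0011)
    p = 0.0280
    print(f"expected={q:.4f} gap={excess_gap(p, q):.4f} "
          f"lower_bound={llm_fraction_lower_bound(p, q):.4f}")
```

Because the bound assumes every LLM-processed abstract could contain the word, a single word yields only a weak bound. Applying the same arithmetic to the frequency of abstracts containing at least one word from a group of excess words would tighten it.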