Delving into ChatGPT usage in academic writing through excess vocabulary
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Digital Libraries (cs.DL); Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
FOS: Computer and information sciences
DOI: 10.48550/arxiv.2406.07016
Publication Date: 2024-06-11
AUTHORS (4)
ABSTRACT
Recent large language models (LLMs) can generate and revise text with human-level performance and have been widely commercialized in systems like ChatGPT. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet many scientists have been using them to assist their scholarly writing. How widespread is LLM usage in the academic literature currently? To answer this question, we use an unbiased, large-scale approach, free from any assumptions about academic LLM usage. We study vocabulary changes in 14 million PubMed abstracts from 2010 to 2024 and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. Our analysis based on excess word usage suggests that at least 10% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, and was as high as 30% for some PubMed sub-corpora. LLM-based writing assistants have had an unprecedented impact on the scientific literature, surpassing the effect of major world events such as the Covid pandemic.
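The excess-vocabulary reasoning can be illustrated with a small Python sketch. The example treats abstracts as plain strings grouped by publication year and uses a crude whitespace tokenizer. It projects an expected ("counterfactual") frequency for the target year by extending the trend between two pre-LLM years in a straight line. The tokenizer, the extrapolation rule, the function names, and the toy corpus are all illustrative assumptions, not the authors' exact pipeline or data. The sketch shows why an excess gap can serve as a lower bound. An LLM-processed abstract contains a given word with probability at most 1, so any observed frequency above the projection must come from at least that share of abstracts.

```python
from typing import Dict, List, Set


def tokenize(abstract: str) -> Set[str]:
    """Crude tokenizer (assumption): lowercase, split on whitespace, strip punctuation."""
    return {tok.strip(".,;:()[]\"'") for tok in abstract.lower().split()}


def group_frequency(abstracts: List[str], words: Set[str]) -> float:
    """Fraction of abstracts that contain at least one word from `words`."""
    hits = sum(1 for text in abstracts if tokenize(text) & words)
    return hits / len(abstracts)


def counterfactual_frequency(freq: Dict[int, float], year_a: int, year_b: int,
                             target_year: int) -> float:
    """Hypothetical baseline: extend the year_a -> year_b trend linearly to target_year.

    A negative slope is clipped to zero, so a declining word is projected flat;
    this raises the baseline and keeps the excess gap conservative.
    """
    slope = (freq[year_b] - freq[year_a]) / (year_b - year_a)
    return freq[year_b] + (target_year - year_b) * max(slope, 0.0)


def excess_gap(freq: Dict[int, float], year_a: int, year_b: int,
               target_year: int) -> float:
    """Observed minus counterfactual frequency in target_year.

    If a share D of target-year abstracts was LLM-processed and the rest follow the
    baseline q, then p = (1 - D) * q + D * p_llm <= q + D * (1 - q) because p_llm <= 1.
    Hence, when p >= q, D >= (p - q) / (1 - q) >= p - q: the gap is a lower bound on D.
    """
    q = counterfactual_frequency(freq, year_a, year_b, target_year)
    return freq[target_year] - q


# Invented toy corpus (not data from the paper): 100 abstracts per year.
toy_corpus = {
    2021: ["We delve into X."] * 5 + ["We study X."] * 95,
    2022: ["We delve into X."] * 6 + ["We study X."] * 94,
    2024: ["We delve into X."] * 20 + ["We study X."] * 80,
}
style_words = {"delve", "delves", "delving"}
freq = {year: group_frequency(texts, style_words) for year, texts in toy_corpus.items()}
print(round(excess_gap(freq, 2021, 2022, 2024), 2))  # 0.12
```

In this sketch a group of style words is measured by the share of abstracts containing any of them, rather than by summing per-word frequencies. This avoids double-counting abstracts that use several style words at once.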