Large Language Models’ Accuracy in Emulating Human Experts’ Evaluation of Public Sentiments about Heated Tobacco Products on Social Media: Evaluation Study

DOI: 10.2196/63631 Publication Date: 2025-03-04T22:02:03Z
ABSTRACT
Sentiment analysis of alternative tobacco products discussed on social media is crucial in tobacco control research. Large language models (LLMs) are artificial intelligence models trained on extensive text data to emulate human linguistic patterns. LLMs may hold the potential to streamline the time-consuming and labor-intensive process of human sentiment analysis. This study aimed to examine the accuracy of LLMs in replicating human experts' evaluation of social media messages relevant to heated tobacco products (HTPs).

GPT-3.5 and GPT-4 Turbo (OpenAI) were used to classify 500 Facebook (Meta Platforms) and 500 Twitter (subsequently rebranded X) messages. Each set consisted of 200 human-labeled anti-HTP messages, 200 pro-HTP messages, and 100 neutral messages. Each model evaluated each message up to 20 times to generate multiple response instances reporting its classification decisions. The majority label across these responses was assigned as the model's decision for the message. The models' decisions were then compared with those of the human evaluators.

GPT-3.5 accurately replicated human evaluation for 61.2% of Facebook messages and 57% of Twitter messages. GPT-4 Turbo demonstrated higher accuracies overall, with 81.7% for Facebook and 77% for Twitter messages. GPT-4 Turbo's accuracy with 3 response instances reached 99% of the accuracy achieved with 20 instances. Accuracy was higher for anti- and pro-HTP messages. Most misclassifications occurred when anti- or pro-HTP messages were incorrectly classified as irrelevant by the model, whereas GPT-4 Turbo showed improvements across all sentiment categories and reduced misclassifications, especially among messages categorized as irrelevant.

LLMs can be used to analyze public sentiment about HTPs on social media. Results suggest that newer models can reach approximately 80% agreement with the results of human experts, even with a small number of labeling decisions generated by the model. A risk of using LLMs is misrepresentation of the overall sentiment due to accuracy differences across sentiment categories. Although this issue could be mitigated by newer models, future efforts should explore the mechanisms underlying these discrepancies and how to address them systematically.
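The majority-vote step described in the abstract (assigning each message the most frequent label across repeated model responses) can be sketched as below. This is a minimal illustration, not the study's code; the function name and label strings are hypothetical.

```python
from collections import Counter

def majority_label(instances):
    """Return the most frequent sentiment label among repeated model responses.

    `instances` is a list of labels (e.g. "anti-HTP", "pro-HTP", "irrelevant")
    produced by querying the model up to 20 times for the same message.
    """
    counts = Counter(instances)
    label, _ = counts.most_common(1)[0]
    return label

# Example: 20 hypothetical response instances for a single message.
responses = ["anti-HTP"] * 13 + ["irrelevant"] * 5 + ["pro-HTP"] * 2
print(majority_label(responses))  # anti-HTP
```

The study found that as few as 3 instances per message recovered 99% of the accuracy of 20 instances with GPT-4 Turbo, so in practice the `instances` list can be short.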