Accuracy of latest large language models in answering multiple choice questions in dentistry: A comparative study
Dental research
DOI: 10.1371/journal.pone.0317423
Publication Date: 2025-01-29
AUTHORS (5)
ABSTRACT
This study aims to evaluate the performance of the latest large language models (LLMs) in answering dental multiple choice questions (MCQs), including both text-based and image-based questions. A total of 1,490 MCQs from two board review books for the United States National Board Dental Examination were selected. Six LLMs available as of August 2024 were evaluated: ChatGPT 4.0 omni (OpenAI), Gemini Advanced 1.5 Pro (Google), Copilot with GPT-4 Turbo (Microsoft), Claude 3.5 Sonnet (Anthropic), Mistral Large 2 (Mistral AI), and Llama 3.1 405b (Meta). Chi-square (χ2) tests were performed to determine whether there were significant differences in the percentages of correct answers among the LLMs, both for the total sample and within each discipline (p < 0.05). Significant differences in the percentage of accurate answers were observed across questions (p < 0.001). For the total sample, the three highest-performing models achieved 85.5%, 84.0%, and 83.8% accuracy, followed by models at 78.3% and 77.1%, with the lowest-performing model at 72.4%. Newer model versions demonstrated superior accuracy compared with earlier versions. Copilot and Claude achieved high accuracy on text-based questions but low accuracy on image-based questions, indicating limited capability in handling images. Clinicians and students should prioritize the most up-to-date LLMs when using them to support learning, clinical practice, and research.
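As an illustration of the kind of analysis the abstract describes, the sketch below shows how a chi-square test could compare correct-answer counts among several models using scipy.stats.chi2_contingency. This is a minimal sketch, not the authors' code: the model labels and counts are hypothetical placeholders, not the study's data.

```python
# Minimal sketch (hypothetical data): chi-square test of homogeneity comparing
# the proportion of correct answers among several LLMs, mirroring the kind of
# comparison described in the abstract.
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = models, columns = (correct, incorrect).
# Counts are placeholders out of 1,490 questions, not results from the study.
counts = {
    "Model A": (1274, 216),
    "Model B": (1252, 238),
    "Model C": (1079, 411),
}

table = [list(row) for row in counts.values()]
chi2, p_value, dof, expected = chi2_contingency(table)

print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4g}")
# A p-value below 0.05 would indicate a significant difference in accuracy
# among the models, matching the significance threshold used in the study.
```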