Evaluating GPT-4 and ChatGPT on Japanese Medical Licensing Examinations

DOI: 10.48550/arxiv.2303.18027 Publication Date: 2023-01-01
ABSTRACT
As large language models (LLMs) gain popularity among speakers of diverse languages, we believe that it is crucial to benchmark them to better understand model behaviors, failures, and limitations in languages beyond English. In this work, we evaluate LLM APIs (ChatGPT, GPT-3, and GPT-4) on the Japanese national medical licensing examinations from the past five years, including the current year. Our team comprises native Japanese-speaking NLP researchers and a practicing cardiologist based in Japan. Our experiments show that GPT-4 outperforms ChatGPT and GPT-3 and passes all six years of the exams, highlighting LLMs' potential in a language that is typologically distant from English. However, our evaluation also exposes critical limitations of the current LLM APIs. First, LLMs sometimes select prohibited choices that should be strictly avoided in medical practice in Japan, such as suggesting euthanasia. Further, our analysis shows that API costs are generally higher and the maximum context size is smaller for Japanese because of the way non-Latin scripts are currently tokenized in the pipeline. We release our benchmark, Igaku QA, as well as all model outputs and exam metadata. We hope our results and benchmark will spur progress on more diverse applications of LLMs. The benchmark is available at https://github.com/jungokasai/IgakuQA.
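The tokenization point in the abstract can be checked directly. Below is a minimal sketch, not taken from the paper's repository, that compares token counts for an English and a Japanese sentence using the tiktoken library; the model name and the sample sentences are illustrative assumptions, but the general pattern (more tokens per character for Japanese, hence higher API cost and faster context exhaustion) is what the paper describes.

```python
# Minimal sketch (assumptions: sample sentences and model choice are ours,
# not the paper's). Requires `pip install tiktoken`.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

english = "The patient presents with chest pain and shortness of breath."
japanese = "患者は胸痛と息切れを訴えて来院した。"

for label, text in [("English", english), ("Japanese", japanese)]:
    tokens = enc.encode(text)
    # Tokens per character is a rough proxy for how "expensive" a script is
    # under a BPE vocabulary trained predominantly on Latin-script text.
    print(f"{label}: {len(text)} chars -> {len(tokens)} tokens "
          f"({len(tokens) / len(text):.2f} tokens/char)")
```

Running this typically shows the Japanese sentence consuming noticeably more tokens per character than the English one, which is the mechanism behind the higher costs and smaller effective context size reported in the abstract.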