ExplainCPE: A Free-text Explanation Benchmark of Chinese Pharmacist Examination

Keywords: Interpretability, Benchmark, Scarcity
DOI: 10.18653/v1/2023.findings-emnlp.129 | Publication Date: 2023-12-10
ABSTRACT
In the field of Large Language Models (LLMs), researchers are increasingly exploring their effectiveness across a wide range of tasks. However, a critical area that requires further investigation is the interpretability of these models, particularly their ability to generate rational explanations for their decisions. Most existing explanation datasets are limited to the English language and the general domain, which leads to a scarcity of linguistic diversity and a lack of resources in specialized domains, such as the medical domain. To mitigate this, we propose ExplainCPE, a challenging medical dataset consisting of over 7K problems from the Chinese Pharmacist Examination, specifically tailored to assess model-generated explanations. From the overall results, only GPT-4 passes the pharmacist examination with 75.7% accuracy, while other models like ChatGPT fail. Further detailed analysis of LLM-generated explanations reveals the limitations of LLMs in understanding medical text and executing computational reasoning. With the increasing importance of AI safety and trustworthiness, ExplainCPE takes a step towards improving and evaluating the interpretability of LLMs in the medical domain.
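
To make the evaluation protocol concrete, the sketch below shows how an exam-accuracy figure such as the 75.7% reported above could be computed from model predictions on multiple-choice items. The instance fields (question, options, answer, explanation) and the stand-in predictor are hypothetical placeholders for illustration; they are not the paper's released data format or evaluation code.

    # Minimal sketch of multiple-choice exam scoring, assuming a
    # hypothetical ExplainCPE-style instance layout; field names and
    # the toy predictor below are illustrative, not the paper's code.
    from dataclasses import dataclass

    @dataclass
    class ExamItem:
        question: str
        options: dict      # e.g. {"A": "...", "B": "...", ...}
        answer: str        # gold option label, e.g. "B"
        explanation: str   # gold free-text rationale

    def accuracy(items, predict):
        """Fraction of items where the predicted label matches gold."""
        correct = sum(1 for item in items if predict(item) == item.answer)
        return correct / len(items)

    # Usage with a trivial stand-in predictor (always answers "B"):
    items = [
        ExamItem("Which of the following is a proton pump inhibitor?",
                 {"A": "Atenolol", "B": "Omeprazole",
                  "C": "Metformin", "D": "Warfarin"},
                 "B",
                 "Omeprazole inhibits the gastric H+/K+-ATPase."),
    ]
    print(f"accuracy = {accuracy(items, lambda item: 'B'):.1%}")

In an actual evaluation, the predictor would parse an option label from the LLM's free-text response, and the gold explanation field would support a separate explanation-quality comparison.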