Measuring Taiwanese Mandarin Language Understanding

DOI: 10.48550/arxiv.2403.20180
Publication Date: 2024-03-29
ABSTRACT
The evaluation of large language models (LLMs) has drawn substantial attention in the field recently. This work focuses on evaluating LLMs in a Chinese context, specifically for Traditional Chinese, which has been largely underrepresented in existing benchmarks. We present TMLU, a holistic evaluation suite tailored for assessing the advanced knowledge and reasoning capabilities of LLMs under the context of Taiwanese Mandarin. TMLU consists of an array of 37 subjects across social science, STEM, humanities, Taiwan-specific content, and others, ranging from middle-school to professional levels. In addition, we curate chain-of-thought-like few-shot explanations for each subject to facilitate the evaluation of complex reasoning skills. To establish a comprehensive baseline, we conduct extensive experiments and analysis on 24 advanced LLMs. The results suggest that Chinese open-weight models demonstrate inferior performance compared with multilingual proprietary ones, and that open-weight models tailored for Taiwanese Mandarin lag behind their Simplified-Chinese counterparts. The findings indicate great headroom for improvement, and emphasize the goal of TMLU to foster the development of localized Taiwanese-Mandarin LLMs. We release the benchmark and evaluation scripts to the community to promote future research.
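The few-shot chain-of-thought evaluation described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's released code: the exemplar text, the dataset fields (`question`, `choices`, `answer`), and the `model` callable are all assumptions for illustration.

```python
from typing import Callable

# One assumed CoT-style few-shot exemplar for a subject (format hypothetical).
FEW_SHOT = (
    "題目: 2 + 2 = ?\n"
    "選項: (A) 3 (B) 4 (C) 5 (D) 6\n"
    "讓我們一步一步思考: 2 加 2 等於 4, 故選 (B)。\n"
    "答案: B\n\n"
)

def build_prompt(question: str, choices: str) -> str:
    """Prepend the few-shot CoT exemplar and ask for step-by-step reasoning."""
    return FEW_SHOT + f"題目: {question}\n選項: {choices}\n讓我們一步一步思考:"

def extract_choice(completion: str) -> str:
    """Take the last A-D letter in the completion as the predicted answer."""
    letters = [c for c in completion if c in "ABCD"]
    return letters[-1] if letters else ""

def evaluate(model: Callable[[str], str], items: list[dict]) -> float:
    """Accuracy of a prompt->completion model over multiple-choice items."""
    correct = 0
    for item in items:
        completion = model(build_prompt(item["question"], item["choices"]))
        correct += extract_choice(completion) == item["answer"]
    return correct / len(items)
```

Per-subject accuracies computed this way can then be averaged across the 37 subjects to produce an overall score, as is common for MMLU-style benchmarks.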