DUMB: A Dutch Model Benchmark

DOI: 10.18653/v1/2023.emnlp-main.447 Publication Date: 2023-12-10T21:58:19Z
ABSTRACT
We introduce the Dutch Model Benchmark: DUMB. The benchmark includes a diverse set of datasets for low-, medium- and high-resource tasks, for a total of nine tasks, four of which were previously not available in Dutch. Instead of relying on a mean score across tasks, we propose Relative Error Reduction (RER), which compares the DUMB performance of language models to a strong baseline that can be referred to in the future, even when assessing different sets of models. Through a comparison of 14 pre-trained language models (monolingual and multilingual, of varying sizes), we assess the internal consistency of the benchmark tasks, as well as the factors that likely enable high performance. Our results indicate that current Dutch monolingual models under-perform and suggest training larger Dutch models with other architectures and pre-training objectives. At present, the highest performance is achieved by DeBERTaV3 (large), XLM-R (large) and mDeBERTaV3 (base). In addition to highlighting the best strategies for training larger Dutch models, DUMB will foster further research on Dutch. A public leaderboard is available at https://dumbench.nl.
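The idea behind RER can be sketched as follows. This is a minimal illustration, assuming RER measures the fraction of the baseline's remaining error that a model eliminates; the function name, the `max_score` parameter, and the averaging across tasks are illustrative assumptions, not the paper's exact formulation.

```python
def relative_error_reduction(model_score, baseline_score, max_score=100.0):
    """Fraction of the baseline's error eliminated by the model.

    Positive values mean the model beats the baseline (1.0 = perfect),
    negative values mean it falls below the baseline. Sketch only:
    the exact DUMB definition may differ in details.
    """
    baseline_error = max_score - baseline_score
    model_error = max_score - model_score
    if baseline_error == 0:
        raise ValueError("baseline is already at the score ceiling")
    return (baseline_error - model_error) / baseline_error


# Aggregate over tasks by averaging per-task RER instead of raw scores,
# so easy tasks (where every model scores high) do not dominate.
def mean_rer(model_scores, baseline_scores):
    pairs = zip(model_scores, baseline_scores)
    rers = [relative_error_reduction(m, b) for m, b in pairs]
    return sum(rers) / len(rers)
```

Because each task is normalized against the same fixed baseline, a model's RER remains comparable even when later evaluations include a different set of models.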