DUMB: A Dutch Model Benchmark
DOI:
10.18653/v1/2023.emnlp-main.447
Publication Date:
2023-12-10T21:58:19Z
AUTHORS (3)
ABSTRACT
We introduce the Dutch Model Benchmark: DUMB. The benchmark includes a diverse set of datasets for low-, medium- and high-resource tasks. The total of nine tasks includes four tasks that were previously not available in Dutch. Instead of relying on a mean score across tasks, we propose Relative Error Reduction (RER), which compares the DUMB performance of language models to a strong baseline that can be referred to in the future, even when assessing different sets of models. Through a comparison of 14 pre-trained language models (mono- and multi-lingual, of varying sizes), we assess the internal consistency of the benchmark tasks, as well as the factors that likely enable high performance. Our results indicate that current Dutch monolingual models under-perform and suggest training larger Dutch models with other architectures and pre-training objectives. At present, the highest performance is achieved by DeBERTaV3 (large), XLM-R (large) and mDeBERTaV3 (base). In addition to highlighting best strategies for training larger models, DUMB will foster further research on Dutch. A public leaderboard is available at https://dumbench.nl.
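The abstract's core idea is that Relative Error Reduction measures how much of a strong baseline's error a model eliminates, which stays comparable as new models are added against the same fixed baseline. A minimal sketch of that arithmetic, assuming accuracy-style scores in [0, 1] (the paper's exact per-task definitions may differ):

```python
def relative_error_reduction(model_score: float, baseline_score: float) -> float:
    """Fraction of the baseline's error that the model eliminates.

    Sketch based on the abstract's description of RER, assuming
    accuracy-style scores in [0, 1]; not the paper's exact formula.
    """
    baseline_error = 1.0 - baseline_score
    model_error = 1.0 - model_score
    return (baseline_error - model_error) / baseline_error


def mean_rer(model_scores, baseline_scores):
    """Average RER across tasks against a fixed per-task baseline."""
    pairs = list(zip(model_scores, baseline_scores))
    return sum(relative_error_reduction(m, b) for m, b in pairs) / len(pairs)


# Example: a model scoring 0.95 where the baseline scores 0.90
# removes half of the baseline's error (RER = 0.5).
print(relative_error_reduction(0.95, 0.90))
```

Unlike a mean raw score, a positive mean RER directly reads as "better than the baseline", and the baseline anchor makes results from different model sets comparable.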
SUPPLEMENTAL MATERIAL
Coming soon ....