BOUQuET: dataset, Benchmark and Open initiative for Universal Quality Evaluation in Translation

DOI: 10.48550/arxiv.2502.04314
Publication Date: 2025-02-06
ABSTRACT
This paper presents BOUQuET, a multicentric and multi-register/domain dataset and benchmark, together with its broader collaborative extension initiative. The dataset is handcrafted in non-English languages first. Each of these source languages is among the 23 languages commonly used by half of the world's population, and therefore has the potential to serve as a pivot language that enables more accurate translations. The dataset is specially designed to avoid contamination and to be multicentric, so as to enforce the representation of multilingual language features. In addition, it goes beyond the sentence level: it is organized in paragraphs of various lengths. Compared with related machine translation (MT) datasets, we show that BOUQuET has a broader representation of domains while simplifying the translation task for non-experts. BOUQuET is therefore especially suitable for the open initiative and call for translation participation that we are launching to extend it into a multi-way parallel corpus covering any written language.