NFDI4DS | UHH-SEMS - Publication Details

BOUQuET: dataset, Benchmark and Open initiative for Universal Quality Evaluation in Translation

Benchmark (surveying)

DOI: 10.48550/arxiv.2502.04314
Publication Date: 2025-02-06


AUTHORS (17)

The Omnilingual M...

Pierre Andrews

Mikel Artetxe

Mariano Coria Meg...

Marta R. Costa‐jussà

Joe Chuang

David C. Dale

Cynthia Gao

Jean Maillard

Alex Mourachko

Christophe Ropers

Safiyyah Saleem

Eduardo Sánchez

Ioannis Tsiamas

Arina Turkatenko

Albert Ventayol-b...

S. J. C. Yates

ABSTRACT

This paper presents BOUQuET, a multicentric and multi-register/domain dataset and benchmark, and its broader collaborative extension initiative. The dataset is handcrafted in non-English languages first, each of these source languages being represented among the 23 languages commonly used by half of the world's population and therefore having the potential to serve as pivot languages that will enable more accurate translations. The dataset is specially designed to avoid contamination and to be multicentric, so as to enforce the representation of multilingual language features. In addition, the dataset goes beyond the sentence level, as it is organized in paragraphs of various lengths. Compared with related machine translation (MT) datasets, we show that BOUQuET has a broader representation of domains while simplifying the translation task for non-experts. Therefore, BOUQuET is especially suitable for the open initiative and call for translation participation that we are launching to extend BOUQuET into a multi-way parallel corpus covering any written language.


