CHIA: CHoosing Instances to Annotate for Machine Translation
Training set
Parallel corpora
DOI: 10.18653/v1/2022.findings-emnlp.540
Publication Date: 2023-08-04T20:21:02Z
AUTHORS (3)
ABSTRACT
Neural machine translation (MT) systems have been shown to perform poorly on low-resource language pairs, for which large-scale parallel data is unavailable. Making the annotation process faster and cheaper is therefore important to ensure equitable access to MT systems. To make optimal use of a limited budget, we present CHIA (choosing instances to annotate), a method for selecting instances to annotate for machine translation. Using an existing multi-way parallel dataset of high-resource languages, we first identify instances, based on model training dynamics, that are most informative for training models for high-resource languages. We find that there are cross-lingual commonalities in instances that are useful for training, which we use to identify instances that will be useful to train a model on a new target language. Evaluating on 20 languages from two corpora, we show that training on instances selected using our method provides an average performance improvement of 1.59 BLEU over training on a randomly selected set of the same size.
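The selection idea in the abstract — score instances by their informativeness for several high-resource languages, exploit the cross-lingual commonalities in those scores, and keep the top-ranked instances for annotation in a new language — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name, the averaging scheme, and the toy scores are all assumptions; the paper derives informativeness from model training dynamics.

```python
def select_instances_to_annotate(per_language_scores, k):
    """Hypothetical sketch of cross-lingual instance selection.

    per_language_scores: dict mapping a high-resource language code to a
    list of per-instance informativeness scores (e.g., statistics of each
    instance's loss trajectory during training).
    Returns the indices of the k instances with the highest average score.
    """
    langs = list(per_language_scores)
    n = len(per_language_scores[langs[0]])
    # Average each instance's score across high-resource languages,
    # relying on the cross-lingual commonalities the abstract reports.
    avg = [sum(per_language_scores[lang][i] for lang in langs) / len(langs)
           for i in range(n)]
    ranked = sorted(range(n), key=lambda i: avg[i], reverse=True)
    return ranked[:k]

# Toy example with invented scores for two high-resource languages.
scores = {
    "de": [0.9, 0.2, 0.7, 0.4],
    "fr": [0.8, 0.3, 0.9, 0.1],
}
print(select_instances_to_annotate(scores, 2))  # → [0, 2]
```

The chosen indices would then correspond to the sentences sent for human translation into the new target language, under the assumption that instance usefulness transfers across languages.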