Boosting Unsupervised Machine Translation with Pseudo-Parallel Data

DOI: 10.48550/arxiv.2310.14262 Publication Date: 2023-01-01
ABSTRACT
Even with the latest developments in deep learning and large-scale language modeling, the task of machine translation (MT) for low-resource languages remains a challenge. Neural MT systems can be trained in an unsupervised way without any translation resources, but the quality lags behind, especially in truly low-resource conditions. We propose a training strategy that relies on pseudo-parallel sentence pairs mined from monolingual corpora in addition to synthetic back-translated corpora. We experiment with different training schedules and reach an improvement of up to 14.5 BLEU points (English to Ukrainian) over a baseline trained on back-translated data only.
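Mining pseudo-parallel sentence pairs from monolingual corpora is commonly done by embedding sentences of both languages into a shared cross-lingual space and matching them by margin-based similarity scoring. The sketch below illustrates that general technique on precomputed embeddings; the function name, parameters (`k`, `threshold`), and toy vectors are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def mine_pseudo_parallel(src_emb, tgt_emb, k=2, threshold=1.0):
    """Margin-based mining of pseudo-parallel pairs from two monolingual
    embedding matrices (illustrative sketch, not the paper's exact recipe).

    src_emb: (n_src, d) array of source-language sentence embeddings
    tgt_emb: (n_tgt, d) array of target-language sentence embeddings
    Returns a list of (src_index, tgt_index, margin_score) tuples.
    """
    # Normalize to unit length so dot products equal cosine similarities.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T  # (n_src, n_tgt) cosine similarity matrix

    # Average similarity to the k nearest neighbors in each direction,
    # used to penalize "hub" sentences that are close to everything.
    knn_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1)
    knn_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0)

    pairs = []
    for i in range(sim.shape[0]):
        j = int(np.argmax(sim[i]))  # best target candidate for source i
        # Ratio-margin score: raw similarity divided by neighborhood average.
        margin = sim[i, j] / ((knn_src[i] + knn_tgt[j]) / 2.0)
        if margin >= threshold:
            pairs.append((i, j, float(margin)))
    return pairs

# Toy usage with made-up 2-D "embeddings" where the correct pairing is obvious.
src_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
tgt_emb = np.array([[1.0, 0.1], [0.1, 1.0]])
mined = mine_pseudo_parallel(src_emb, tgt_emb)
```

Pairs that survive the margin threshold would then be added to the training mix alongside synthetic back-translated data, following whichever schedule is being tested.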