Boosting Unsupervised Machine Translation with Pseudo-Parallel Data

DOI: 10.48550/arxiv.2310.14262 Publication Date: 2023-01-01
ABSTRACT
Even with the latest developments in deep learning and large-scale language modeling, the task of machine translation (MT) for low-resource languages remains a challenge. Neural MT systems can be trained in an unsupervised way without any translation resources, but the quality lags behind, especially in truly low-resource conditions. We propose a training strategy that relies on pseudo-parallel sentence pairs mined from monolingual corpora in addition to synthetic back-translated corpora. We experiment with different training schedules and reach an improvement of up to 14.5 BLEU points (English to Ukrainian) over a baseline trained on back-translated data only.
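Mining pseudo-parallel sentence pairs from monolingual corpora is commonly done by embedding sentences of both languages into a shared cross-lingual space and matching them by margin-based similarity scoring. The sketch below illustrates that general technique on precomputed embeddings; the function name, parameters (`k`, `threshold`), and toy vectors are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def mine_pseudo_parallel(src_emb, tgt_emb, k=2, threshold=1.0):
    """Margin-based mining of pseudo-parallel pairs from two monolingual
    embedding matrices (illustrative sketch, not the paper's exact recipe).

    src_emb: (n_src, d) array of source-language sentence embeddings
    tgt_emb: (n_tgt, d) array of target-language sentence embeddings
    Returns a list of (src_index, tgt_index, margin_score) tuples.
    """
    # Normalize to unit length so dot products equal cosine similarities.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T  # (n_src, n_tgt) cosine similarity matrix

    # Average similarity to the k nearest neighbors in each direction,
    # used to penalize "hub" sentences that are close to everything.
    knn_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1)
    knn_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0)

    pairs = []
    for i in range(sim.shape[0]):
        j = int(np.argmax(sim[i]))  # best target candidate for source i
        # Ratio-margin score: raw similarity divided by neighborhood average.
        margin = sim[i, j] / ((knn_src[i] + knn_tgt[j]) / 2.0)
        if margin >= threshold:
            pairs.append((i, j, float(margin)))
    return pairs

# Toy usage with made-up 2-D "embeddings" where the correct pairing is obvious.
src_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
tgt_emb = np.array([[1.0, 0.1], [0.1, 1.0]])
mined = mine_pseudo_parallel(src_emb, tgt_emb)
```

Pairs that survive the margin threshold would then be added to the training mix alongside synthetic back-translated data, following whichever schedule is being tested.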