Languages Transferred Within the Encoder: On Representation Transfer in Zero-Shot Multilingual Translation
DOI: 10.48550/arxiv.2406.08092
Publication Date: 2024-06-12
ABSTRACT
Understanding representation transfer in multilingual neural machine translation can reveal the representational issue causing the zero-shot translation deficiency. In this work, we introduce the identity pair, a sentence translated into itself, to address the lack of a base measure in multilingual investigations, since the identity pair represents the optimal state of transfer among languages. In our analysis, we demonstrate that the encoder transfers the source language to the representational subspace of the target language instead of a language-agnostic state. Thus, the zero-shot deficiency arises because representations are entangled with other languages and are not transferred effectively to the target language. Based on these findings, we propose two methods: 1) a low-rank language-specific embedding at the encoder, and 2) language-specific contrastive learning of representations at the decoder. Experimental results on the Europarl-15, TED-19, and OPUS-100 datasets show that our methods substantially enhance zero-shot translation performance by improving the language transfer capacity, thereby providing practical evidence to support our conclusions.