TeaForN: Teacher-Forcing with N-grams

FOS: Computer and information sciences; Computation and Language (cs.CL)
DOI: 10.18653/v1/2020.emnlp-main.702
Publication Date: 2020-11-29
ABSTRACT
To be published in EMNLP 2020.

Sequence generation models trained with teacher-forcing suffer from exposure bias and a lack of differentiability across timesteps. Our proposed method, Teacher-Forcing with N-grams (TeaForN), addresses both problems directly through a stack of N decoders trained to decode along a secondary time axis, which allows model-parameter updates based on N prediction steps. TeaForN can be used with a wide class of decoder architectures and requires minimal modifications to a standard teacher-forcing setup. Empirically, we show that TeaForN boosts generation quality on one Machine Translation benchmark, WMT 2014 English-French, and two News Summarization benchmarks, CNN/DailyMail and Gigaword.
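To make the mechanism in the abstract concrete, here is a minimal sketch of what an N-step decoding loss of this kind could look like, assuming a toy GRU decoder and a soft-embedding feedback between decoding steps. The class ToyDecoder, the function teaforn_loss, the parameter n_steps, and the choice of feedback are all illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDecoder(nn.Module):
    """Stand-in causal decoder: embeds tokens (or accepts soft embeddings)
    and predicts next-token logits at every position."""
    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids=None, soft_embeds=None):
        x = self.embed(token_ids) if soft_embeds is None else soft_embeds
        h, _ = self.rnn(x)
        return self.out(h)  # (batch, seq_len, vocab)

def teaforn_loss(model, tokens, n_steps=4):
    """Accumulate cross-entropy along an N-step secondary decoding axis.
    Step 1 is ordinary teacher forcing; steps 2..N feed the previous
    step's differentiable soft predictions back into the decoder."""
    # Step 1: gold inputs predict the next token, as in teacher forcing.
    logits = model(token_ids=tokens[:, :-1])
    total = F.cross_entropy(logits.transpose(1, 2), tokens[:, 1:])
    for n in range(2, n_steps + 1):
        # Differentiable feedback: the expected embedding under the
        # predicted distribution (one simple choice for illustration).
        probs = logits.softmax(-1)
        soft = probs @ model.embed.weight        # (batch, seq_len, d_model)
        logits = model(soft_embeds=soft)
        # At secondary step n, position t is asked to predict token t + n,
        # so drop the last n - 1 predictions and the first n gold tokens.
        valid = tokens.size(1) - n
        if valid <= 0:
            break
        total = total + F.cross_entropy(
            logits[:, :valid].transpose(1, 2), tokens[:, n:])
    return total / n_steps
```

With n_steps=1 this reduces to standard teacher forcing. Feeding the expected embedding under the predicted distribution is one simple way to keep the N-step chain differentiable across timesteps, which is the property the abstract highlights; the paper's actual feedback mechanism may differ.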