An Actor-Critic Algorithm for Sequence Prediction
Leverage (statistics)
Ground truth
Sequence (biology)
Deep Neural Networks
DOI: 10.48550/arxiv.1607.07086
Publication Date: 2016-01-01
AUTHORS (8)
ABSTRACT
We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes, as models must generate tokens conditioned on their previous guesses rather than the ground-truth tokens. We address this problem by introducing a critic network that is trained to predict the value of an output token, given the policy of an actor network. This results in a training procedure that is much closer to the test phase, and allows us to directly optimize for a task-specific score such as BLEU. Crucially, since we leverage these techniques in the supervised learning setting rather than the traditional RL setting, we condition the critic network on the ground-truth output. We show that our method leads to improved performance on both a synthetic task and German-English machine translation. Our analysis paves the way for such methods to be applied in natural language generation tasks, such as machine translation, caption generation, and dialogue modelling.
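The abstract gives no equations, so the following is only a minimal, hypothetical sketch of the bookkeeping an actor-critic sequence trainer of this kind might do, not the authors' implementation. Several assumptions apply. `toy_score` is a made-up stand-in for BLEU. The per-token reward is taken to be the change in score when a token is appended to the prefix. The critic is trained toward a TD-style target: the reward plus the critic's next-step values averaged under the actor's distribution. The actor ascends the critic's values averaged under its own distribution. There are no neural networks or gradients here, and the critic's estimates are plain numbers rather than the output of a network conditioned on the ground-truth output.

```python
from typing import Dict, List, Sequence

# Token -> probability (actor) or token -> estimated value (critic).
Dist = Dict[str, float]


def toy_score(prefix: Sequence[str], reference: Sequence[str]) -> float:
    """Hypothetical stand-in for BLEU: fraction of reference positions matched."""
    hits = sum(1 for p, r in zip(prefix, reference) if p == r)
    return hits / len(reference)


def per_token_rewards(pred: Sequence[str], reference: Sequence[str]) -> List[float]:
    """Assumed reward shaping: r_t = R(y_1..t) - R(y_1..t-1)."""
    rewards: List[float] = []
    prev = toy_score([], reference)  # score of the empty prefix (0.0 here)
    for t in range(1, len(pred) + 1):
        cur = toy_score(pred[:t], reference)
        rewards.append(cur - prev)
        prev = cur
    return rewards


def expected_value(probs: Dist, q: Dist) -> float:
    """sum_a p(a) * Q(a): the critic's values averaged under the actor's policy."""
    return sum(p * q[a] for a, p in probs.items())


def critic_targets(rewards: List[float], actor_probs: List[Dist],
                   critic_q: List[Dist]) -> List[float]:
    """TD-style targets for Q at each sampled token:
    q_t = r_t + sum_a p_{t+1}(a) Q_{t+1}(a), with no bootstrap after the last step."""
    T = len(rewards)
    targets: List[float] = []
    for t in range(T):
        if t + 1 < T:
            bootstrap = expected_value(actor_probs[t + 1], critic_q[t + 1])
        else:
            bootstrap = 0.0
        targets.append(rewards[t] + bootstrap)
    return targets


def actor_objective(actor_probs: List[Dist], critic_q: List[Dist]) -> float:
    """Quantity the actor would ascend: sum_t sum_a p_t(a) Q_t(a)."""
    return sum(expected_value(p, q) for p, q in zip(actor_probs, critic_q))


if __name__ == "__main__":
    reference = ["a", "b", "a"]
    pred = ["a", "a", "a"]  # tokens assumed to have been sampled from the actor
    # Made-up actor distributions and critic estimates, for illustration only.
    actor_probs = [{"a": 0.5, "b": 0.5}, {"a": 0.25, "b": 0.75}, {"a": 1.0, "b": 0.0}]
    critic_q = [{"a": 1.0, "b": 0.0}, {"a": 0.4, "b": 0.8}, {"a": 0.2, "b": 0.6}]
    r = per_token_rewards(pred, reference)
    print("rewards:", r)
    print("critic targets:", critic_targets(r, actor_probs, critic_q))
    print("actor objective:", actor_objective(actor_probs, critic_q))
```

Because these per-token rewards telescope, they sum to the score of the full predicted sequence. A sequence-level metric can therefore be spread over individual tokens without changing the total the actor is ultimately judged on.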