Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning

DOI: 10.18653/v1/p18-1203 Publication Date: 2019-06-29T19:52:01Z
ABSTRACT
Training a task-completion dialogue agent via reinforcement learning (RL) is costly because it requires many interactions with real users. One common alternative is to resort to a user simulator. However, a user simulator usually lacks the language complexity of human interlocutors, and biases in its design may tend to degrade the agent. To address these issues, we present Deep Dyna-Q, which to the best of our knowledge is the first deep RL framework that integrates planning for task-completion dialogue policy learning. We incorporate into the dialogue agent a model of the environment, referred to as the world model, to mimic real user responses and generate simulated experience. During dialogue policy learning, the world model is constantly updated with real user experience to approach real user behavior, and in turn, the dialogue agent is optimized using both real experience and simulated experience. The effectiveness of our approach is demonstrated on a movie-ticket booking task in both simulated and human-in-the-loop settings.
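The planning idea underlying Deep Dyna-Q descends from the classic tabular Dyna-Q algorithm: each real interaction updates both the value function (direct RL) and a learned world model, and the world model then generates additional simulated experience for extra value updates. The sketch below illustrates that loop on a toy deterministic chain environment; the environment, hyperparameters, and all names are illustrative simplifications, not the paper's deep, neural variant.

```python
import random

random.seed(0)

N_STATES = 5          # toy chain: states 0..4, goal at state 4
ACTIONS = [0, 1]      # 0 = move left, 1 = move right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1
PLANNING_STEPS = 10   # simulated (world-model) updates per real step

def step(s, a):
    """Deterministic chain dynamics: reward 1 only on reaching the goal."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r, s2 == N_STATES - 1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
model = {}  # world model: (s, a) -> (s', r), fit from real experience

def q_update(s, a, r, s2):
    best = max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best - Q[(s, a)])

for episode in range(50):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda b: Q[(s, b)])
        s2, r, done = step(s, a)   # real experience
        q_update(s, a, r, s2)      # direct RL update
        model[(s, a)] = (s2, r)    # world-model update
        # planning: extra updates from simulated experience
        for _ in range(PLANNING_STEPS):
            ps, pa = random.choice(list(model))
            ps2, pr = model[(ps, pa)]
            q_update(ps, pa, pr, ps2)
        s = s2

# The greedy policy should move right toward the goal in every state.
policy = [max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(N_STATES - 1)]
print(policy)
```

In Deep Dyna-Q, the tabular `Q` and `model` dictionaries are replaced by neural networks (a DQN policy and a learned world model that simulates user responses), but the interleaving of direct RL and planning steps follows the same pattern.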