Learning an Unreferenced Metric for Online Dialogue Evaluation
Open domain
DOI: 10.18653/v1/2020.acl-main.220
Publication Date: 2020-07-29
AUTHORS (6)
ABSTRACT
Evaluating the quality of a dialogue interaction between two agents is a difficult task, especially in open-domain chit-chat style dialogue. There have been recent efforts to develop automatic dialogue evaluation metrics, but most of them do not generalize to unseen datasets and/or need a human-generated reference response during inference, making them infeasible for online evaluation. Here, we propose an unreferenced automated evaluation metric that uses large pre-trained language models to extract latent representations of utterances, and leverages the temporal transitions that exist between them. We show that our model achieves higher correlation with human annotations in an online setting, while not requiring true responses for comparison during inference.
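The abstract describes the approach only at a high level. The sketch below is a minimal illustration of what an unreferenced metric looks like in practice, under stated assumptions: a pre-trained encoder (assumed here to be bert-base-uncased via HuggingFace Transformers) extracts latent representations of the dialogue context and the candidate response, and a simple cosine-similarity scoring rule stands in for the learned transition model, so no human reference response is needed at inference time. This is not the authors' implementation.

```python
# Illustrative sketch of an unreferenced dialogue metric (not the paper's model):
# encode context and candidate response with a pre-trained LM, then score the
# pair directly, with no reference response required.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumed encoder; any pre-trained LM could be substituted
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
encoder.eval()


def embed(utterance: str) -> torch.Tensor:
    """Extract a latent representation of an utterance (mean-pooled hidden states)."""
    inputs = tokenizer(utterance, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, hidden_dim)
    return hidden.mean(dim=1).squeeze(0)              # (hidden_dim,)


def unreferenced_score(context: str, response: str) -> float:
    """Score a candidate response against its context; no reference reply is used.

    Cosine similarity is a stand-in scoring rule for this sketch; the paper
    instead learns from the temporal transitions between representations of
    consecutive utterances.
    """
    c, r = embed(context), embed(response)
    return torch.nn.functional.cosine_similarity(c, r, dim=0).item()


if __name__ == "__main__":
    ctx = "How was your weekend?"
    print(unreferenced_score(ctx, "It was great, I went hiking with friends."))
    print(unreferenced_score(ctx, "Insert tab A into slot B and tighten the screws."))
```

Because the score depends only on the context and the candidate response, it can be computed for live model outputs during deployment, which is what makes such a metric usable for online evaluation.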