Jointly Measuring Diversity and Quality in Text Generation Models

Keywords: BLEU, Natural Language Generation, Feature (linguistics), Text generation
DOI: 10.48550/arxiv.1904.03971 Publication Date: 2019-01-01
ABSTRACT
Text generation is an important Natural Language Processing task with various applications. Although several metrics have already been introduced to evaluate text generation methods, each of them has its own shortcomings. The most widely used metrics, such as BLEU, only consider the quality of generated sentences and neglect their diversity. For example, repeatedly generating one high-quality sentence would still result in a high score. On the other hand, a more recent metric for the diversity of generated texts, known as Self-BLEU, ignores their quality. In this paper, we propose to evaluate both aspects simultaneously by approximating the distance between the learned generative model and the real data distribution. For this purpose, we first introduce a metric that approximates this distance using n-gram based measures. Then, a feature-based measure is introduced, which relies on a highly deep model trained on a large text corpus, called BERT. Finally, for the oracle training mode, in which the generator's density can also be calculated, we propose to use distance measures between the corresponding explicit distributions. Eventually, popular text generation models are evaluated with both the existing and the proposed metrics, and the preferences of these metrics are determined.
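To illustrate the quality/diversity tension the abstract describes, the following sketch implements a simplified clipped n-gram precision (a stand-in for full BLEU, omitting the brevity penalty and the geometric mean over n-gram orders) and a Self-BLEU-style score that rates each generated sentence against the other generated sentences. All function names and the toy sentences are illustrative assumptions, not code from the paper.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_precision(candidate, references, n=2):
    """Clipped n-gram precision of one candidate against reference sentences.
    A simplified stand-in for BLEU (no brevity penalty, single n-gram order)."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, a candidate occurrence is "clipped" to the maximum
    # count observed in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())

def self_precision(generated, n=2):
    """Self-BLEU-style score: each generated sentence is scored against the
    remaining generated sentences. High values indicate LOW diversity."""
    scores = [
        ngram_precision(cand, generated[:i] + generated[i + 1:], n)
        for i, cand in enumerate(generated)
    ]
    return sum(scores) / len(scores)

# Repeating one high-quality sentence maximizes quality AND the self score,
# exposing the failure mode the abstract points out.
refs = [["the", "cat", "sat", "on", "the", "mat"]]
repetitive = [["the", "cat", "sat"], ["the", "cat", "sat"]]
diverse = [["the", "cat", "sat"], ["a", "dog", "ran", "home"]]

print(ngram_precision(["the", "cat", "sat"], refs))  # quality of one sentence
print(self_precision(repetitive))  # high self score = low diversity
print(self_precision(diverse))     # low self score = high diversity
```

The repetitive sample scores perfectly on quality yet also maximizes the self score, which is exactly why the paper argues that quality and diversity must be measured jointly rather than by either metric alone.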