VLSP 2022 Abmusu Task Dataset: A Resource for Vietnamese Abstractive Multi-Document Summarization

Vietnamese Benchmark (surveying) Multi-document summarization
DOI: 10.1142/s2717554523500030 Publication Date: 2023-05-17T07:18:47Z
ABSTRACT
The performance of automatic summarization systems has improved significantly with the development of supervised approaches. However, for the Vietnamese abstractive multi-document summarization task, the available datasets are insufficient for training models. With this motivation, we contribute a new gold-standard dataset named Abmusu. After collecting and clustering articles, we built a hierarchical annotation process to generate summaries, with three roles: annotator, supervisor, and curator. As a result, the dataset contains 600 news clusters formed from 1839 articles, with corresponding human-generated summaries. To the best of our knowledge, Abmusu is the biggest dataset of its kind that is freely available for research. Moreover, its summaries are more concise, making it challenging to train models on. We also used various baselines to benchmark the dataset.