VLSP 2022 Abmusu Task Dataset: A Resource for Vietnamese Abstractive Multi-Document Summarization
Vietnamese
Benchmark (surveying)
Multi-document summarization
DOI:
10.1142/s2717554523500030
Publication Date:
2023-05-17T07:18:47Z
AUTHORS (4)
ABSTRACT
The performance of automatic summarization systems has improved significantly with the development of supervised approaches. However, for the Vietnamese abstractive multi-document summarization task, the available datasets are insufficient for training a model. With this motivation, we contribute a new gold-standard dataset named Abmusu. After collecting and clustering articles, we built a hierarchical annotation process to generate summaries, with three roles: annotator, supervisor, and curator. As a result, the dataset contains 600 news clusters formed from 1839 articles, with corresponding human-generated summaries. To the best of our knowledge, Abmusu is the biggest dataset of its kind that is freely available for research. Moreover, its summaries are more concise, which makes it challenging to train models on. We also used various baselines to benchmark the dataset.
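To make the reported scale concrete, the sketch below models one cluster as a small record and computes the average number of articles per cluster from the two figures given in the abstract (600 clusters, 1839 articles). The `NewsCluster` class and its field names are hypothetical and chosen only for illustration; they are not the released schema of Abmusu.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NewsCluster:
    """Hypothetical record layout (assumption): one cluster of related
    news articles paired with one human-written summary."""
    cluster_id: int
    articles: List[str] = field(default_factory=list)
    summary: str = ""


def average_articles_per_cluster(total_articles: int, total_clusters: int) -> float:
    """Mean number of source articles behind each summary."""
    return total_articles / total_clusters


# Figures taken from the abstract: 1839 articles grouped into 600 clusters.
avg = average_articles_per_cluster(1839, 600)
print(f"{avg:.3f}")  # 3.065

# Placeholder content only; real clusters hold full article texts.
example = NewsCluster(cluster_id=0, articles=["article A", "article B", "article C"],
                      summary="one abstractive summary")
```

On these figures, each summary condenses roughly three source articles on average.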