Pre-training for Abstractive Document Summarization by Reinstating Source Text

DOI: 10.18653/v1/2020.emnlp-main.297 Publication Date: 2020-11-29T14:51:46Z
ABSTRACT
Abstractive document summarization is usually modeled as a sequence-to-sequence (SEQ2SEQ) learning problem. Unfortunately, training large SEQ2SEQ based summarization models on limited supervised summarization data is challenging. This paper presents three sequence-to-sequence pre-training (in shorthand, STEP) objectives which allow us to pre-train a SEQ2SEQ based abstractive summarization model on unlabeled text. The main idea is that, given an input text artificially constructed from a document, the model is pre-trained to reinstate the original document. These objectives include sentence reordering, next sentence generation, and masked document generation, which have close relations with the abstractive document summarization task. Experiments on two benchmark summarization datasets (i.e., CNN/DailyMail and New York Times) show that all three objectives can improve performance upon baselines. Compared to models pre-trained on large-scale data (larger than 160GB), our method, using only 19GB of text for pre-training, achieves comparable results, which demonstrates its effectiveness.
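All three objectives turn an unlabeled document into a SEQ2SEQ training pair whose target is the original document (or a continuation of it). The sketch below is a minimal illustration, not the authors' implementation: the function names, the masking ratio, the half-document split point, and the example sentences are assumptions made for clarity.

```python
# Illustrative sketch of how STEP-style pre-training pairs could be built
# from a sentence-split document. Details (ratios, split points) are assumed.
import random

def sentence_reordering(sentences, seed=0):
    """Input: shuffled sentences; target: the original document."""
    shuffled = sentences[:]
    random.Random(seed).shuffle(shuffled)
    return " ".join(shuffled), " ".join(sentences)

def next_sentence_generation(sentences):
    """Input: the first half of the document; target: the remaining half."""
    split = len(sentences) // 2
    return " ".join(sentences[:split]), " ".join(sentences[split:])

def masked_document_generation(sentences, mask_ratio=0.3, mask_token="<MASK>", seed=0):
    """Input: document with some sentences masked; target: the original document."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(sentences) * mask_ratio))
    masked_ids = set(rng.sample(range(len(sentences)), n_mask))
    masked = [mask_token if i in masked_ids else s for i, s in enumerate(sentences)]
    return " ".join(masked), " ".join(sentences)

doc = [
    "The committee met on Tuesday.",
    "It approved the new budget.",
    "Spending on schools will rise.",
    "A final vote is expected next month.",
]
for make_pair in (sentence_reordering, next_sentence_generation, masked_document_generation):
    source, target = make_pair(doc)
    print(make_pair.__name__, "->", source)
```

Each constructed (source, target) pair can then be fed to a standard SEQ2SEQ model, so the pre-training signal comes entirely from unlabeled text.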