QASnowball: An Iterative Bootstrapping Framework for High-Quality Question-Answering Data Generation
Bootstrapping (finance)
Data set
DOI:
10.48550/arxiv.2309.10326
Publication Date:
2023-01-01
AUTHORS (7)
ABSTRACT
Recent years have witnessed the success of question answering (QA), especially its potential to be a foundation paradigm for tackling diverse NLP tasks. However, obtaining sufficient data to build an effective and stable QA system still remains an open problem. For this problem, we introduce an iterative bootstrapping framework for QA data augmentation (named QASnowball), which can iteratively generate large-scale high-quality QA data based on a seed set of supervised examples. Specifically, QASnowball consists of three modules: an answer extractor to extract core phrases in unlabeled documents as candidate answers, a question generator to generate questions based on documents and candidate answers, and a QA data filter to filter out high-quality QA data. Moreover, QASnowball can be self-enhanced by reseeding the seed set to fine-tune itself in different iterations, leading to continual improvements in the generation quality. We conduct experiments in the high-resource English scenario and the medium-resource Chinese scenario, and the experimental results show that the data generated by QASnowball can facilitate QA models: (1) training models on the generated data achieves comparable results to using supervised data, and (2) pre-training on the generated data and fine-tuning on supervised data can achieve better performance. Our code and generated data will be released to advance further work.
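The extract, generate, filter, and reseed loop described in the abstract can be sketched roughly as below. This is only an illustrative toy, not the authors' implementation: the names `QAExample`, `extract_answers`, `generate_question`, `keep_example`, `fine_tune`, and `snowball` are hypothetical, the three modules are replaced by simple rule-based stand-ins (a regular expression, a cloze template, and a string check), and the fine-tuning step is a no-op placeholder where a real system would retrain its models on the enlarged seed set.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QAExample:
    document: str
    question: str
    answer: str


def extract_answers(document: str) -> List[str]:
    # Toy stand-in for the answer extractor (assumption): treat runs of
    # capitalized words as candidate answer phrases.
    return re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", document)


def generate_question(document: str, answer: str) -> str:
    # Toy stand-in for the question generator (assumption): build a cloze
    # question by blanking out the first occurrence of the answer.
    return "Fill in the blank: " + document.replace(answer, "___", 1)


def keep_example(example: QAExample, pool: List[QAExample]) -> bool:
    # Toy stand-in for the QA data filter (assumption): the answer must occur
    # in the document, must not leak into the question, and the example must
    # not already be in the pool.
    return (
        example.answer in example.document
        and example.answer not in example.question
        and example not in pool
    )


def fine_tune(pool: List[QAExample]) -> None:
    # Placeholder (assumption): a real system would fine-tune the extractor,
    # generator, and filter on the current seed set here.
    return None


def snowball(seed: List[QAExample], documents: List[str], iterations: int) -> List[QAExample]:
    pool = list(seed)
    for _ in range(iterations):
        fine_tune(pool)  # self-enhancement step, a no-op in this sketch
        for doc in documents:
            for answer in extract_answers(doc):
                candidate = QAExample(doc, generate_question(doc, answer), answer)
                if keep_example(candidate, pool):
                    pool.append(candidate)  # reseed with the accepted example
    return pool
```

Calling `snowball(seed, documents, iterations=2)` with a small seed list and a few unlabeled sentences returns the seed examples followed by the accepted generated ones. Because the stand-in modules never change, later iterations add nothing new here; in the framework described above, the gain from repeated iterations would come from retraining on the reseeded data.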