MATINF: A Jointly Labeled Large-Scale Dataset for Classification, Question Answering and Summarization

FOS: Computer and information sciences Computer Science - Machine Learning Computer Science - Computation and Language 0202 electrical engineering, electronic engineering, information engineering 02 engineering and technology Computation and Language (cs.CL) Information Retrieval (cs.IR) Computer Science - Information Retrieval Machine Learning (cs.LG)
DOI: 10.18653/v1/2020.acl-main.330 Publication Date: 2020-07-29T10:14:43Z
ABSTRACT
Accepted as a long paper at ACL 2020<br/>Recently, large-scale datasets have vastly facilitated the development in nearly all domains of Natural Language Processing. However, there is currently no cross-task dataset in NLP, which hinders the development of multi-task learning. We propose MATINF, the first jointly labeled large-scale dataset for classification, question answering and summarization. MATINF contains 1.07 million question-answer pairs with human-labeled categories and user-generated question descriptions. Based on such rich information, MATINF is applicable for three major NLP tasks, including classification, question answering, and summarization. We benchmark existing methods and a novel multi-task baseline over MATINF to inspire further research. Our comprehensive comparison and experiments over MATINF and other datasets demonstrate the merits held by MATINF.<br/>
SUPPLEMENTAL MATERIAL
Coming soon ....
REFERENCES (0)
CITATIONS (4)
EXTERNAL LINKS
PlumX Metrics
RECOMMENDATIONS
FAIR ASSESSMENT
Coming soon ....
JUPYTER LAB
Coming soon ....