ChatGPT Evaluation on Sentence Level Relations: A Focus on Temporal, Causal, and Discourse Relations

FOS: Computer and information sciences; Computer Science - Computation and Language (cs.CL)
DOI: 10.48550/arxiv.2304.14827 Publication Date: 2023-01-01
ABSTRACT
This paper aims to quantitatively evaluate the performance of ChatGPT, an interactive large language model, on inter-sentential relations such as temporal relations, causal relations, and discourse relations. Given ChatGPT's promising performance across various tasks, we proceed to carry out thorough evaluations on the whole test sets of 11 datasets, including temporal and causal relations, PDTB2.0-based discourse relations, and dialogue-based discourse relations. To ensure the reliability of our findings, we employ three tailored prompt templates for each task, including the zero-shot prompt template, the zero-shot prompt engineering (PE) template, and the in-context learning (ICL) prompt template, to establish the initial baseline scores for all popular sentence-pair relation classification tasks for the first time. Through our study, we discover that ChatGPT exhibits exceptional proficiency in detecting and reasoning about causal relations, albeit it may not possess the same level of expertise in identifying the temporal order between two events. While it is capable of identifying the majority of discourse relations with existing explicit discourse connectives, the implicit discourse relation remains a formidable challenge. Concurrently, ChatGPT demonstrates subpar performance in the dialogue discourse parsing task, which requires structural understanding of a dialogue before being aware of the discourse relation.
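The three prompt styles named in the abstract (zero-shot, zero-shot PE, and ICL) can be illustrated with a minimal sketch. The wording of the templates, the function names, and the label set below are all hypothetical and chosen only for illustration; they are not the templates used in the paper, and the sketch builds prompt strings only, without calling any model.

```python
from typing import List, Tuple


def _pair(s1: str, s2: str) -> str:
    """Format one sentence pair (hypothetical layout, not the paper's)."""
    return f"Sentence 1: {s1}\nSentence 2: {s2}"


def zero_shot_prompt(s1: str, s2: str, labels: List[str]) -> str:
    """Zero-shot: only the sentence pair and the candidate labels."""
    options = ", ".join(labels)
    return f"{_pair(s1, s2)}\nAnswer with one of: {options}."


def pe_prompt(s1: str, s2: str, labels: List[str], task_hint: str) -> str:
    """Zero-shot PE: the same query, preceded by a hand-written task hint."""
    return f"{task_hint}\n{zero_shot_prompt(s1, s2, labels)}"


def icl_prompt(
    s1: str,
    s2: str,
    labels: List[str],
    demos: List[Tuple[str, str, str]],
) -> str:
    """ICL: labeled demonstrations first, then the unlabeled query."""
    shots = [f"{_pair(d1, d2)}\nAnswer: {label}" for d1, d2, label in demos]
    return "\n\n".join(shots + [zero_shot_prompt(s1, s2, labels)])


if __name__ == "__main__":
    # Illustrative labels and sentences only; assumed, not from any dataset.
    labels = ["before", "after"]
    demos = [("She woke up.", "She made coffee.", "before")]
    print(icl_prompt("He paid.", "He left the shop.", labels, demos))
```

Keeping the query format identical across the three builders means any difference in a model's answers can be attributed to the added hint or demonstrations rather than to a change in layout.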