Hybrid-SQuAD: Hybrid Scholarly Question Answering Dataset
DOI: 10.48550/arxiv.2412.02788
Publication Date: 2024-12-03
AUTHORS (4)
ABSTRACT
Existing Scholarly Question Answering (QA) methods typically target homogeneous data sources, relying solely on either text or Knowledge Graphs (KGs). However, scholarly information often spans heterogeneous sources, necessitating the development of QA systems that can integrate information from multiple sources. To address this challenge, we introduce Hybrid-SQuAD (Hybrid Scholarly Question Answering Dataset), a novel large-scale QA dataset designed to facilitate answering questions incorporating both text and KG facts. The dataset consists of 10.5K question-answer pairs generated by a large language model, leveraging the KGs DBLP and SemOpenAlex alongside corresponding text from Wikipedia. In addition, we propose a RAG-based baseline hybrid QA model, achieving an exact match score of 69.65 on the Hybrid-SQuAD test set.
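The exact match score reported above can be illustrated with a minimal sketch. The abstract does not say how answers are normalized before comparison, so the normalization here (lowercasing, stripping punctuation and English articles, collapsing whitespace) is an assumption borrowed from common extractive-QA practice, and the function names are hypothetical.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace.

    Assumed normalization; the paper's own rules may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> bool:
    """True when the normalized prediction equals the normalized gold answer."""
    return normalize(prediction) == normalize(gold)


def exact_match_score(predictions: list[str], golds: list[str]) -> float:
    """Percentage of predictions that exactly match their gold answer."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must have the same length")
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, g) for p, g in zip(predictions, golds))
    return 100.0 * hits / len(predictions)
```

Under this reading, a score of 69.65 would mean that roughly 70 of every 100 test questions received an answer identical to the reference after normalization.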