NFDI4DS | UHH-SEMS - Publication Details

Hybrid-SQuAD: Hybrid Scholarly Question Answering Dataset

DOI: 10.48550/arxiv.2412.02788 Publication Date: 2024-12-03


AUTHORS (4)

Tilahun Abedissa ...

Debayan Banerjee

Yaregal Assabie

Ricardo Usbeck

ABSTRACT

Existing Scholarly Question Answering (QA) methods typically target homogeneous data sources, relying solely on either text or Knowledge Graphs (KGs). However, scholarly information often spans heterogeneous sources, necessitating the development of QA systems that can integrate information from multiple heterogeneous data sources. To address this challenge, we introduce Hybrid-SQuAD (Hybrid Scholarly Question Answering Dataset), a novel large-scale QA dataset designed to facilitate answering questions that incorporate both text and KG facts. The dataset consists of 10.5K question-answer pairs generated by a large language model, leveraging the KGs DBLP and SemOpenAlex alongside corresponding text from Wikipedia. In addition, we propose a RAG-based baseline hybrid QA model, achieving an exact match score of 69.65 on the Hybrid-SQuAD test set.
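The abstract reports an exact match (EM) score of 69.65 but does not define the metric on this page. A common convention, borrowed from SQuAD-style evaluation, counts a prediction as correct only if it equals the gold answer after light normalization (lowercasing, removing punctuation and English articles, collapsing whitespace), and reports the percentage of correct predictions. The sketch below assumes that convention and a single gold answer per question; the normalization actually used for Hybrid-SQuAD may differ, and the function names `normalize_answer` and `exact_match_score` are illustrative only.

```python
import re
import string

# Assumption: SQuAD-style normalization; Hybrid-SQuAD's exact rules are not stated here.
_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation, drop the articles a/an/the, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_score(predictions: list[str], references: list[str]) -> float:
    """Return the percentage of predictions that equal their reference after normalization.

    Assumes exactly one gold answer per question.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        normalize_answer(pred) == normalize_answer(ref)
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    # "The DBLP" normalizes to "dblp" and matches; "2019" vs "2020" does not.
    print(exact_match_score(["The DBLP", "2019"], ["dblp", "2020"]))  # 50.0
```

Normalizing before comparison keeps EM from penalizing purely cosmetic differences such as capitalization or a leading "the", while still requiring the answer content to match exactly.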



