Evaluating the effectiveness of prompt engineering for knowledge graph question answering

DOI: 10.3389/frai.2024.1454258 Publication Date: 2025-01-13T06:12:17Z
ABSTRACT
Many different methods for prompting large language models have been developed since the emergence of OpenAI's ChatGPT in November 2022. In this work, we evaluate six few-shot prompting methods. The first set of experiments evaluates three frameworks that focus on the quantity or type of shots in a prompt: a baseline method with a simple prompt and a small number of shots, random few-shot prompting with 10, 20, and 30 shots, and similarity-based few-shot prompting. The second set of experiments targets optimizing or enhancing shots through Large Language Model (LLM)-generated explanations, using three frameworks: Explain then Translate, Question Decomposition Meaning Representation, and Optimization by Prompting. We evaluate these methods on the newly created Spider4SPARQL benchmark, as it is the most complex SPARQL-based Knowledge Graph Question Answering (KGQA) benchmark to date. Across the various frameworks used, the commercial model was unable to achieve a score over 51%, indicating that KGQA, especially for queries with multiple hops, operations, and filters, remains a challenging task for LLMs. Our experiments find that the most successful framework for KGQA combines a simple prompt with an ontology and five shots.
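The best-performing setup described above, a simple prompt that pairs the knowledge graph's ontology with a handful of in-context shots, can be illustrated with a short sketch. The snippet below is a minimal illustration, not the authors' code: the ontology excerpt, the ONTOLOGY_TTL and SHOTS variables, and the build_prompt() helper are all hypothetical names used only to show how such a prompt might be assembled before being sent to an LLM.

```python
"""Minimal sketch (not the paper's implementation) of a few-shot
text-to-SPARQL prompt combining an ontology with example shots."""

# Hypothetical excerpt of the target knowledge graph's ontology (Turtle).
ONTOLOGY_TTL = """\
:Singer a owl:Class .
:Concert a owl:Class .
:performsAt a owl:ObjectProperty ; rdfs:domain :Singer ; rdfs:range :Concert .
"""

# (Question, gold SPARQL) pairs used as in-context shots; a real run would use five.
SHOTS = [
    ("How many singers are there?",
     "SELECT (COUNT(?s) AS ?n) WHERE { ?s a :Singer . }"),
    # ... further pairs would follow here ...
]

def build_prompt(question: str) -> str:
    """Assemble the prompt: task instruction, ontology, shots, then the new question."""
    parts = [
        "Translate the question into a SPARQL query over the ontology below.",
        "Ontology:",
        ONTOLOGY_TTL,
        "Examples:",
    ]
    for q, sparql in SHOTS:
        parts.append(f"Question: {q}\nSPARQL: {sparql}")
    parts.append(f"Question: {question}\nSPARQL:")
    return "\n\n".join(parts)

if __name__ == "__main__":
    # The resulting string would be sent to an LLM via a chat-completion API.
    print(build_prompt("Which singers perform at more than one concert?"))
```

The similarity-based variant evaluated in the paper would differ only in how SHOTS is chosen, selecting the examples most similar to the input question rather than a fixed or random set.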