Evaluating Cognitive Maps and Planning in Large Language Models with CogEval

DOI: 10.48550/arxiv.2309.15129 Publication Date: 2023-01-01
ABSTRACT
Recently an influx of studies claims emergent cognitive abilities in large language models (LLMs). Yet, most rely on anecdotes, overlook contamination of training sets, or lack systematic evaluation involving multiple tasks, control conditions, multiple iterations, and statistical robustness tests. Here we make two major contributions. First, we propose CogEval, a cognitive science-inspired protocol for the systematic evaluation of cognitive capacities in Large Language Models. The CogEval protocol can be followed for the evaluation of various abilities. Second, here we follow CogEval to systematically evaluate cognitive maps and planning ability across eight LLMs (OpenAI GPT-4, GPT-3.5-turbo-175B, davinci-003-175B, Google Bard, Cohere-xlarge-52.4B, Anthropic Claude-1-52B, LLaMA-13B, Alpaca-7B). We base our task prompts on human experiments, which offer both established construct validity for evaluating planning and are absent from LLM training sets. We find that, while LLMs show apparent competence in a few planning tasks with simpler structures, systematic evaluation reveals striking failure modes, including hallucinations of invalid trajectories and getting trapped in loops. These findings do not support the idea of emergent out-of-the-box planning ability in LLMs. This could be because LLMs do not understand the latent relational structures underlying planning problems, known as cognitive maps, and fail at unrolling goal-directed trajectories based on the underlying structure. Implications for application and future directions are discussed.
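To make the described failure modes concrete, the sketch below shows one way a proposed route could be checked against the latent graph of a planning task, flagging hallucinated (invalid) transitions, loops, and missed goals. This is a minimal illustration, not the authors' released evaluation code: the example graph, the room names, and the already-parsed path are all illustrative assumptions.

```python
# Minimal sketch (assumed, not from the paper's codebase): score an
# LLM-proposed trajectory against a known task graph, reporting the
# failure modes mentioned in the abstract.

from collections import Counter

# Hypothetical task graph: node -> set of directly reachable nodes.
GRAPH = {
    "room1": {"room2", "room3"},
    "room2": {"room1", "room4"},
    "room3": {"room1", "room4"},
    "room4": {"room2", "room3", "room5"},
    "room5": {"room4"},
}

def evaluate_trajectory(path, start, goal, graph):
    """Return whether a proposed path is valid, plus the reasons it fails."""
    errors = []
    if not path or path[0] != start:
        errors.append("does not begin at the start state")
    # Hallucinated transitions: consecutive steps that are not edges in the graph.
    for a, b in zip(path, path[1:]):
        if b not in graph.get(a, set()):
            errors.append(f"invalid transition {a} -> {b}")
    # Loop detection: any state visited more than once.
    repeats = [node for node, count in Counter(path).items() if count > 1]
    if repeats:
        errors.append(f"revisits states (possible loop): {repeats}")
    if not path or path[-1] != goal:
        errors.append("does not reach the goal state")
    return {"valid": not errors, "errors": errors}

# Example: an LLM answer already parsed into a list of states (parsing assumed).
proposed = ["room1", "room2", "room4", "room2", "room4", "room5"]
print(evaluate_trajectory(proposed, start="room1", goal="room5", graph=GRAPH))
```

Run over many prompts, task structures, and repetitions, a check of this kind yields the per-task success rates that a protocol like CogEval would then subject to statistical robustness tests.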