Evaluating Spatial Understanding of Large Language Models

Spatial Ability
DOI: 10.48550/arxiv.2310.14540 Publication Date: 2023-01-01
ABSTRACT
Large language models (LLMs) show remarkable capabilities across a variety of tasks. Despite only seeing text in training, several recent studies suggest that LLM representations implicitly capture aspects of the underlying grounded concepts. Here, we explore LLM representations of a particularly salient kind of grounded knowledge -- spatial relationships. We design natural-language navigation tasks and evaluate the ability of LLMs, in particular GPT-3.5-turbo, GPT-4, and Llama2 series models, to represent and reason about spatial structures. These tasks reveal substantial variability in LLM performance across different spatial structures, including square, hexagonal, and triangular grids, rings, and trees. In extensive error analysis, we find that LLMs' mistakes reflect both spatial and non-spatial factors. These findings suggest that LLMs appear to capture certain aspects of spatial structure implicitly, but room for improvement remains.
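As a concrete illustration of the kind of natural-language navigation task described in the abstract, the sketch below generates a short walk on a square grid and phrases it as a prompt with a checkable final-position answer. This is a minimal hypothetical example: the grid size, move vocabulary, starting position, and prompt wording are assumptions for illustration, not the authors' released evaluation code.

```python
import random

# Hypothetical sketch of a square-grid navigation task. The grid size,
# move names, and prompt wording are illustrative assumptions, not the
# paper's exact protocol.
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def make_navigation_task(grid_size=3, num_steps=5, seed=0):
    rng = random.Random(seed)
    x, y = grid_size // 2, grid_size // 2  # start at the grid center
    steps = []
    for _ in range(num_steps):
        # Keep only moves that stay inside the grid.
        legal = [m for m, (dx, dy) in MOVES.items()
                 if 0 <= x + dx < grid_size and 0 <= y + dy < grid_size]
        move = rng.choice(legal)
        dx, dy = MOVES[move]
        x, y = x + dx, y + dy
        steps.append(move)
    prompt = (
        f"You are on a {grid_size}x{grid_size} grid, starting at the center. "
        f"You move {', then '.join(steps)}. "
        "Which cell are you on now, given as (column, row) with (0, 0) at the bottom left?"
    )
    return prompt, (x, y)  # gold answer to score the model's reply against

if __name__ == "__main__":
    prompt, answer = make_navigation_task()
    print(prompt)
    print("expected answer:", answer)
```

Analogous generators for hexagonal or triangular grids, rings, and trees would only change the move vocabulary and the neighborhood structure; the prompt-and-gold-answer pattern stays the same.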