Evaluating and enhancing spatial cognition abilities of large language models
Published in International Journal of Geographical Information Science, 2025
Recommended citation: Yang, A., Fu, C., Jia, Q., Dong, W., Ma, M., Chen, H., … Wu, H. (2025). Evaluating and enhancing spatial cognition abilities of large language models. International Journal of Geographical Information Science, 1–36. https://doi.org/10.1080/13658816.2025.2490701
Abstract
Large Language Models (LLMs) demonstrate various capabilities previously considered unique to humans. However, current evidence is insufficient to determine whether LLMs have developed spatial cognition, a fundamental aspect of human cognition that underpins logical-mathematical reasoning and various other skills. Previous studies on this topic have primarily concentrated on perception of small-scale spaces, leaving spatial cognition in the context of GIScience largely unexamined. We introduce a benchmark that evaluates spatial cognition abilities across seven task categories to systematically assess how well LLMs process and generate three types of spatial knowledge: landmark, route, and survey knowledge. Furthermore, we propose a tool-augmented approach named Hybrid Mind, which integrates LLMs with deterministic GIS algorithms to enhance their performance on spatial cognitive tasks. The core of the approach is a mental map builder that generates a quantitative map from segmented qualitative constraints, overcoming LLMs' failures in synthesizing spatial information. Our experimental results revealed that although LLMs exhibited potential for spatial cognition, their performance was poor across most spatial cognitive tasks, particularly in constructing route and survey knowledge. The leading model, GPT-4-turbo, correctly answered fewer than one-fourth of the questions. In contrast, the Hybrid Mind approach significantly improved performance, correctly solving 70.48% of the questions.
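To make the "mental map builder" idea concrete, the following is a minimal toy sketch, not the paper's implementation: it assumes qualitative constraints have already been segmented into (subject, relation, object, distance) tuples, and it propagates 2-D coordinates deterministically from an anchor landmark. The function name `build_mental_map`, the relation vocabulary, and the breadth-first placement strategy are all illustrative assumptions; the actual method may use a different constraint formalism and solver.

```python
from collections import deque

# Hypothetical vocabulary of qualitative direction relations,
# mapped to unit vectors on a plane (y grows northward).
DIRECTIONS = {
    "north_of": (0.0, 1.0),
    "south_of": (0.0, -1.0),
    "east_of": (1.0, 0.0),
    "west_of": (-1.0, 0.0),
}

def build_mental_map(constraints, anchor):
    """Place landmarks on a 2-D plane from qualitative constraints.

    Each constraint (subject, relation, obj, distance) states that the
    subject lies `distance` away from `obj` in the given direction.
    Coordinates are propagated breadth-first from an anchor landmark
    fixed at the origin, turning qualitative statements into a
    quantitative map.
    """
    # Build a symmetric adjacency map: each constraint fixes an offset
    # in both directions between the two landmarks it mentions.
    edges = {}
    for subj, rel, obj, dist in constraints:
        dx, dy = DIRECTIONS[rel]
        edges.setdefault(obj, []).append((subj, dx * dist, dy * dist))
        edges.setdefault(subj, []).append((obj, -dx * dist, -dy * dist))

    coords = {anchor: (0.0, 0.0)}
    queue = deque([anchor])
    while queue:
        cur = queue.popleft()
        cx, cy = coords[cur]
        for nbr, ox, oy in edges.get(cur, []):
            if nbr not in coords:  # first consistent placement wins
                coords[nbr] = (cx + ox, cy + oy)
                queue.append(nbr)
    return coords

if __name__ == "__main__":
    constraints = [
        ("library", "north_of", "station", 1.0),
        ("park", "east_of", "library", 2.0),
    ]
    print(build_mental_map(constraints, anchor="station"))
    # {'station': (0.0, 0.0), 'library': (0.0, 1.0), 'park': (2.0, 1.0)}
```

Once landmarks have explicit coordinates, downstream route and survey questions (distances, directions, relative positions) can be answered by deterministic geometric computation rather than by the LLM's own spatial reasoning, which is the division of labor the abstract describes.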