

Poster

Visual Graph Arena: Evaluating AI's Visual Conceptualization

Zahra Babaiee · Peyman Kiasari · Daniela Rus · Radu Grosu

Wed 16 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract:

Recent advancements in multimodal large language models have driven breakthroughs in visual question answering. Yet a critical gap persists in conceptualization: the ability to recognize and reason about the same concept despite variations in visual form, a basic capacity of human reasoning. To address this challenge, we introduce the Visual Graph Arena (VGA), a dataset of six graph-based tasks designed to evaluate and improve AI systems' capacity for visual abstraction. VGA renders graphs in diverse layouts (e.g., Kamada-Kawai vs. planar) to test reasoning independent of visual form. Experiments with state-of-the-art vision models (ViT, Swin Transformers, ConvNeXt) and multimodal LLMs (GPT-o1, Claude 3.5 Sonnet) reveal a striking divide: human participants achieved near-perfect accuracy (88–100%) across tasks, while models failed entirely on isomorphism detection and showed only limited success on path and cycle tasks. We further identify behavioral anomalies that suggest pseudo-intelligent pattern matching rather than genuine conceptual understanding. These findings underscore fundamental limitations of current AI models in visual understanding. By isolating the challenge of representation-invariant reasoning, the Visual Graph Arena provides a framework for driving progress toward human-like conceptualization in AI visual models.
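
To illustrate the core idea, a minimal sketch is shown below: the same underlying graph is rendered under two different layout algorithms (Kamada-Kawai and planar, as named in the abstract), so that answering questions about the images requires reasoning about graph structure rather than visual form. This is not the authors' released dataset code; the use of networkx/matplotlib and the particular generator graph are illustrative assumptions.

```python
# A minimal sketch (not the VGA release code): render one graph under
# two layouts so that structure, not appearance, must drive reasoning.
import matplotlib.pyplot as plt
import networkx as nx

# A small planar graph; the specific generator is an illustrative choice.
G = nx.cubical_graph()

layouts = {
    "kamada_kawai": nx.kamada_kawai_layout(G),  # force-directed layout
    "planar": nx.planar_layout(G),              # crossing-free embedding
}

fig, axes = plt.subplots(1, len(layouts), figsize=(8, 4))
for ax, (name, pos) in zip(axes, layouts.items()):
    nx.draw(G, pos=pos, ax=ax, node_size=120, with_labels=False)
    ax.set_title(name)

# Both panels depict isomorphic graphs; a question such as
# "do these two images show the same graph?" probes
# representation-invariant (conceptual) reasoning.
plt.savefig("same_graph_two_layouts.png", dpi=150)
```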
