๐ŸŒ Spatial-SSRL: Spatial Reasoning with Vision-Language Models

Understanding 3D Spatial Relationships from 2D Images

✨ Upload an image and ask questions about spatial relationships, locations, and orientations! ✨

📖 Paper | 🏠 GitHub | 🤗 Spatial-SSRL-7B Model | 🤗 Spatial-SSRL-81k | 📰 Daily Paper

When "Apply format prompt" is enabled (the default), the predefined format prompt is automatically appended to your question.
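
The concatenation step can be sketched roughly as follows. This is only an illustration, not the demo's actual code: the name `build_prompt`, the placeholder text in `FORMAT_PROMPT`, and the single-space separator are all assumptions, since the real format prompt is not shown on this page.

```python
# Hypothetical placeholder: the demo's real format prompt text is not shown here.
FORMAT_PROMPT = "Reason step by step before giving the final answer."


def build_prompt(question: str, apply_format_prompt: bool = True) -> str:
    """Return the text sent to the model for a given user question.

    `apply_format_prompt` mirrors the "Apply format prompt" option,
    which is on by default.
    """
    question = question.strip()
    if not apply_format_prompt:
        # Option disabled: the question is passed through unchanged.
        return question
    # Assumption: the format prompt follows the question, separated by a space.
    return f"{question} {FORMAT_PROMPT}"


if __name__ == "__main__":
    print(build_prompt("Which object is closer to the camera, the chair or the lamp?"))
```

Disabling the option leaves the question untouched, which may be useful when you want to supply your own output instructions.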

📸 Example Questions

Click on an example below to load it:

Complete Examples
Input Image | Question | Apply format prompt (default on)

About

This demo showcases spatial reasoning capabilities of vision-language models. The model can:

  • Understand 3D spatial relationships from 2D images
  • Reason about object locations (near/far, front/behind)
  • Analyze object orientations and facing directions
  • Provide step-by-step reasoning before answering

Citation

If you find this project useful, please kindly cite: