
At the heart of spatial intelligence is a simple but powerful question: Where is an object, and how is it oriented in space?
Imagine holding a box in your hand and shining a flashlight on it. On the wall appears a flat shadow: a 3D shape reduced to 2D. This is perspective projection, the same principle behind the pinhole camera, one of the oldest ideas in optics, and it still drives modern computer vision.
When a LiDAR sensor or depth camera scans an object, it captures a 3D point cloud: millions of points describing the object’s geometry. But machines, like our eyes, ultimately interpret scenes through flat 2D images, so those points must be mapped onto an image plane. The bridge between these two worlds is the projection matrix.
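To make that concrete, here is a minimal sketch in Python with NumPy. The intrinsic matrix `K` below uses made-up focal length and principal point values, not calibration from any real sensor; the perspective division is the essential idea.

```python
import numpy as np

# Illustrative pinhole intrinsics (assumed values, not a real calibration):
# fx, fy are focal lengths in pixels; (cx, cy) is the principal point.
fx, fy, cx, cy = 800.0, 800.0, 320.0, 240.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

def project(points_3d: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Project Nx3 camera-frame points onto the image plane (Nx2 pixels)."""
    uvw = (K @ points_3d.T).T        # homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]  # perspective division by depth Z

# A tiny "point cloud": four corners of a box face 5 m in front of the camera.
box_face = np.array([[-0.5, -0.5, 5.0],
                     [ 0.5, -0.5, 5.0],
                     [ 0.5,  0.5, 5.0],
                     [-0.5,  0.5, 5.0]])
print(project(box_face, K))  # pixel coordinates of the projected corners
```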
In the language of projective geometry, the projection matrix explains why parallel lines appear to meet at a vanishing point and why objects shrink as they move farther away. This is where projective geometry departs from Euclidean geometry, which follows Euclid’s fifth postulate, the parallel postulate: parallel lines never meet. Projective geometry, in a sense, lets us see infinity by compressing it onto the 2D image plane.
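You can watch infinity being compressed numerically: take two parallel 3D lines, sample points farther and farther along them, and project those samples with the `project` helper and `K` from the sketch above. Both image tracks converge to the same pixel, the vanishing point.

```python
# Two parallel 3D lines: same direction, different offsets, receding from the camera.
direction = np.array([0.0, 0.0, 1.0])  # both lines head straight into the scene
offsets = [np.array([-1.0, 0.0, 2.0]),
           np.array([ 1.0, 0.0, 2.0])]

for dist in [1.0, 10.0, 100.0, 1000.0]:
    pts = np.array([o + dist * direction for o in offsets])
    print(dist, project(pts, K))
# As dist grows, both projections approach (cx, cy) = (320, 240):
# the vanishing point shared by every line parallel to this direction.
```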
For true spatial intelligence, though, projection alone isn’t enough. To understand an object’s actual position (where it is relative to us) and orientation (how it’s rotated), we must bring back Euclidean geometry — the rules of distance, angle, and rotation. By combining the two, we move from distorted 2D projections to faithful 3D understanding.
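In code, bringing back Euclidean geometry means composing a rigid-body transform (a rotation R and a translation t, which preserve distances and angles) with the projection above. A minimal sketch, reusing `project` and `K`, with an assumed pose of 30 degrees of yaw at 5 m range:

```python
# A rigid-body (Euclidean) transform: rotate the box 30 degrees about the
# vertical axis, then place it 5 m in front of the camera.
theta = np.deg2rad(30.0)
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.0, 0.0, 5.0])

# The same box face, but expressed in the object's own coordinate frame.
box_object = np.array([[-0.5, -0.5, 0.0],
                       [ 0.5, -0.5, 0.0],
                       [ 0.5,  0.5, 0.0],
                       [-0.5,  0.5, 0.0]])

box_camera = box_object @ R.T + t  # object frame -> camera frame (Euclidean)
print(project(box_camera, K))      # camera frame -> pixels (projective)
```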
This process — aligning what the sensor predicts in 2D with what it actually sees — is what enables a self-driving car to track vehicles on the road, or a robot to pick up an object at exactly the right angle.
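That alignment step has a classical name: the Perspective-n-Point (PnP) problem, recovering the rotation and translation that make projected 3D points land on their observed 2D detections. As one hedged illustration (far from the only approach), OpenCV ships a solver for it; fed the synthetic points from the sketches above, it should recover the 30-degree rotation and 5 m translation we started with.

```python
import cv2  # OpenCV: pip install opencv-python

# "Observed" 2D detections: here, the synthetic projections from above.
image_points = project(box_camera, K)

# Solve for the pose that aligns the 3D model points with the 2D observations.
ok, rvec, tvec = cv2.solvePnP(
    box_object.astype(np.float64),
    image_points.astype(np.float64),
    K, None,  # no lens distortion in this toy setup
    flags=cv2.SOLVEPNP_ITERATIVE)

# With clean data, rvec should be close to [0, 0.524, 0] (30 deg about y)
# and tvec close to [0, 0, 5], up to solver tolerance.
print(ok, rvec.ravel(), tvec.ravel())
```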
It’s a reminder that spatial intelligence is not just about seeing — it’s about translating perspective into understanding, connecting ancient principles of geometry with the most advanced AI systems today.
👉 Curious to hear: how are you or your team tackling the challenges of connecting 3D and 2D representations?