12 Milliseconds vs 0.4: Why Spatial Computing Needs Its Own Engine

Article January 25, 2026


"If you can't query it in under a millisecond, the world model is too slow for the world." Suresh Gohane drops the number before the question is finished. He has the benchmarks memorized: 231 player pairs to check for proximity, 33,000 zone-occupancy evaluations per second, trajectory intersections projected across 600 position samples per entity, all of it computed within the time budget of a single video frame. On a traditional processor, that workload takes 12 milliseconds. On Arcflow's GPU pipeline, 0.4. That 30x gap is not an optimization. It is the difference between a spatial engine that keeps pace with the physical world and one that is always describing what already happened. Today he opens the hood on the design decisions that make sub-millisecond spatial queries possible, and why bolting spatial features onto a traditional database is, in his words, "like duct-taping a compass to a spreadsheet."

Every Database Claims Spatial Support#

Q: In your first interview, you explained why traditional databases fail at spatial workloads. Let's go deeper. What specifically breaks when you try to use an existing spatial database for real-time venue intelligence?

Let me be concrete. The most common approach today is to bolt spatial features onto a traditional database. It is a genuine engineering achievement. And it is fundamentally wrong for what we do.

Here is why. Traditional databases are built to think in alphabetical order; they organize and retrieve data by keys, categories, and labels. When you bolt spatial features on top, the database still thinks in alphabetical order underneath. It does not think in physical space.

What does that mean in practice? When you ask "find everyone within 5 meters of this position," a traditional database uses a spatial index to narrow the candidates, then checks each one individually. For a static map, this is fine. For 22 players updating 60 times per second with movement history, speed, direction, and zones that activate dynamically based on who is where, it's a disaster. The database doesn't understand that players near each other on the pitch should be stored near each other in memory. It doesn't understand that predicting path intersections benefits from knowing the order things happened. It treats spatial questions as afterthoughts bolted onto a system designed for something else entirely.

Bolting spatial features onto a traditional database is like duct-taping a compass to a spreadsheet. The compass works. The spreadsheet works. Together, they solve neither problem well.

Q: So Arcflow was built from first principles. What are those first principles?

Three. First: physical location is the primary way data is organized, not an afterthought. Second: spatial questions like "who is near whom?" are built into the engine's core, not bolted on as add-ons. Third: the system is designed for continuous real-time updates, not batch uploads.

Every design decision in Arcflow traces to one of these three principles.

Why GPU Acceleration Matters Here#

Q: You run spatial computation on GPUs. That sounds like it could be marketing. Make the case that it's engineering.

Fair challenge. Let me make it with numbers.

Think about the math. Checking the distance between every pair of 22 players means 231 pairs. Each player has a movement trail: 10 seconds of history at 60 updates per second means 600 position samples each. Checking whether two players' paths will cross means comparing those full movement trails. Do that across all 231 pairs.

Now add zones. A typical venue has 15 to 30 defined zones (penalty areas, midfield thirds, channel lanes). Checking which zone each player is in at each frame means 33,000 zone checks per second, just for zone occupancy.
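The figures quoted above follow directly from the arithmetic. A quick sanity check (assuming 22 players, a 60 Hz update rate, and roughly 25 zones, since that is the value in the quoted 15-to-30 range that lands on 33,000):

```python
from math import comb

players = 22
update_hz = 60
history_seconds = 10

# Unordered player pairs to check for proximity: C(22, 2)
pairs = comb(players, 2)                    # 231

# Position samples per entity: 10 seconds of history at 60 Hz
samples = history_seconds * update_hz       # 600

# Zone-occupancy checks per second, assuming ~25 active zones
zones = 25
zone_checks_per_sec = players * zones * update_hz  # 33,000

print(pairs, samples, zone_checks_per_sec)
```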

Now combine them. A query like "find all players within passing distance of the ball carrier whose paths will intersect an open zone within 2 seconds" requires simultaneous proximity checks, path projection, and zone intersection, all computed within the time budget of a single frame.

This is exactly the kind of calculation GPUs were purpose-built for. Each distance check is independent. Each path intersection test is independent. Each zone check is independent. Run them all in parallel on the GPU. What takes 12 milliseconds on a traditional processor takes under 0.4 milliseconds on the GPU.
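The independence of each check is what makes the workload GPU-shaped. A minimal NumPy sketch of the idea (not Arcflow code): broadcasting computes every pairwise distance in one data-parallel pass, with no per-pair loop, and that same structure maps directly onto a GPU kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
positions = rng.uniform(0, 100, size=(22, 2))  # 22 players on a ~100 m pitch

# One vectorized operation produces the full 22x22 distance matrix;
# every entry is computed independently of every other.
diff = positions[:, None, :] - positions[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Proximity query: which of the 231 unordered pairs are within 5 metres?
i, j = np.triu_indices(22, k=1)
close_pairs = [(a, b) for a, b in zip(i, j) if dist[a, b] < 5.0]

print(len(i))  # 231 pairs checked in a single pass
```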

We run Arcflow's spatial calculations on the same edge GPUs that Bhagyashree's AI models use. In development, we run on Apple's GPU framework so every engineer on the team can develop and test spatial queries locally with full GPU acceleration. Same interface, same behavior, different hardware underneath.

GPU-accelerated spatial queries aren't about raw speed alone; they're about fitting spatial computation within the time budget of a real-time system. When AI perception, spatial queries, and downstream applications all share a single-frame time budget, every millisecond matters. The difference between 12 milliseconds on a traditional processor and 0.4 milliseconds on GPU is the difference between keeping up with live action and falling behind.

Arcflow: Built for the Workload#

Q: Walk me through the architecture. What does Arcflow look like internally?

Three layers. The spatial storage engine, the live query processor, and the spatial question evaluator.

The spatial storage engine organizes data around physical location. Each tracked entity (a player, the ball, a referee) isn't just stored as a current position. It carries its full movement trail: the last N position samples with timestamps, speed, and direction. This means "where are they now?" and "where have they been?" are answered from the same data structure. You don't need to stitch together separate tables. Position and movement history are one thing.
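One way to sketch "position and movement history are one thing" is a fixed-length trail per entity, where the current position is simply the newest sample. This is an illustration under stated assumptions; names like `TrackedEntity` are invented here, not Arcflow's API.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Sample:
    t: float        # timestamp, seconds
    x: float        # pitch position, metres
    y: float
    speed: float    # m/s
    heading: float  # radians

@dataclass
class TrackedEntity:
    """Current position and movement trail in a single structure:
    'where are they now?' is trail[-1]; no second table to join."""
    entity_id: str
    trail: deque = field(default_factory=lambda: deque(maxlen=600))  # 10 s @ 60 Hz

    def update(self, sample: Sample) -> None:
        self.trail.append(sample)  # oldest samples fall off automatically

    @property
    def position(self):
        s = self.trail[-1]
        return (s.x, s.y)

player = TrackedEntity("player_7")
player.update(Sample(t=0.0, x=10.0, y=20.0, speed=4.2, heading=0.5))
print(player.position)  # (10.0, 20.0)
```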

The live query processor handles continuous questions. In a traditional database, you ask a question and get a one-time answer. In Arcflow, you register a standing question ("tell me when anyone enters the penalty area") and the system evaluates that question against every spatial update, notifying you the instant it becomes true. These are not repeated checks. They are live spatial triggers.
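The "register once, get notified" pattern can be sketched in a few lines. This is a toy model of the concept, not Arcflow's interface: a predicate is evaluated against every update and a callback fires on the transition from false to true, rather than on every frame the condition holds.

```python
from typing import Callable

class StandingQuery:
    """A continuously evaluated spatial condition that fires on its rising edge."""
    def __init__(self, predicate: Callable[[dict], bool],
                 on_fire: Callable[[dict], None]):
        self.predicate = predicate
        self.on_fire = on_fire
        self._was_true = False

    def evaluate(self, state: dict) -> None:
        now_true = self.predicate(state)
        if now_true and not self._was_true:
            self.on_fire(state)      # notify the instant it becomes true
        self._was_true = now_true

events = []
in_penalty_area = lambda s: s["x"] > 83.5      # toy one-dimensional boundary
q = StandingQuery(in_penalty_area, lambda s: events.append(s))

# Simulated stream of position updates at frame rate
for x in (80.0, 82.0, 84.0, 85.0, 82.0, 86.0):
    q.evaluate({"x": x})

print(len(events))  # fires twice: entry at x=84.0, re-entry at x=86.0
```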

The spatial question evaluator is where the GPU acceleration lives. It takes built-in spatial queries ("who is inside this zone?", "who is nearest to this point?", "does this path cross that boundary?", "will these two players' paths intersect?") and runs them on the GPU against the current spatial state. These are native operations, not afterthoughts. The engine understands them deeply and can optimize the order it evaluates them.

Q: What do those spatial queries look like in practice?

Four core questions cover roughly 90% of what our customers ask. "Is this person inside this zone?" "Who is closest to this point?" "Does this movement path cross this boundary?" "Will these two people's paths intersect within the next few seconds?"

These are spatial-first questions designed for the physics of the real world, not business logic.
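The geometry behind those four questions is compact. A minimal sketch (illustrative only; the zone is simplified to an axis-aligned rectangle, and one segment-intersection test covers both "path crosses boundary" and "two paths intersect"):

```python
import math

def in_zone(p, zone):
    """Is point p inside a rectangular zone (xmin, ymin, xmax, ymax)?"""
    x, y = p
    xmin, ymin, xmax, ymax = zone
    return xmin <= x <= xmax and ymin <= y <= ymax

def nearest(point, entities):
    """Which entity is closest to this point?"""
    return min(entities, key=lambda e: math.dist(point, entities[e]))

def segments_cross(p1, p2, p3, p4):
    """Do segments p1-p2 and p3-p4 properly intersect? (orientation test)"""
    def orient(a, b, c):
        return (b[0]-a[0])*(c[1]-a[1]) - (b[1]-a[1])*(c[0]-a[0])
    d1, d2 = orient(p3, p4, p1), orient(p3, p4, p2)
    d3, d4 = orient(p1, p2, p3), orient(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0

box = (0, 0, 16.5, 40.3)  # penalty-area-sized rectangle
print(in_zone((10, 20), box))                               # True
print(nearest((0, 0), {"a": (1, 1), "b": (5, 5)}))          # "a"
print(segments_cross((0, 0), (10, 10), (0, 10), (10, 0)))   # True
```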

Suresh Gohane


OZ Cortex / AI Stack Lead

AI Runtime & Cortex

“Traditional databases were designed for business data. Spatial engines are designed for the physical world. Different data, different engine, different first principles.”

When Data Never Stops Moving#

Q: You mentioned real-time spatial monitoring. How is that different from regular streaming data?

Streaming data is a well-understood problem in software. The tooling is mature. You have events flowing through a pipeline: a user clicked a button, a transaction was processed, a sensor reported a value. You apply rules, you emit results.

Streaming spatial data is a fundamentally different problem. The entities move. Their relationships change continuously. Zones activate and deactivate based on who is where. The connection between two entities isn't a shared label; it's their physical proximity, which changes 60 times per second.

Consider a simple question: "Which players are in a defensive formation?" A defensive formation isn't defined by individual positions; it's defined by the relationships between positions. Are the four defenders maintaining a line? Is the spacing consistent? Is the line advancing or retreating? These are questions about physical geometry between people, evaluated continuously.
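A toy version of that relational check makes the point: a back four "holds a line" only as a property of the gaps and depths between players, never of any one position. The thresholds below are invented for illustration.

```python
def holds_line(defenders, depth_tol=2.0, spacing_tol=3.0):
    """Do these defenders form a roughly flat, evenly spaced line?
    defenders: list of (x, y) pitch positions."""
    ys = [y for _, y in defenders]
    xs = sorted(x for x, _ in defenders)
    flat = max(ys) - min(ys) <= depth_tol            # all on ~one line
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    even = max(gaps) - min(gaps) <= spacing_tol      # consistent spacing
    return flat and even

back_four = [(10, 30.0), (25, 30.8), (40, 29.5), (55, 30.2)]
print(holds_line(back_four))  # True: depth spread 1.3 m, gaps all 15 m
```

Evaluated continuously at frame rate, a predicate like this becomes exactly the kind of standing spatial question described above.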

Q: How does the Venue Graph handle this?

Real-time proximity monitoring. Continuous queries that fire when spatial conditions become true, not when you ask for them.

When a customer registers a question ("tell me when the defensive line advances past the halfway mark") the Venue Graph doesn't repeatedly check. It evaluates that condition against every spatial update, running on the GPU at frame rate. The instant the condition becomes true, the customer receives a notification with the full context: which players, what positions, what speeds, what timestamp.

This inversion, from "keep asking" to "get notified," is what makes real-time spatial intelligence possible. You don't ask the Venue Graph what's happening. You tell it what you care about, and it tells you when it happens.

Identity Is Harder Than It Looks#

Q: Let's talk about identity resolution. How hard is it to track 22 players across six camera feeds?

Entity tracking sounds simple. Assign an ID. Follow it. If the entity moves, update its position. Basic state management.

But when six camera feeds overlap and the same player appears in three feeds simultaneously (from different angles, under different lighting, with different levels of obstruction) figuring out "is this the same person?" becomes a spatial reasoning problem, not just a visual one.

Camera one sees player 7 from behind, partially blocked by player 12. Camera three sees the same player from the side, fully visible. Camera five sees a player at the correct position but with a jersey number blurred by motion. Are these the same person? Visual matching says "maybe." Spatial consistency says "yes," because the positions from all three camera views point to the same physical location on the pitch.

Solving Identity Through Space#

Q: So identity resolution happens in the Venue Graph, not in the detection model?

Identity resolution is fundamentally a spatial reasoning problem. The Venue Graph figures out who is who through spatial consistency: if detections from multiple cameras all point to the same physical location with consistent speed and direction, they are the same person, regardless of how different they look on each camera.

This is a critical design decision. Visual matching is fragile; lighting changes, players blocking each other, motion blur, and similar-colored kits all break visual matching. Spatial consistency is robust; a person's physical position doesn't change based on which camera is looking at them. By resolving identity in physical space rather than camera images, we eliminate an entire class of tracking failures.

The Venue Graph maintains an identity map where each person is linked to every camera detection that contributes to their position estimate. This multi-source fusion (weighted by detection confidence, spatial consistency, and how smoothly the movement continues) produces position estimates that are more accurate than any single camera's view. The whole is genuinely greater than the sum of the parts.
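The multi-source fusion described above can be sketched as position-based grouping followed by a confidence-weighted average. Everything here is an assumption for illustration, including the grouping threshold and the function name; it is not OZ's implementation.

```python
def fuse_detections(detections, max_sep=1.0):
    """Group camera detections whose back-projected pitch positions agree
    (within max_sep metres), then fuse each group into one
    confidence-weighted position estimate.
    detections: list of (camera_id, (x, y), confidence)."""
    groups = []
    for cam, (x, y), conf in detections:
        for g in groups:
            gx, gy = g["pos"]
            if (x - gx) ** 2 + (y - gy) ** 2 <= max_sep ** 2:
                g["members"].append((cam, (x, y), conf))
                w = sum(c for _, _, c in g["members"])
                g["pos"] = (sum(c * px for _, (px, _), c in g["members"]) / w,
                            sum(c * py for _, (_, py), c in g["members"]) / w)
                break
        else:
            groups.append({"pos": (x, y), "members": [(cam, (x, y), conf)]})
    return groups

# Three cameras see roughly the same spot; a fourth sees someone 30+ m away.
dets = [("cam1", (52.1, 30.0), 0.9),
        ("cam3", (52.4, 30.2), 0.8),
        ("cam5", (52.2, 29.9), 0.6),
        ("cam2", (20.0, 10.0), 0.9)]
people = fuse_detections(dets)
print(len(people))  # 2 distinct people, regardless of how each looks on camera
```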

Resolving identity through spatial consistency rather than visual similarity is a direct consequence of building the Venue Graph as a spatial-first system. In a traditional tracking system, identity is a visual matching problem; in OZ's architecture, it is a spatial reasoning problem solved in the physical world. Because spatial consistency doesn't depend on appearance, the whole class of appearance-based failure modes never arises in the first place.

The Query That Matters#

Q: If someone were evaluating Arcflow for the early adopter programme, what query should they try first?

The one that cannot run on any other database.

"Given the current spatial configuration of all tracked entities, which entities will enter a specified zone within the next 3 seconds, ranked by arrival time, with the projected spatial state at the moment of entry."

That query requires path projection, zone intersection, time ordering, and spatial estimation, all evaluated continuously, in under a millisecond, against live position updates 60 times per second.
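The shape of that query can be sketched with a deliberately crude constant-velocity projection (the real engine presumably uses richer trajectory models built from the movement trails): project each entity forward, keep those that enter the zone within the horizon, and rank by projected arrival time with the projected state at entry.

```python
def zone_entries(entities, zone, horizon=3.0, dt=1/60):
    """Which entities enter the zone within `horizon` seconds, ranked by
    arrival time, with their projected position at the moment of entry?
    entities: list of (entity_id, (x, y), (vx, vy))."""
    xmin, ymin, xmax, ymax = zone
    hits = []
    for eid, (x, y), (vx, vy) in entities:
        t = 0.0
        while t <= horizon:
            px, py = x + vx * t, y + vy * t   # constant-velocity projection
            if xmin <= px <= xmax and ymin <= py <= ymax:
                hits.append((t, eid, (round(px, 2), round(py, 2))))
                break
            t += dt
    return sorted(hits)                       # ranked by arrival time

zone = (80.0, 20.0, 100.0, 50.0)
players = [("p7",  (70.0, 30.0), (8.0, 0.0)),   # reaches the zone in ~1.25 s
           ("p10", (75.0, 35.0), (6.0, 0.0)),   # reaches it in ~0.83 s
           ("p4",  (50.0, 30.0), (2.0, 0.0))]   # ~15 s away: outside the horizon
for t, eid, pos in zone_entries(players, zone):
    print(f"{eid} enters at t={t:.2f}s at {pos}")
```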

Try that on any existing database with spatial features bolted on. You will either get the wrong answer, get the right answer too slowly, or discover that the question simply cannot be asked.

That is the query that justifies building a spatial engine from scratch. And it is the query that makes the Venue Graph possible as a real-time representation of physical reality, not a log of what happened, but a live, queryable, predictive model of what is happening and what will happen next.

Under one millisecond, 99.9% of the time. Every frame. Every match. That is the reliability target. And that is why Arcflow exists.


This interview is part of the OZ Interview Series, profiling the team building the world model for the physical world.
