

What If a Database Could Predict Where Every Player Will Be?

Article · March 1, 2026


Somewhere in the system, a database stores the fact that player 7 is at position (34.2, 22.1). It is wrong. Not broken, but wrong in the way every database is wrong about the physical world: the true position is 30 centimeters away, the observation carries a confidence score derived from three cameras with different sight lines, and the database dropped all of that uncertainty the moment it wrote the row. Downstream, a broadcast graphic places a name tag above the player's head with false certainty. An offside check trusts a measurement that the sensors themselves do not trust. Suresh Gohane, OZ's Cortex / AI Stack Lead, has spent two years asking what a database would need to look like if it actually respected the physics of the world it claims to model: partial observations, causal chains, predictions conditioned on hypothetical actions. The answer became Arcflow's deeper roadmap. "The venue was the first proof," he says. "It was never the final scope."

Beyond the Venue Graph

Q: Arcflow powers the Venue Graph today. But reading the architecture, it feels like it was designed for something much larger. What is Arcflow really building toward?

The Venue Graph is a world model for a single physical venue. It answers: what is happening here, right now, and how is it changing? That is already more than any existing database was designed to do. But the question I have been thinking about for the last two years is: what would a database need to look like if it were the backbone for understanding any physical environment?

Not just a football pitch. A logistics hub. A construction site. A port. A city block. Any space where entities exist, move, interact, and produce consequences that matter.

The answer is not "a bigger venue graph." The answer requires capabilities that go beyond tracking and querying positions. It requires the database to understand three things that existing systems do not handle at all.

Q: What are those three things?

Partial observability. Causal structure. And action-conditioned prediction.

Partial Observability

Q: Start with partial observability. What does that mean for a database?

Every existing database assumes it has complete information. When you write a row, the database trusts that the row is true. When you query, you get back what was written. The contract is: what is stored is what is known.

The physical world does not work that way. A camera sees part of the pitch. Occlusion hides part of what the camera could see. Sensor noise adds uncertainty to what the camera does see. Inference models produce estimates with confidence scores, not ground truth.

When the Venue Graph says "player 7 is at position (34.2, 22.1)," that is not a fact in the way a database row is a fact. It is an observation with a confidence score, derived from multiple sensor inputs, each with their own uncertainty, fused through a probabilistic pipeline. The true position might be 30 centimeters away. The system knows this. The database should know this too.

Today, most systems discard that uncertainty at the database boundary. The perception pipeline outputs a position estimate with a confidence score. The database stores the position and drops the confidence. Downstream queries operate on false certainty.

What Arcflow is building toward is a data model where observations carry their provenance: which sensors contributed, what confidence each sensor reported, how the fusion weighted them, and what the residual uncertainty is. When you query "where is player 7?", the answer is not a point. It is a probability distribution. And queries that depend on that position ("is player 7 in the penalty area?") can propagate that uncertainty through the answer.
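To make that concrete, here is a minimal sketch of an observation that keeps its provenance, with the position answer coming back as a fused estimate plus residual uncertainty rather than a bare point. All the types and the fusion rule (inverse-variance weighting) are illustrative assumptions; the interview does not specify Arcflow's actual schema or fusion pipeline.

```python
import math
from dataclasses import dataclass

@dataclass
class SensorReading:
    sensor_id: str
    position: tuple      # (x, y) in metres
    std_dev: float       # reported 1-sigma uncertainty in metres

@dataclass
class Observation:
    entity_id: str
    readings: list       # provenance: every contributing sensor reading

    def fused(self):
        """Inverse-variance weighted fusion: best estimate plus residual uncertainty."""
        weights = [1.0 / r.std_dev ** 2 for r in self.readings]
        total = sum(weights)
        x = sum(w * r.position[0] for w, r in zip(weights, self.readings)) / total
        y = sum(w * r.position[1] for w, r in zip(weights, self.readings)) / total
        return (x, y), math.sqrt(1.0 / total)  # fused position, fused std dev

# Three cameras with different sight lines report player 7's position.
obs = Observation("player_7", [
    SensorReading("cam_north", (34.1, 22.0), 0.20),
    SensorReading("cam_east",  (34.3, 22.2), 0.30),
    SensorReading("cam_main",  (34.2, 22.1), 0.15),
])
pos, sigma = obs.fused()
# The answer to "where is player 7?" is a distribution (mean plus uncertainty),
# not a row that silently dropped the confidence at the database boundary.
```

Because each reading survives in the record, a downstream query can inspect which sensors contributed and how much residual uncertainty remains, instead of receiving a point and assuming it is ground truth.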

Q: Why does this matter in practice?

Because decisions have different tolerance for uncertainty. A camera assignment (which camera should frame this player) tolerates 50 centimeters of position error. A broadcast graphic overlay (placing a name tag above a player's head) tolerates 10 centimeters. An offside decision tolerates none.

If the database knows the observation uncertainty, it can tell downstream consumers not just the answer, but how confident the answer is. The camera assignment system proceeds. The graphic overlay proceeds with a wider margin. The offside system flags that the observation uncertainty exceeds its tolerance threshold. Different consumers, same query, different confidence requirements, all handled at the data layer instead of duplicated in every application.
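A small sketch of how one query can serve consumers with different confidence requirements. The tolerance values and consumer names are illustrative, loosely following the numbers quoted above; they are not Arcflow's actual configuration.

```python
# Each consumer declares the position uncertainty (in metres) it can tolerate.
TOLERANCES = {
    "camera_assignment": 0.50,   # framing a player: coarse is fine
    "graphic_overlay":   0.10,   # placing a name tag: tighter
    "offside_check":     0.02,   # effectively none at this sensor quality
}

def evaluate(consumer: str, observation_sigma: float) -> str:
    """Gate a consumer's use of an observation on its declared tolerance."""
    if observation_sigma <= TOLERANCES[consumer]:
        return "proceed"
    return "flag: uncertainty exceeds tolerance"

sigma = 0.08  # residual uncertainty of the fused observation, in metres
print(evaluate("camera_assignment", sigma))  # proceed
print(evaluate("graphic_overlay", sigma))    # proceed
print(evaluate("offside_check", sigma))      # flag: uncertainty exceeds tolerance
```

The same observation, the same query, three different outcomes, all decided at the data layer rather than reimplemented in each application.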

Partial observability is not an edge case in physical-world data; it is the default condition. Every sensor has noise. Every model has uncertainty. A database that treats observations as ground truth is lying to every downstream consumer. Arcflow's roadmap treats observation confidence as a first-class dimension alongside space and time.

Causal Structure

Q: What does causal structure mean in a graph database?

Traditional graph databases store relationships. "Player A passed to Player B." That is a fact. It happened. You can query it.

But the physical world is not just a record of what happened. It is a system of cause and effect. Player A passed to Player B because the passing lane was open, because Player C's defensive positioning created the gap, because the formation shifted 8 seconds earlier. The pass is not an isolated event. It is the consequence of a spatial configuration that evolved over time.

Causal structure means the database can represent not just "what happened" but "what caused what." The formation shift caused the gap. The gap enabled the pass. The pass changed the game state. These are not annotations on events; they are queryable relationships in the graph.

Q: Why can't you model that with a regular graph database?

You can model it. You can create "caused_by" edges between event nodes. What you cannot do is query it efficiently, because traditional graph databases do not understand temporal ordering as a constraint on traversal.

When you traverse a causal chain, time flows in one direction. A cause must precede its effect. A traditional graph traversal follows edges regardless of temporal direction. It will happily traverse backward in time, returning "causes" that happened after the "effect." You have to filter temporally invalid paths in application code, after the traversal, which means the database does far more work than necessary.

Arcflow's temporal model understands that time is directional. Causal traversals are constrained to respect temporal ordering by construction. The query planner knows that when traversing a "caused_by" chain, it only needs to consider nodes with earlier timestamps. This is not a filter applied after traversal; it is a constraint that prunes the search space before traversal begins.
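The difference can be sketched with a toy in-memory graph (hypothetical data; the point is the temporal constraint, not the storage engine): a causal traversal that only follows caused_by edges to strictly earlier events prunes temporally invalid paths before they are ever explored.

```python
from collections import deque

# event_id -> (timestamp in seconds, [event_ids this event was caused by])
events = {
    "pass_A_to_B":     (100.0, ["gap_opens"]),
    "gap_opens":       ( 98.5, ["formation_shift"]),
    "formation_shift": ( 92.0, []),
    "late_replay":     (105.0, []),  # occurs after the pass; never a valid cause
}

def causal_chain(effect: str) -> list:
    """Walk caused_by edges backward from an effect, pruning any edge that
    violates temporal ordering: a cause must strictly precede its effect."""
    chain, queue = [], deque([effect])
    while queue:
        ev = queue.popleft()
        t, causes = events[ev]
        for c in causes:
            if events[c][0] < t:  # the timestamp check prunes the search space
                chain.append(c)
                queue.append(c)
    return chain

print(causal_chain("pass_A_to_B"))  # ['gap_opens', 'formation_shift']
```

In a traditional graph database this timestamp check would run as a filter after the traversal; building it into the traversal itself is what lets the planner skip invalid branches entirely.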

Suresh Gohane

OZ Cortex / AI Stack Lead

“The physical world runs on cause and effect. A database for the physical world should understand both.”

Action-Conditioned Prediction

Q: You mentioned action-conditioned prediction. That sounds like it belongs in an AI model, not a database.

The distinction between "database" and "model" is dissolving. And it should.

Consider the query: "If the defensive line holds its current position for 3 more seconds, which attacking players will be in a scoring position?" That is a prediction conditioned on an action, the defensive line holding. It requires the database to project entity trajectories forward in time, under a specified constraint, and evaluate spatial predicates against the projected state.

Today, that query requires exporting current state from the database, running a simulation in external code, and evaluating the spatial predicates outside the database. The round-trip is slow, the integration is fragile, and the simulation has no access to the database's full temporal history.

What we are building toward is a query interface where projections and hypotheticals are native operations. "Given the current state, project forward N seconds under constraint X, and evaluate predicate Y against the projected state." The database has all the information it needs: current positions, velocity vectors, historical patterns, spatial constraints. The projection should happen where the data lives.
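The shape of such a query can be sketched with a toy constant-velocity projection. Every name, number, and the predicate definition here is an illustrative assumption; the interview does not specify Arcflow's query syntax or motion model.

```python
# Current state: position (x, y) in metres and velocity (vx, vy) in m/s.
state = {
    "attacker_9":  {"pos": (60.0, 30.0), "vel": (4.0, 0.0)},
    "attacker_11": {"pos": (55.0, 10.0), "vel": (1.0, 0.5)},
}

def project(entity: str, seconds: float) -> tuple:
    """Constant-velocity extrapolation of one entity's position."""
    (x, y), (vx, vy) = state[entity]["pos"], state[entity]["vel"]
    return (x + vx * seconds, y + vy * seconds)

def in_scoring_position(pos: tuple) -> bool:
    """Toy spatial predicate: inside the attacking third and central channel."""
    x, y = pos
    return x > 70.0 and 15.0 < y < 45.0

# "If the defensive line holds for 3 more seconds, who is in a scoring position?"
# The constraint "line holds" is modelled here simply as: defenders do not move.
horizon = 3.0
print([e for e in state if in_scoring_position(project(e, horizon))])  # ['attacker_9']
```

The interesting part is not the arithmetic but where it runs: the positions, velocities, and spatial predicates already live in the database, so the projection can happen there instead of in an exported simulation.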

Q: How far away is that?

The foundation is in place. Arcflow already stores trajectory history and velocity vectors. It already evaluates spatial predicates in real time. Path projection (extrapolating current trajectories forward) is an active workload in production today for camera cueing.

The step from "project trajectories and evaluate spatial predicates" to "project trajectories under specified constraints and evaluate spatial predicates against hypothetical state" is an extension of existing capability, not a new architecture. The hard part (sub-millisecond spatial evaluation against continuously updating state) is already solved.

The research question is how to express these queries in a way that is both powerful enough for complex physical reasoning and simple enough that developers can use it without a PhD. That is the query language design problem we are working on now.

The Agent-Native Surface

Q: Arcflow's command interface is designed for AI coding agents, not just human developers. Why?

Because the next generation of developers building on spatial data will not write queries by hand. They will describe intent to an agent, and the agent will compose and execute the query.

This changes what a database interface needs to look like. A human developer reads documentation, writes a query, runs it, reads the output, adjusts. An AI agent needs structured command inputs, deterministic outputs, machine-readable error codes, and evidence artifacts that prove the query did what it was supposed to do.

We designed the command surface so that every operation (run, check, benchmark, evidence) produces structured output that an agent can parse, evaluate, and act on without human interpretation. The agent does not need to understand Arcflow's internals. It needs to understand the contract: this input produces this output shape, this error code means this specific failure, this evidence artifact proves this specific claim.

Q: Is this just a JSON API?

It is deeper than that. The CLI is the operational contract. Every capability that Arcflow claims must be runnable, measurable, and provable through the command line. If you cannot benchmark it from the CLI, it is not a production capability. If you cannot produce evidence of correctness from the CLI, the claim is not verified.

This discipline came from our own experience building with AI agents internally. When an agent runs a benchmark and gets back a wall of unstructured text, it cannot reliably extract whether the benchmark passed or failed. When it gets back a structured artifact with explicit pass/fail fields, latency percentiles, and comparison baselines, it can make confident decisions about what to do next.
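The contrast is easy to show. Below is a hypothetical structured evidence artifact of the kind described (explicit pass/fail fields, latency percentiles, a comparison baseline) and the kind of decision an agent can make from it without parsing log text. The field names and thresholds are invented for illustration, not Arcflow's actual output schema.

```python
import json

# A structured benchmark artifact: pass/fail, percentiles, and a baseline.
artifact = json.loads("""
{
  "command": "benchmark spatial-predicates",
  "passed": true,
  "latency_us": {"p50": 180, "p99": 640},
  "baseline_us": {"p50": 200, "p99": 700}
}
""")

def regressed(run: dict, baseline: dict, tolerance: float = 1.10) -> bool:
    """Flag a run whose p99 latency exceeds the baseline by more than 10%."""
    return run["p99"] > baseline["p99"] * tolerance

# The agent's decision requires no human interpretation of a wall of text.
ok = artifact["passed"] and not regressed(artifact["latency_us"], artifact["baseline_us"])
print("promote" if ok else "investigate")  # promote
```

With an unstructured text dump, the same decision would rest on brittle string matching; with explicit fields, the agent can gate its next action on a machine-checkable contract.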

The database becomes a tool the agent can reason about, not just a service it calls.

Designing for AI agents as first-class consumers is not about adding a JSON endpoint. It is about making every operational claim machine-verifiable. The agent-native command plane ensures that every benchmark, every query result, and every system health check produces structured evidence that autonomous systems can evaluate without human interpretation.

The Open-Source Thesis

Q: Why open-source a database you spent years building?

Because the physical world needs a shared data layer, and a proprietary one will not become that.

Think about what happened with relational databases. SQL became the standard not because one company controlled it, but because every company could build on it. The applications were proprietary. The data layer was shared. That is why SQL survived every technology cycle for fifty years.

Spatial-temporal data for the physical world is at the same inflection point. The applications (autonomous vehicles, robotics, venue intelligence, logistics, defense) are all proprietary. But they all need the same spatial primitives: entities in space, relationships between them, events unfolding in time. If every company builds its own spatial data layer, the ecosystem fragments. If one company controls the shared layer, adoption stalls.

Arcflow open-source is the bet that the spatial data layer should be shared infrastructure. OZ's advantage isn't the engine; it's everything above the engine. The perception models, the hardware, the operational playbooks, the deployment relationships. Those compound with every venue deployed. The database becoming a standard makes those advantages more valuable, not less.

Q: When does the open-source release happen?

When the early adopter cohort has validated the API surface against real workloads. We are not going to release a research prototype and hope the community figures out the sharp edges. The teams in the early adopter programme are running production-adjacent workloads against Arcflow right now. Their feedback is shaping the query interface, the error contracts, and the GPU acceleration paths.

When the API surface is stable enough that early adopters are not hitting breaking changes, and the performance characteristics are documented with real-workload evidence rather than synthetic benchmarks, that is when the public release ships. We would rather be late and trusted than early and fragile.


This interview is part of the OZ Interview Series, profiling the team building the world model for the physical world.

All Interviews · All with Suresh · Learn more about OZ