What If the Physical World Remembered Everything?
The Thesis
The world model thesis starts with an observation so obvious that most people overlook it: the digital world has been queryable for thirty years. You can ask a database what happened, what is happening, and (increasingly) what will happen. Every digital transaction, every user interaction, every system event is captured, structured, indexed, and available for analysis. The digital world remembers everything.
The physical world remembers nothing.
A football match happens. Twenty-two athletes produce ninety minutes of the most spatially complex motion patterns in any commercial context. And when the final whistle blows, the venue forgets. The next match starts from zero. No accumulated spatial knowledge. No queryable history. No structured representation that downstream systems can consume. Just video files: unstructured pixels that require human eyes to interpret.
That gap (between the queryability of the digital world and the amnesia of the physical world) is what OZ exists to close. We are building the infrastructure that gives physical environments memory, structure, and queryability. We call it the world model. Not because it is a fashionable term, but because it is the precise description of what we produce: a structured, real-time, versioned model of the physical world that any software system can query through a spatial API.
The Input Layer
Building a world model begins with seeing. You can't model what you can't perceive. And perception at the physical edge (in an outdoor venue, under variable lighting, across six high-resolution camera streams processing enormous volumes of visual data every second) is a fundamentally different challenge than perception in the cloud.
This is Bhagyashree's domain. Her team builds the AI models that transform raw camera footage into structured data: player identities, positions, speeds, body positions, team affiliations. These models run entirely on-site (on the GPU hardware inside our venue nodes) with guaranteed performance. The system sees a player and identifies them faster than a human blink. Every match. Every weather condition. Every lighting scenario.
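OZ has not published its perception schema, but the kind of structured record the paragraph above describes can be sketched as a simple typed data point: one record per tracked player, per frame, carrying identity, position, speed, and affiliation. Every field name and unit below is a hypothetical illustration, not the actual format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PlayerDetection:
    """One structured data point emitted by the perception layer, per player, per frame.
    All field names and units are illustrative assumptions."""
    track_id: int      # stable identity across frames
    x_m: float         # pitch position, metres from the left goal line
    y_m: float         # pitch position, metres from the near touchline
    speed_mps: float   # instantaneous speed, metres per second
    team: str          # "home" | "away" | "official"
    confidence: float  # detector confidence in [0, 1]
    timestamp_ms: int  # capture time, milliseconds since kickoff

# A single frame of perception output is simply a list of these records,
# one per tracked person, ready to flow downstream to the inference engine.
frame = [
    PlayerDetection(7, 52.3, 31.8, 6.4, "home", 0.97, 1_345_020),
    PlayerDetection(19, 48.1, 29.5, 7.1, "away", 0.94, 1_345_020),
]
print(len(frame))  # → 2
```

The point of the sketch is the contract, not the fields: once every physical event is a typed record rather than a pixel, everything downstream can consume it like any other data source.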
What makes Bhagyashree's contribution distinctive is her insistence that production AI is a different discipline than research AI. A model that tops the academic leaderboards might fall apart on a rain-soaked pitch under floodlights with water droplets on the lens. Her models are designed for the venue, not the leaderboard. They are leaner, more robust, and they ship, every deployment cycle, validated against real-world conditions at each specific venue.
The perception layer is the world model's sensory system. Without it, the model is blind. With it, every physical event in the venue becomes a structured data point that flows downstream.
The Inference Engine
The AI models produce raw detections. But detections alone are noise: thousands of data points per second across six camera streams. The world model needs an intelligence engine that transforms detections into understanding: stitching together views from multiple cameras, tracking players over time, classifying events, spotting anomalies. That engine must run in real time, on-site, with guaranteed response times, without any cloud dependency.
This is Suresh's domain. OZ Cortex is the AI runtime he built, a purpose-built engine that coordinates multiple AI models running simultaneously, allocates processing power between them, manages updates, and guarantees that every operation completes on time, every time. Cortex doesn't "try" to meet performance targets. It is architected so that meeting them is a structural property of the system, not a hope.
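Cortex's internals are not public, but the contract described above, that meeting deadlines is a structural property rather than a hope, can be illustrated with a toy frame budget: mandatory stages always run, and optional stages run only if enough of the frame's time budget remains, so the pipeline degrades predictably instead of overrunning. Every name, stage, and cost figure below is invented for illustration.

```python
import time

# Toy illustration of a fixed frame budget (stage names and costs are
# invented, not Cortex's actual design): mandatory stages always run;
# optional stages run only if enough of the frame's budget remains.

FRAME_BUDGET_S = 1 / 60  # one frame at 60 Hz

def run_frame(stages, budget_s=FRAME_BUDGET_S):
    """Run (name, fn, mandatory, est_cost_s) stages within a fixed budget.

    Returns the names of the stages that actually ran this frame.
    """
    start = time.perf_counter()
    ran = []
    for name, fn, mandatory, est_cost_s in stages:
        remaining = budget_s - (time.perf_counter() - start)
        if not mandatory and est_cost_s > remaining:
            continue  # degrade gracefully: skip optional work, never overrun
        fn()
        ran.append(name)
    return ran

stages = [
    ("detect",  lambda: None, True,  0.004),  # player detection: always runs
    ("track",   lambda: None, True,  0.002),  # identity tracking: always runs
    ("events",  lambda: None, False, 0.003),  # event classification: best effort
    ("anomaly", lambda: None, False, 0.005),  # anomaly spotting: best effort
]
print(run_frame(stages))  # → ['detect', 'track', 'events', 'anomaly']
```

With a full budget every stage runs; with no budget (`run_frame(stages, budget_s=0.0)`) only the mandatory stages do. The design choice this sketches is admission control at the scheduler, rather than hoping each model finishes in time.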
The difference between an AI demo and AI infrastructure is what happens at 2 AM during the eighty-seventh match of the season, when the temperature has dropped twelve degrees and a model update shipped three hours ago. Infrastructure works. Demos are already asleep.
What Suresh understood before anyone else on the team is that running AI on-site isn't the same as running AI in a data center with less horsepower. It is a different discipline entirely. In the cloud, you solve performance problems by throwing more servers at them. On-site, you have a fixed amount of processing power, a fixed power budget, and a fixed heat limit, and you must engineer the entire system to deliver within those constraints. Cortex embodies that understanding.
The Spatial Representation
Perception sees the venue. Cortex reasons about what it sees. But the world model isn't complete until that understanding is captured in a structured representation that persists, versions, and answers queries. The Venue Graph is that representation, and Arcflow, the spatial-first graph database underneath it, is the substrate that makes it possible.
This, too, is Suresh's domain. His PhD in AI and Robotics gave him the conviction that traditional databases (even location-aware databases) treat physical space as an afterthought. The physical world demands the opposite: space as the primary organizing principle, with built-in spatial queries. Ask "who is near whom?" or "which players' paths are about to cross?" and get answers instantly.
"A world model isn't a feature. It's the entire point. Everything else (the cameras, the compute, the graph) exists to make the model more complete."
— Gudjon Gudjonsson, Founder and CEO
Arcflow is purpose-built for speed; queries return in under a millisecond. It is currently in an early adopter programme before its full open-source release, because Suresh and I agree that the data layer for physical-world intelligence should be a standard, not a proprietary lock-in. OZ's moat is the full vertical stack above the database: the hardware, the AI models, the operations, the venue relationships. Nobody replicates that by cloning a repository.
The Venue Graph built on Arcflow is what makes the physical world queryable. "Which players are within passing distance of the ball carrier and moving into open space?" "Has this formation been seen before, and what happened next?" "Given how the players are positioned right now, what is likely to happen in the next three seconds?" These are the questions the world model can answer, in real time, on-site, sixty times per second.
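Arcflow's actual query language is not shown in this piece, but the first question above, "which players are within passing distance of the ball carrier?", reduces to a spatial predicate over positions. The brute-force sketch below (all names, positions, and the 25 m radius are invented) stands in for an indexed spatial lookup; the predicate is the same.

```python
from math import hypot

def within_passing_distance(players, carrier_id, radius_m=25.0):
    """Return ids of teammates within `radius_m` of the ball carrier.

    `players` maps player id -> (x_m, y_m, team). Brute force stands in
    for an indexed spatial lookup; the predicate is identical.
    """
    cx, cy, cteam = players[carrier_id]
    return sorted(
        pid
        for pid, (x, y, team) in players.items()
        if pid != carrier_id
        and team == cteam
        and hypot(x - cx, y - cy) <= radius_m
    )

players = {
    10: (60.0, 34.0, "home"),  # ball carrier
    7:  (72.0, 30.0, "home"),  # ~12.6 m away: in range
    11: (95.0, 10.0, "home"),  # ~42.4 m away: out of range
    4:  (62.0, 33.0, "away"),  # close, but an opponent
}
print(within_passing_distance(players, carrier_id=10))  # → [7]
```

A spatial-first database answers this class of query from an index rather than a scan, which is what makes sixty-times-per-second answering plausible as the player count and history grow.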
The Physical Platform
Here is the part of the thesis that most AI companies skip, and it is the part that makes the entire system possible: everything I have just described (the AI vision models, the intelligence engine, the spatial graph) runs on physical hardware that lives outdoors, permanently installed on stadium gantries, across every season, in every weather condition. The hardware isn't a commodity input. It is the foundation.
This is Rushikesh's domain. Eighteen years in hardware engineering. His team designs sealed enclosures where the outer shell itself dissipates heat, with no fans, no air intake, no openings where water or dust can enter. Custom power systems that handle the electrical surges from stadium floodlights. Mounting systems engineered for the specific structure of each venue.
The OZ VI Venue node packs the kind of processing power that usually fills a data center rack into a single weatherproof box rated for permanent outdoor deployment. That processing power defines the performance ceiling that everything upstream (AI models, intelligence engine, spatial queries) operates within. When Bhagyashree designs a model, she knows exactly the thermal headroom available. When Suresh tunes the intelligence engine, he knows the power budget. Vertical integration isn't a strategy slide. It is the Head of AI calling the Head of Hardware and solving constraints together, in the same room, without a vendor intermediary.
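The fixed-envelope constraint described above can be made concrete with a toy budget check. Every figure below is invented (OZ has not published the node's power envelope); the point is the discipline: on-site there is no "add another server", so every workload must fit under a fixed ceiling, with headroom held in reserve, before it ships.

```python
# Toy power-budget check for a fixed edge envelope (all figures invented):
# on-site you cannot scale out, so the sum of all workloads must fit
# under the node's fixed ceiling with headroom to spare.

NODE_POWER_CEILING_W = 800.0  # hypothetical sealed-enclosure limit
SAFETY_HEADROOM = 0.15        # keep 15% in reserve for surges

workloads_w = {
    "detection_model": 310.0,
    "tracking": 95.0,
    "venue_graph": 120.0,
    "cortex_runtime": 60.0,
}

def fits_budget(workloads, ceiling_w, headroom):
    """True if the combined draw fits under the ceiling minus headroom."""
    usable_w = ceiling_w * (1 - headroom)
    return sum(workloads.values()) <= usable_w

print(fits_budget(workloads_w, NODE_POWER_CEILING_W, SAFETY_HEADROOM))  # → True
```

This is the calculation that vertical integration makes a single conversation: the model's draw, the runtime's draw, and the enclosure's ceiling are all known to the same team.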
Hardware creates the switching cost that makes the business model defensible. Software can be replaced in a quarter. A physically installed, calibrated, seasonally proven edge node with months of accumulated Venue Graph data can't. That's the moat that capital alone can't replicate.
The Operational Layer
The final layer is the one that makes the world model repeatable. Technology that works at one venue is a prototype. Technology that works at fifty venues with consistent quality and declining marginal cost is infrastructure. The difference is operations.
This is Audur's domain. Venue Ops as Code, the principle that every aspect of a deployment is codified in automated playbooks. Installation sequences, acceptance criteria, monitoring rules, escalation procedures. The playbook for venue fifty incorporates every learning from venues one through forty-nine, not because someone remembered to update a wiki, but because the system captured those learnings automatically.
Audur understood something before anyone else: the playbook, not the person, is the unit of scale. A small team operating what traditionally requires a much larger organization isn't a staffing constraint. It's an architectural outcome. Every playbook execution that runs without human intervention is operational leverage. Every performance issue that is automatically diagnosed before it reaches a human is operational leverage. The leverage compounds with every deployment.
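"Venue Ops as Code" can be sketched as exactly that: a playbook that is data, where every step carries its own acceptance criterion, so an execution either passes end-to-end or fails at a named step that a system (not a wiki) records. Every step name and threshold below is a hypothetical illustration, not OZ's actual playbook.

```python
# Hypothetical sketch of "Venue Ops as Code" (step names and thresholds
# invented): a deployment playbook is data, and each step carries its own
# acceptance check over measured state.

PLAYBOOK_VENUE_50 = [
    # (step name, acceptance criterion as a predicate over measured state)
    ("mount_node",     lambda s: s["gantry_torque_ok"]),
    ("power_on",       lambda s: s["supply_voltage_v"] > 220),
    ("calibrate_cams", lambda s: s["reprojection_error_px"] < 1.0),
    ("verify_latency", lambda s: s["query_latency_ms"] < 1.0),
]

def execute(playbook, state):
    """Run each step's acceptance check; return (passed, first_failed_step)."""
    for name, accepted in playbook:
        if not accepted(state):
            return False, name
    return True, None

state = {
    "gantry_torque_ok": True,
    "supply_voltage_v": 231.0,
    "reprojection_error_px": 0.6,
    "query_latency_ms": 0.8,
}
print(execute(PLAYBOOK_VENUE_50, state))  # → (True, None)
```

Because the playbook is data, incorporating a learning from venue forty-nine means appending a step and a check, and every later execution inherits it automatically.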
The world model isn't a database. It isn't a vision system. It is the closed loop between five co-designed layers (hardware, AI vision, intelligence engine, spatial representation, and operations) where each layer makes every other layer more capable with every deployment.
The Integrated Stack
That is the world model. AI vision, intelligence engine, spatial representation, physical platform, operations. One integrated stack. One company.
The reason this must be one company is that the world model's value emerges from the interfaces between layers, not from any single layer in isolation. The AI models improve because the Venue Graph provides real-world context for training. The Venue Graph improves because Cortex delivers higher-quality intelligence. Cortex operates reliably because the hardware provides a known, stable performance ceiling. The hardware deploys repeatably because the operational playbooks eliminate variance. And the playbooks improve because every deployment adds data to the Venue Graph.
It is a closed loop. Cut any edge in that loop (outsource the hardware, use a third-party database, rely on cloud-based AI processing) and the compounding stops. The flywheel breaks. You have components instead of a system.
Every investor asks: what is the defensible advantage? The answer isn't any single capability. It is the integrated system, and the fact that replicating it requires simultaneously solving hardware engineering, on-site AI vision, spatial data infrastructure, real-time intelligence, and venue operations. No amount of capital lets you skip that simultaneous requirement. You must build all five layers, and they must work together.
We started where the physics is hardest: live sports, where twenty-two unpredictable entities move at high speed under broadcast-grade quality requirements with zero tolerance for downtime. Every adjacent vertical (broadcasting, CCTV, defense) is a relaxation of at least one constraint. The same platform. The same API. The same Venue Graph. New playbooks, new configurations, same compounding economics.
A world model isn't a feature. It's the entire point. Everything else (the cameras, the compute, the graph, the enclosures, the playbooks) exists to make the model more complete. And every venue we deploy, every match we capture, every query we answer makes it harder for anyone else to build what we have already built. That is the thesis. It is simple to state and extraordinarily difficult to execute. Which is exactly why it is worth doing.