Edge-First AI: Why Sub-Second Latency Is the Only Metric That Matters

Article · March 4, 2026


In most AI products, the model is the product. At OZ, latency is the product.

When a ball changes direction, when a person enters a restricted zone, when a camera needs to track an unpredictable trajectory, the system has milliseconds to respond. Not seconds. Not "near real-time." Milliseconds.

That constraint defines the architecture.

The non-negotiable boundary

OZ splits its computing stack along one clear boundary: what must happen in real time runs on the venue edge. Everything else runs in the cloud.

On-venue edge (time-critical):

  • Perception: multi-camera spatial tracking, entity detection, scene understanding
  • Cueing: AI-driven camera directives for robotic capture heads
  • Control: deterministic control loops governing capture priorities and zone policies
  • Spatial output: structured data delivered to downstream systems via the Venue Graph
  • Recovery: self-healing capture loops that restart failed components without external intervention

Cloud (improvement-critical):

  • Model training: new perception models trained on aggregated, anonymized telemetry
  • Fleet analytics: cross-venue performance comparison and trend detection
  • Playbook optimization: automated refinement of commissioning and operating procedures
  • Reporting: long-term dashboards, SLO compliance history, and capacity planning
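The boundary above reduces to a single predicate: is the workload in the real-time control path? A minimal sketch of that routing decision (the workload names and `time_critical` flags are illustrative, not OZ's internal taxonomy):

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    EDGE = "on-venue edge"    # time-critical: must run locally
    CLOUD = "cloud"           # improvement-critical: tolerates latency

@dataclass(frozen=True)
class Workload:
    name: str
    time_critical: bool

    @property
    def tier(self) -> Tier:
        # One predicate decides placement: anything in the real-time
        # control path stays on the venue edge.
        return Tier.EDGE if self.time_critical else Tier.CLOUD

WORKLOADS = [
    Workload("perception", True),
    Workload("cueing", True),
    Workload("control", True),
    Workload("model-training", False),
    Workload("fleet-analytics", False),
]

edge = [w.name for w in WORKLOADS if w.tier is Tier.EDGE]
cloud = [w.name for w in WORKLOADS if w.tier is Tier.CLOUD]
```

The point of encoding it this way is that placement is not tuned per deployment; it is a property of the workload itself.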

This is not a caching strategy or a performance optimization. It is a product architecture decision. The venue operates autonomously at the edge. The cloud makes the next deployment better.

Why cloud-first AI fails for physical operations

Cloud-first AI architectures assume three things:

  1. Network connectivity is reliable
  2. Round-trip latency is acceptable
  3. The processing window is flexible

In physical venue operations, all three assumptions fail:

Network is not reliable. Venues are physical environments. Construction, weather, crowd density, and infrastructure age all affect connectivity. A venue that depends on cloud inference stops working when the network degrades.

Round-trip latency is not acceptable. A cloud inference call adds 50-200ms of network latency on top of processing time. For a robotic camera that needs to track a fast-moving entity, that delay means the subject has already left the frame.

The processing window is not flexible. In live operations, there is no "retry later." The moment passes. The data is stale. The capture opportunity is lost. Real-time means real-time, not "fast enough most of the time."
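To make the latency argument concrete, a back-of-envelope calculation (the speed and latency figures are assumed for illustration, not OZ measurements):

```python
# How far does a fast-moving subject travel while the system waits?
SPEED_M_S = 30.0  # assumed subject speed (~108 km/h, e.g. a struck ball)

scenarios = {
    "cloud round trip": 150.0,  # mid-range of the 50-200 ms network add-on
    "edge budget":      120.0,  # the p99 end-to-end target discussed below
    "edge perception":   15.0,  # a single assumed on-venue processing step
}

for label, latency_ms in scenarios.items():
    drift_m = SPEED_M_S * latency_ms / 1000.0
    print(f"{label:18s}: {latency_ms:5.0f} ms -> subject moves {drift_m:.2f} m")
```

At these assumed speeds, a cloud round trip alone lets the subject drift several meters before any directive can be issued, which is the arithmetic behind "the subject has already left the frame."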

The timing chain

At OZ, we measure and publish the full timing chain from photon capture to spatial output:

  1. Photon to sensor: light hits the sensor array
  2. Sensor to perception: raw frames processed by edge GPU
  3. Perception to fusion: multi-camera signals combined into spatial state
  4. Fusion to cueing: AI generates camera directives
  5. Cueing to execution: robotic capture heads respond
  6. State to Venue Graph: structured spatial output delivered via API

The published target: p99 latency ≤120ms end-to-end. Measured per venue, published per deployment.

This timing chain is the product specification. Not a benchmark. Not an aspiration. A contractual commitment.
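Verifying a p99 target means aggregating per-stage latencies across many frames and checking the 99th percentile, not the average. A toy sketch of that measurement (stage names mirror the chain above; the simulated latency values are invented, and the real instrumentation is OZ-internal):

```python
import random
from statistics import quantiles

random.seed(42)  # reproducible toy run

# Stage names mirroring the chain above.
STAGES = ["sensor", "perception", "fusion", "cueing", "execution", "graph"]

def measure_frame() -> dict:
    """Simulate per-stage latencies in milliseconds for one frame."""
    return {s: random.uniform(2.0, 20.0) for s in STAGES}

def p99(samples):
    # statistics.quantiles with n=100 yields 99 cut points;
    # index 98 is the 99th-percentile boundary.
    return quantiles(samples, n=100)[98]

end_to_end = [sum(measure_frame().values()) for _ in range(10_000)]
BUDGET_MS = 120.0
print(f"p99 = {p99(end_to_end):.1f} ms (budget {BUDGET_MS:.0f} ms)")
```

A tail percentile is the right contractual metric here: an average hides the occasional slow frame, and in live capture the slow frame is exactly the one that loses the moment.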

Network outage resilience

The edge-first architecture provides a critical operational guarantee: a network outage does not interrupt the spatial layer.

When connectivity to the cloud drops:

  • Perception continues: all models run on local GPU
  • Cueing continues: camera directives execute from local state
  • Control continues: policies and priorities enforce from cached configuration
  • Spatial output continues: downstream systems on the venue network receive uninterrupted data
  • Telemetry buffers: operational data queues locally and syncs when connectivity returns

Zero data loss. Zero operational interruption. The venue does not know the cloud is unreachable.
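The telemetry-buffering behavior can be sketched as a local queue that drains only when the uplink succeeds. This `TelemetryBuffer` is a hypothetical illustration; the real buffering, batching, and backoff logic in the OZ stack is not public:

```python
from collections import deque

class TelemetryBuffer:
    """Local queue that absorbs cloud outages without data loss."""

    def __init__(self, uplink):
        self._uplink = uplink   # callable(record) -> bool (True = delivered)
        self._queue: deque = deque()

    def record(self, event: dict) -> None:
        # Enqueue first, then try to drain: ordering is preserved
        # and nothing is dropped while the uplink is down.
        self._queue.append(event)
        self.flush()

    def flush(self) -> int:
        sent = 0
        while self._queue:
            if not self._uplink(self._queue[0]):
                break           # cloud unreachable: keep buffering
            self._queue.popleft()
            sent += 1
        return sent

# Usage: an uplink that is down during capture, then recovers.
online = {"up": False}
delivered = []

def uplink(rec):
    if online["up"]:
        delivered.append(rec)
        return True
    return False

buf = TelemetryBuffer(uplink)
for i in range(3):
    buf.record({"frame": i})    # queued locally while offline
online["up"] = True
buf.flush()                     # drains in order on reconnect
```

Because records are only dequeued after a confirmed delivery, an outage of any length degrades nothing except the freshness of cloud-side dashboards.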

What the cloud contributes

The cloud isn't irrelevant; it's where the system improves:

Model improvement: Every venue generates edge cases that the current models handle imperfectly. Training pipelines aggregate anonymized data across the network to produce better models that deploy to all venues simultaneously.

Playbook refinement: Commissioning telemetry from every deployment feeds the operational playbook. The cloud analyzes patterns (which steps take longest, which environments cause calibration drift, which failure modes recur) and updates the procedures.

Fleet intelligence: Cross-venue comparison reveals performance outliers. If one venue consistently achieves a lower MTTR (mean time to recovery), the cloud identifies the configuration difference and propagates it.
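The outlier step can be as simple as flagging venues whose metric sits more than one standard deviation from the fleet mean. A toy sketch (the venue names, MTTR figures, and threshold are invented; real fleet telemetry is OZ-internal):

```python
from statistics import mean, stdev

# Hypothetical per-venue MTTR figures in minutes.
mttr = {"venue-a": 42.0, "venue-b": 45.0, "venue-c": 12.0, "venue-d": 44.0}

mu = mean(mttr.values())
sigma = stdev(mttr.values())

# Flag venues more than one standard deviation from the fleet mean.
outliers = {v: m for v, m in mttr.items() if abs(m - mu) > sigma}
# venue-c stands out as a low-MTTR outlier; the next step would be
# diffing its configuration against the fleet baseline.
```

In a real fleet this would run over rolling windows with a robust statistic, but the shape of the analysis (aggregate, compare, flag, diff configurations) is the same.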

The cloud makes the network smarter over time. The edge makes each venue reliable right now.

Infrastructure, not software

The edge-first architecture is why OZ is infrastructure, not software. Software runs in someone else's compute environment. Infrastructure runs in the physical environment where the work happens.

When your AI processes at the edge, you control the full execution path. When your AI processes in the cloud, you control a request-response cycle.

That is the difference between a product that observes and a system that executes.