© 2026 OZ. All rights reserved.


World Graph Substrate

This page is an engine-architecture deep dive. It describes the substrate the World Graph layer is built on: the module structure, the type vocabulary, the storage hierarchy, the addressing scheme, and the on-disk format. The substrate was cut as the engine's 0.8.0 release. The type vocabulary is the public surface, and the runtime that animates it already writes real bytes to disk.

When an external surface is documented elsewhere (DDL, procedures, the SDK), this page links to it. When a substrate concept has not yet surfaced through a user-facing API, this page describes the concept at the architectural level and does not invent one. Items still queued behind the substrate cut (reader bodies, executor wiring, planner-side rewrites) are called out in Status at the bottom — they are not described as live.

Section 1: arcflow.worldgraph module structure

The World Graph layer is implemented as a single top-level module, arcflow_core::worldgraph::*, replacing the previously scattered store / mvcc / mmap-store / WAL modules. The decomposition follows two patterns.

Six bounded-capability submodules, each owning what it does:

Submodule   Owns
catalog     The Iceberg-shaped manifest reader. The boundary between this layer and the Perception Lake.
topology    CSR adjacency. Immutable, GPU-uploadable.
nodes       Low-cardinality mutable node tables.
wal         The WAL durability contract. Crash-replayable.
mmap        Read-only mmap path; column files; cache coherency.
schema      Typed CREATE NODE LABEL registration, including the VIRTUAL variant.

One substrate primitive layer (worldgraph::io), owning how bytes move:

Submodule          Owns
io::segment        Segment containers + extents; checksums.
io::stripe         Append-stripe writer (pwrite + fsync + atomic rename). Mmap is not used for writes.
io::cache          The Memory Governor: heat scores, admission, auto-prefetch, tier transitions.
io::wal_store      WAL segment manager; group commit; full-fsync on platforms that need it.
io::manifest_txn   Atomic manifest commit protocol (two-rename).
io::object_cache   Universal Parquet reader (local + remote partitions).
io::compaction     Compaction scheduler: levels, bandwidth cap, read-amp limit.
io::platform       macOS / Linux / Windows storage primitive abstraction.
io::metrics        Page faults; resident vs dirty bytes; read amp.

The six bounded capabilities call into io::* rather than the OS directly. The split exists because mmap policy, fsync policy, cache governance, manifest-commit protocol, and segment-container layout otherwise leak into every capability and re-emerge as duplicated, drifting policy. Concentrating them in io/ gives the substrate one place to enforce storage doctrine.
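The write discipline named in the io::stripe and io::manifest_txn rows (write, fsync, atomic rename) can be sketched in a few lines. This is a minimal Python illustration of the general pattern, not the engine's implementation: atomic_write is a hypothetical name, it shows a single rename rather than the two-rename manifest protocol, and os.fsync does not issue the full-fsync that some platforms need.

```python
import os
import tempfile


def atomic_write(path: str, payload: bytes) -> None:
    """Publish payload at path so readers see the old file or the new one, never a partial."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # data reaches disk before the rename publishes it
        os.replace(tmp_path, path)  # atomic replacement of the destination
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Persist the directory entry as well (POSIX only).
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Keeping this sequence in one function is the same argument the paragraph above makes for io/: the ordering of flush, fsync, and rename is policy, and policy that is copied into every caller drifts.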

Two structural disciplines apply:

  • PAT-0046 — Path = capability. Every submodule name describes the capability it owns. No utils.rs, no helpers.rs.
  • PAT-0047 — mod.rs is a navigable index. Module roots carry submodule declarations + module-level documentation only; no impl blocks, no helper functions, no flat re-exports beyond what callers strictly need.

The substrate is internal — every submodule is pub(crate). External consumers reach the World Graph through the SDK and FFI surfaces, not through worldgraph::* directly. The atomic public-surface flip lands in the doctrinal sweep that closes the substrate cut.

Section 2: Virtual Labels and the Lakehouse–Graph split

The substrate's central doctrinal concept is that a node class lives in exactly one of two places:

  • Owned — rows live in the World Graph's own stripe store. Mutable, low-cardinality, queryable by property and traversal.
  • Virtual — rows live in a Lakehouse (Iceberg or Parquet-glob); the engine holds the typed schema, the catalog pointer, and the topology. Immutable observations, high-cardinality, read via columnar scan.

NodeKind is the tagged value that distinguishes them. The DDL admits both forms:

-- Owned class — rows live in the engine
CREATE NODE LABEL Player (name STRING, level INT);
 
-- Virtual class — rows live in a Lakehouse partition
CREATE NODE LABEL Frame (ts TIMESTAMP, x DOUBLE)
  VIRTUAL FROM PARTITION 's3://nfl-feed/frames/{date}/{game}.parquet';

A Virtual label produces a VirtualLabelEntry { label, partition_pattern, schema_ref, resolver_kind } row in the catalog. ResolverKind is one of Iceberg, ParquetGlob, or Custom. The substrate registers the contract; the resolvers themselves are wired in as follow-on work.
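To make the shape of that catalog row concrete, here is a small Python model of it. The four field names and the three resolver kinds come from the text above; everything else is an assumption for illustration: the field types, the partition_for helper, the choice of ParquetGlob for the Frame label, and the idea that {date} and {game} are filled by simple template substitution.

```python
from dataclasses import dataclass
from enum import Enum


class ResolverKind(Enum):
    ICEBERG = "Iceberg"
    PARQUET_GLOB = "ParquetGlob"
    CUSTOM = "Custom"


@dataclass(frozen=True)
class VirtualLabelEntry:
    label: str
    partition_pattern: str
    schema_ref: str  # assumed to be a plain name; the text does not specify its form
    resolver_kind: ResolverKind

    def partition_for(self, **keys: str) -> str:
        """Hypothetical helper: fill the {placeholders} in the pattern."""
        return self.partition_pattern.format(**keys)


# The Frame label from the DDL above, expressed as a catalog row.
frame = VirtualLabelEntry(
    label="Frame",
    partition_pattern="s3://nfl-feed/frames/{date}/{game}.parquet",
    schema_ref="Frame",
    resolver_kind=ResolverKind.PARQUET_GLOB,  # assumption, not stated in the DDL
)
```

The entry holds no rows. It records only where rows can be found and how to interpret them, which is what lets the engine treat the label as typed without ingesting the partition.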

The split is governed by three mechanical rules (see World Graph):

  • R1 — Identity owned by the Graph. Every node, Owned or Virtual, has a stable ID and a Graph-resident resolver.
  • R2 — Mutability bright-line. Mutable → Owned; immutable observation → Virtual.
  • R3 — Topology owned by the Graph. Edges live in the Graph's CSR adjacency, even when both endpoints are Virtual.
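Rule R3 is easier to see with the data structure in hand. The sketch below builds a CSR (compressed sparse row) adjacency from an edge list in Python. It illustrates the general CSR layout only; the function names and the use of dense integer node IDs are assumptions, not the engine's topology module.

```python
def build_csr(num_nodes, edges):
    """Return (offsets, targets) for directed edges given as (src, dst) pairs."""
    offsets = [0] * (num_nodes + 1)
    for src, _ in edges:
        offsets[src + 1] += 1  # count out-edges per source
    for i in range(num_nodes):
        offsets[i + 1] += offsets[i]  # prefix sum turns counts into slice bounds
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()  # next free slot for each source node
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors(offsets, targets, node):
    """Out-neighbors of node are one contiguous slice of targets."""
    return targets[offsets[node]:offsets[node + 1]]
```

The arrays hold node IDs and nothing else, so traversal works the same way whether the endpoint rows are Owned or Virtual. Two flat integer arrays are also the kind of layout that uploads to a GPU without transformation.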

The architectural consequence is that the substrate does not need to ingest a Lakehouse partition's rows into engine RAM to make them queryable. The schema and adjacency suffice; row access pushes down to the catalog at query time. The query engine's predicate-pushdown rewriter for MATCH (:VirtualLabel ...) patterns lands as a follow-on; until it does, queries against Virtual labels return a typed QueryError::VirtualLabelNotYetQueryable.

Section 3: Six-tier residency, nine-state classification

The substrate is storage-hierarchy-aware. Every fragment of data is classified into one of six storage tiers, with a nine-state finer classification that the Memory Governor uses to decide prefetch, eviction, and fetch-on-miss.

Six tiers (TierBudget — operator-set byte budgets):

Tier  Medium            What it is
L0    GPU VRAM          Compute-staging only. Not durable. Fragments transit through L0 to be consumed by GPU kernels.
L1    CPU RAM           Main system memory. Decompressed Arrow buffers; arena allocations; structured state.
L2    OS page cache     Kernel-managed subset of L1 holding pages of mmap'd files. Same physical bytes as L1; semantically distinct (kernel evicts; the engine doesn't).
L3    NVMe SSD          Hot durable tier on local NVMe. ARC1 stripes.
L4    HDD / cold local  Cold durable tier. Zstd-compressed Parquet.
L5    object storage    Remote durable tier (S3 / GCS / Azure). Iceberg-shaped Parquet.

Nine states (ResidencyClass):

L0GpuResident
L1CpuPinned       L1CpuHot        L1CpuWarm
L3SsdLocal
L4HddLocal
L5RemoteCached    L5RemoteStreamed    L5RemoteCold

The nine-state granularity exists because the substrate dispatches different policies based on the finer state — pinned memory is not evicted; hot memory may be demoted before warm; remote-cached has different fetch latency than remote-streamed.

The Memory Governor (worldgraph::io::cache) is the substrate's active control. It owns fragment placement based on heat scores, the operator-set TierBudget, and the per-query SpeedLaneHint. Capability and current placement are reported through the engine's metrics surface; the substrate never silently downgrades a fragment's tier — every transition is observable, and a fragment that cannot be promoted to satisfy a SpeedLaneOnly query returns FragmentError::BelowSpeedLane instead of stalling.
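The admission side of this can be sketched as byte accounting against per-tier caps. The nine state names below follow the list above. The rest is assumed for illustration: AdmissionGate and OverCommit are hypothetical names, the budget is modelled as a plain dict, and states that share a tier prefix are assumed to draw from one shared cap.

```python
from enum import Enum, auto


class ResidencyClass(Enum):
    L0_GPU_RESIDENT = auto()
    L1_CPU_PINNED = auto()
    L1_CPU_HOT = auto()
    L1_CPU_WARM = auto()
    L3_SSD_LOCAL = auto()
    L4_HDD_LOCAL = auto()
    L5_REMOTE_CACHED = auto()
    L5_REMOTE_STREAMED = auto()
    L5_REMOTE_COLD = auto()

    @property
    def tier(self) -> str:
        return self.name[:2]  # "L0", "L1", "L3", "L4" or "L5"


class OverCommit(Exception):
    """Raised instead of silently exceeding a tier's byte budget."""


class AdmissionGate:
    def __init__(self, tier_budget):
        self.budget = dict(tier_budget)  # tier -> byte cap set by the operator
        self.used = {tier: 0 for tier in self.budget}

    def admit(self, residency, nbytes):
        tier = residency.tier
        cap = self.budget.get(tier, 0)  # a tier with no budget admits nothing
        used = self.used.get(tier, 0)
        if used + nbytes > cap:
            raise OverCommit(f"{residency.name}: {used} + {nbytes} > {cap} bytes in {tier}")
        self.used[tier] = used + nbytes

    def release(self, residency, nbytes):
        tier = residency.tier
        self.used[tier] = max(0, self.used.get(tier, 0) - nbytes)
```

With a 100-byte L1 budget, admitting 60 bytes as hot and 40 as warm fills the tier, and a further pinned request is refused with a typed error rather than queued. That mirrors the refusal behaviour described above for fragments that cannot meet a speed-lane requirement.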

Section 4: oz:// brand-level URI scheme

The substrate exposes one URI scheme for every addressable resource in a workspace. It is brand-level — oz://, not arcflow:// — so the same URI shape works across all surfaces: the engine, the daemon, the fsspec Python binding, federation peers.

Six variants:

URI                                         Resolves to
oz://workspace                              The workspace root.
oz://snapshot/<hex-digest>                  A pinned snapshot.
oz://label/<name>                           A node label.
oz://edge/<name>                            An edge label.
oz://catalog                                The Iceberg-shaped catalog manifest.
oz://partition/<content-addressed-digest>   A partition file.

Parsing is strict. Invalid forms produce a typed OzUriError — MissingScheme, EmptyAuthority, UnknownAuthority, MissingPath, UnexpectedPath. The substrate never silently coerces a malformed URI into the wrong variant.
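A strict parser for the six variants fits in a short function. The following Python sketch is not the engine's parser. The variant names and the five error kinds come from the text, while the mapping from each malformed input to an error kind is an assumption, as are the names parse_oz_uri, BARE, and WITH_PATH.

```python
from typing import Optional, Tuple


class OzUriError(ValueError):
    """Carries one of the five error kinds named in the text."""

    def __init__(self, kind: str, uri: str):
        super().__init__(f"{kind}: {uri!r}")
        self.kind = kind


BARE = {"workspace", "catalog"}  # variants that take no path
WITH_PATH = {"snapshot", "label", "edge", "partition"}  # variants that require one


def parse_oz_uri(uri: str) -> Tuple[str, Optional[str]]:
    prefix = "oz://"
    if not uri.startswith(prefix):
        raise OzUriError("MissingScheme", uri)
    authority, sep, path = uri[len(prefix):].partition("/")
    if not authority:
        raise OzUriError("EmptyAuthority", uri)
    if authority in BARE:
        if sep or path:  # even a bare trailing slash is rejected, not coerced
            raise OzUriError("UnexpectedPath", uri)
        return authority, None
    if authority in WITH_PATH:
        if not path:
            raise OzUriError("MissingPath", uri)
        return authority, path
    raise OzUriError("UnknownAuthority", uri)
```

Under these assumptions, parse_oz_uri("oz://label/Player") returns ("label", "Player"), and "oz://catalog/x" raises with kind UnexpectedPath instead of being read as the catalog. Every rejected input maps to exactly one named kind, which is the property the paragraph above asks for.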

Variants are additive. New addressable resource kinds extend the enum; renaming or removing a variant is a major-version concern. The pattern — one URI scheme, multiple resolvers — mirrors the precedent set by other "one namespace over heterogeneous backends" designs.

At this stage the parser, the OzUri enum, and its display contract are in the engine source. The fsspec Python binding, CLI flag wire-up, FFI resolver, and catalog-resolver dispatch land as follow-on work.

Section 5: ARC1 on-disk format + LSM compaction shape

The substrate's hot-tier on-disk format is ARC1, distinct from the cold-tier Parquet that flows through the Iceberg manifest. ARC1 is a sequential append-only stripe format designed for the substrate's write path.

The file header is an 8-byte magic followed by a version byte, in the style of well-known binary formats (PNG, JPEG-XL, Parquet) — a stable signature any tool can identify without reading further. The append-only stripe writer (worldgraph::io::stripe::StripeWriter) targets ARC1; the read path is mmap-based with explicit cache governance.
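The header layout described above (8 bytes of magic, then one version byte) can be checked with a fixed 9-byte struct. The Python sketch below shows only that layout. The magic value is a deliberate placeholder, because the text does not give the real ARC1 signature, and the function names are hypothetical.

```python
import struct

HEADER = struct.Struct("<8sB")  # 8-byte magic, then one unsigned version byte
PLACEHOLDER_MAGIC = b"EXAMPLE!"  # NOT the real ARC1 signature; stand-in for illustration


def write_header(version: int) -> bytes:
    return HEADER.pack(PLACEHOLDER_MAGIC, version)


def read_version(blob: bytes) -> int:
    """Validate the signature and return the format version."""
    if len(blob) < HEADER.size:
        raise ValueError("file shorter than the 9-byte header")
    magic, version = HEADER.unpack_from(blob)
    if magic != PLACEHOLDER_MAGIC:
        raise ValueError("bad magic: not a stripe file")
    return version
```

A reader that checks the signature first can reject a foreign file after nine bytes, before it interprets anything else in the stripe.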

Compaction follows an LSM shape — seven levels with 10× fanout and 64 MiB target file size, modelled on the RocksDB level policy. Per-level bandwidth caps and a per-trigger scheduler keep compaction from interfering with the foreground write path. The policy types are frozen; the scheduler body lands progressively.
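The arithmetic behind a seven-level, 10× fanout shape is worth spelling out. The constants below are the ones stated above. The base capacity for level 1 is not given in the text, so it is a parameter here, and treating level 0 as exempt from a byte target is an assumption borrowed from common leveled-compaction practice.

```python
MIB = 1024 * 1024
NUM_LEVELS = 7
FANOUT = 10
TARGET_FILE_BYTES = 64 * MIB


def level_capacities(base_bytes: int) -> list:
    """Target byte capacity for levels 1..6, each 10x the level above it."""
    return [base_bytes * FANOUT ** (level - 1) for level in range(1, NUM_LEVELS)]


def files_at_capacity(capacity_bytes: int) -> int:
    """How many target-sized files fit in a level at capacity."""
    return capacity_bytes // TARGET_FILE_BYTES
```

With a hypothetical 256 MiB base, level 1 holds 4 target-sized files, level 2 holds 40, and level 6 is 100,000 times the base. The geometric growth is what bounds read amplification: a lookup touches at most one sorted run per level below level 0.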

Two on-disk formats by design:

  • ARC1 — hot tier. Append-only stripes, mmap-read, LSM-compacted, designed for the substrate's write path.
  • Parquet — cold tier and Virtual-label resolution. Iceberg-manifested, columnar, broadly readable by every Lakehouse tool.

The two formats are not a migration path from one to the other; they serve different roles. ARC1 is what the engine writes; Parquet is what every other tool reads, and what Virtual labels resolve through.

Status: what is shipping, what is still queued

What the substrate cut delivered as real bytes on disk:

  • The arcflow.worldgraph public module with the six bounded capabilities + the io substrate primitive layer.
  • The full typed schema (NodeLabel, ColumnDef, ColumnType, EdgeLabel).
  • Virtual-label contracts (VirtualLabelEntry, PartitionPattern, ResolverKind) and the CREATE NODE LABEL ... VIRTUAL FROM PARTITION DDL parser.
  • The Python FFI register_virtual_partition(label, partition) and its C ABI counterpart arcflow_register_virtual_partition.
  • The MutationOp enum (row-level and bulk-stripe variants).
  • The WAL writer + replay path — length-prefixed CRC32-IEEE framing, torn-tail tolerance, group-commit fsync.
  • The streaming-stripe writer — append-only ARC1 hot-tier files, capacity-bounded with typed CapacityExceeded refusal.
  • The manifest atomic-commit protocol — write-tmp + fsync + atomic-rename, two-file protocol with F_FULLFSYNC on macOS and fdatasync on Linux.
  • The Memory Governor's admission gate — per-residency-class byte accounting against TierBudget caps; refuses over-commit; tracks shared cap pools.
  • The platform-divergent storage primitives (PlatformOps trait) — macOS / Linux / WSL2 paths with capability dispatch + degraded-atomicity warning at mount where appropriate.
  • The oz:// URI parser, TierBudget + nine-state ResidencyClass, ARC1 file magic + version constants, LSM compaction policy types, block-cache key + handle + policy types, Iceberg-shaped ManifestPayload.
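The WAL item in the list above names three properties: length-prefixed framing, a CRC32-IEEE checksum, and torn-tail tolerance. The Python sketch below shows how they fit together. The exact frame layout (little-endian u32 length, u32 CRC, then payload) is an assumption, as are the function names. zlib.crc32 computes the IEEE CRC-32.

```python
import struct
import zlib

FRAME_HEADER = struct.Struct("<II")  # assumed layout: payload length, CRC32 of payload


def encode_frame(payload: bytes) -> bytes:
    return FRAME_HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def replay(log: bytes) -> list:
    """Return every intact record, stopping at the first torn or corrupt frame."""
    records, pos = [], 0
    while pos + FRAME_HEADER.size <= len(log):
        length, crc = FRAME_HEADER.unpack_from(log, pos)
        start = pos + FRAME_HEADER.size
        payload = log[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # torn tail: the final append never completed
        records.append(payload)
        pos = start + length
    return records
```

If a crash cuts the last frame short, replay returns every record before it and ignores the remainder, so recovery never has to guess at partial data. This sketch treats a checksum mismatch in the middle of the log the same way; a production replay path would likely distinguish the two cases.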

Still queued behind the cut — not described as live on this page:

  • The Memory Governor's heat-score eviction policy. The admission gate is in place; the eviction policy that complements it lands in a follow-on.
  • The planner-side predicate-pushdown rewriter for virtual-label patterns. Until it ships, MATCH (:VirtualLabel ...) returns QueryError::VirtualLabelNotYetQueryable.
  • The ARC1 reader + Parquet decoder bodies. The type vocabulary and writer-side primitives are shipped; the corresponding readers are queued behind the executor wiring.
  • The apply-mutation row-store executor wiring against worldgraph::nodes.
  • The oz:// resolvers + the fsspec Python binding. The parser and the URI vocabulary are public; the resolvers land at K-WAVE-WG-O2..O7.
  • The Iceberg v3 strict reader. The substrate's manifest reader is Iceberg-shaped (field names match v3 conventions), not v3-strict — Avro codec + deletion vectors arrive separately.

When these land, the corresponding user-facing surfaces become discoverable through the SDK + DDL + procedure references.

See also

  • Architecture — the broader engine-architecture page.
  • World Graph — the conceptual layer this substrate implements.
  • Perception Lake — the sibling immutable-observation layer.
  • Sync Protocol (Deep Dive) — companion deep-dive page on the sync surface.