© 2026 OZ. All rights reserved.


World Store

The World Store is the first of ArcFlow's eight layers: the storage substrate the engine sits on. It holds every byte that survives an engine restart, every manifest that pins a snapshot, and every WAL segment that makes a write durable.

The World Store is internal infrastructure. It is not a product or a sellable SKU, and it is not the focus of ArcFlow's pitch; ArcFlow itself is. The Store has a name so that the engine's module tree is navigable and so that the layer doctrine has a place to anchor durability, residency, and content-addressing concerns. Everything customer-facing about ArcFlow depends on the Store quietly doing its job.

The substrate is a generic, content-addressed durable store: parquet column files, Iceberg-shaped manifests, WAL segments, version pointers, snapshots, segment containers. It knows nothing about Node, Edge, mission tiers, or schema-typed entities — that vocabulary lives one layer up in the World Graph, which is where the engine's typed identity actually lives.

What lives here

| In the Store | Not in the Store |
|---|---|
| Iceberg-shaped table + partition manifests | Node / edge identity |
| WAL segments + group-commit records | Mutable typed entity state |
| Parquet column files (mmap-backed) | Adjacency / CSR topology |
| Snapshot version pointers | Indexes built from typed columns |
| Content-addressed object blocks | Mission tier (observed / inferred / predicted) |
| Atomic manifest-commit transactions | Hybrid Logical Clocks |
| Per-partition free-form provides: codec tags | Standing queries, Z-set deltas |

If a value carries typed entity semantics — anything that says "this is a Player, that is a Frame, here is the edge between them" — it lives in the World Graph. If a value is pure bytes plus a tabular schema — column files, manifest entries, WAL records — it lives in the World Store.

The lake:// URI scheme

The World Store is addressed by lake:// URIs. The scheme is the canonical substrate-layer namespace:

lake://<bucket>/<path/template/{variable}>

Worked example — a virtual partition registration:

db.register_virtual_partition(
    label="Frame",
    partition="lake://nfl/tracks/{season}/{week}",
)

The lake:// URI is the substrate handle. The catalog binds it to a typed VirtualLabelEntry so that MATCH (f:Frame) RETURN count(f) resolves through the Query Engine, the World Graph catalog, and finally the World Store scan.

A lake:// resolver maps to the underlying physical scheme — file:// for local development, s3:// / gs:// for cloud deployments — based on the partition's residency class. Substrate-internal indirection is the point: the engine never sees a raw cloud URI, and consumers never have to encode cloud topology in their queries.
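The indirection described above can be sketched in a few lines. This is a minimal illustration, not ArcFlow's resolver: the function name `resolve_lake_uri`, the residency class names `local` and `cloud`, and the physical roots are all assumptions made for the example.

```python
from urllib.parse import urlsplit

# Hypothetical residency classes and physical roots. The real resolver's
# configuration is not described on this page.
PHYSICAL_ROOTS = {
    "local": "file:///var/lake",
    "cloud": "s3://example-lake",
}


def resolve_lake_uri(uri: str, residency: str, **variables: object) -> str:
    """Expand {variable} placeholders and map lake:// to a physical scheme."""
    parts = urlsplit(uri)
    if parts.scheme != "lake":
        raise ValueError(f"not a lake:// URI: {uri!r}")
    if residency not in PHYSICAL_ROOTS:
        raise ValueError(f"unknown residency class: {residency!r}")
    # str.format raises KeyError if a template variable is not supplied.
    path = parts.path.format(**variables)
    return f"{PHYSICAL_ROOTS[residency]}/{parts.netloc}{path}"


template = "lake://nfl/tracks/{season}/{week}"
local_uri = resolve_lake_uri(template, "local", season=2024, week=12)
cloud_uri = resolve_lake_uri(template, "cloud", season=2024, week=12)
```

The caller only ever handles the lake:// form; the physical scheme is chosen inside the resolver, which is the property the paragraph above relies on.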

Lakehouse capability: what the Store gives you

The World Store is Iceberg-shaped, parquet-resident, and queryable as a graph without ever materialising the rows. Four properties make the lakehouse story load-bearing:

Zero-copy virtual labels. A lake:// partition pattern plus a CREATE NODE LABEL <Label> VIRTUAL FROM PARTITION '<pattern>' statement (or the equivalent register_virtual_partition() SDK call) binds a Lakehouse partition to a graph node class. From then on, MATCH (f:Frame) RETURN count(f) resolves to a parquet footer scan (sub-millisecond on partitions of any size). MATCH (f:Frame) WHERE f.x > 50 RETURN f resolves to a column-pruned scan that reads only the x column chunks for row groups whose statistics overlap the predicate. Rows are never materialised inside the engine; only the columns the query needs are streamed from the parquet files at disk bandwidth.

Iceberg-compatible catalog reader. Any catalog that emits Iceberg-shaped manifests works — Polaris, Unity, AWS Glue, or a plain manifest file on local disk are all readable. The substrate's manifest reader doesn't care which catalog produced the metadata; it cares that the layout conforms.

Composes with the typed entity layer. A query that touches a virtual-label class and an in-engine class compiles to a mixed-execution plan: the planner reads the catalog, decides which part of the pattern is a Lakehouse scan and which is an in-memory graph probe, runs both, joins the results. The agent writes one Cypher pattern; the engine picks the right execution shape per node class.

Computed columns — derived properties, no materialization. A virtual label can declare derived properties in catalog metadata via a COMPUTE clause on its DDL. The Smart Reader evaluates the expressions at row-decode time against the decoded RecordBatch; the values surface in Node.properties alongside parquet-resident columns; predicates on them push down through the planner. The canonical case is a relative-frame projection on operational telemetry — distance_to_target = sqrt((agent_position[0]-target_position[0])^2 + …) declared once, queried as WHERE f.distance_to_target < 5.0, never written to disk. See Virtual computed columns.
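To make the decode-time evaluation concrete, here is a small sketch in plain Python. It is not the Smart Reader's implementation: the batch is modelled as a dict of columns rather than a real RecordBatch, the `COMPUTED` registry and `decode_rows` helper are hypothetical, and the distance expression assumes a Euclidean distance over every component of the two position vectors, because the source elides the remaining terms.

```python
import math

# A decoded batch, modelled as a dict of equal-length columns (a stand-in
# for an Arrow RecordBatch). Column names follow the example in the text.
batch = {
    "agent_position": [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)],
    "target_position": [(3.0, 0.0), (3.0, 4.0), (0.0, 0.0)],
}


def distance_to_target(row: dict) -> float:
    # Assumption: Euclidean distance over all vector components.
    return math.dist(row["agent_position"], row["target_position"])


# Hypothetical catalog metadata: derived property name -> expression.
COMPUTED = {"distance_to_target": distance_to_target}


def decode_rows(batch: dict, computed: dict):
    """Yield one property dict per row, adding computed values on the fly."""
    names = list(batch)
    for values in zip(*(batch[name] for name in names)):
        row = dict(zip(names, values))
        for name, expr in computed.items():
            row[name] = expr(row)
        yield row


# Equivalent in spirit to: WHERE f.distance_to_target < 5.0
near = [r for r in decode_rows(batch, COMPUTED) if r["distance_to_target"] < 5.0]
```

The derived value exists only in the decoded row; the stored columns in `batch` are unchanged, which mirrors the "never written to disk" property.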

Worked example — register, count, scan:

import os

import arcflow

os.environ["OZ_LAKE_ROOT"] = "/path/to/lake/root"
 
db = arcflow.ArcFlow("./workspace")
 
# Bind a Lakehouse partition to a graph node class.
db.register_virtual_partition(
    label="Frame",
    partition="lake://nfl/tracks/{season}/{week}",
)
 
# count(f) → parquet footer scan, zero column reads
db.execute("MATCH (f:Frame) RETURN count(f) AS n")
# {'n': 311000000}  on the canonical NFL tracking dataset
 
# Predicate-pushed scan → only the season + week + x columns are read
db.execute(
    "MATCH (f:Frame) "
    "WHERE f.season = 2024 AND f.week = 12 AND f.x > 50.0 "
    "RETURN count(f)"
)
 
# Composed with in-engine entities — one query, two storage shapes
db.execute("""
  MATCH (p:Player {team: 'Alpha'})-[:OBSERVED_IN]->(f:Frame)
  WHERE f.season = 2024 AND f.x > 50.0
  RETURN p.name, count(f) AS observations
""")

The format-aware reader that plans these scans lives at Smart Reader (world-store/serve) — serve::reader::parquet for parquet today; serve::reader::safetensors for tensor archives; serve::reader::* extends as new column-typed formats land.
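The two scan shapes used in the worked example can be illustrated without any parquet library. The sketch below models footer metadata as plain dataclasses; the `RowGroupMeta` type, the file names, and the statistics are invented for illustration and do not reflect ArcFlow's internal structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowGroupMeta:
    """Per-row-group footer metadata: row count plus min/max for column x."""
    num_rows: int
    x_min: float
    x_max: float


# Hypothetical footers for two parquet files in one partition.
footers = {
    "part-0.parquet": [RowGroupMeta(1000, 0.0, 40.0), RowGroupMeta(1000, 35.0, 80.0)],
    "part-1.parquet": [RowGroupMeta(500, 60.0, 99.0)],
}


def count_rows(footers: dict) -> int:
    """count(f): sum the footer row counts; no column chunk is read."""
    return sum(rg.num_rows for groups in footers.values() for rg in groups)


def row_groups_for_x_greater_than(footers: dict, threshold: float) -> list:
    """Keep only row groups whose max statistic can satisfy x > threshold."""
    return [
        (name, index)
        for name, groups in footers.items()
        for index, rg in enumerate(groups)
        if rg.x_max > threshold
    ]


total = count_rows(footers)
candidates = row_groups_for_x_greater_than(footers, 50.0)
```

Only the row groups in `candidates` would need their x column chunks read; the first row group of `part-0.parquet` is skipped because its maximum is below the threshold.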

Why this layer is separate

The World Store and the World Graph have fundamentally different operating characteristics:

| | World Store | World Graph |
|---|---|---|
| Durability | Object-store economics, regional replication, lifecycle policies | In-memory typed view, rebuilt on engine start |
| SLA bound by | Disk + network bandwidth | Query latency |
| Lifecycle | Outlives the engine process | Per-engine-instance |
| Coupling to schema | None: generic bytes + tabular schemas | Full: Node, Edge, mission tiers, HLC |
| What the engine uses it for | Durability, residency tiering, replication | Identity, topology, mission-tier reasoning, query compute |

Splitting the substrate from the typed entity layer is a module-boundary decision, not a product decision. It lets the engine's storage concerns evolve on their own SLA without dragging the typed entity layer into every fsync, manifest commit, or compaction policy change. From the consumer's perspective, it's all one engine — ArcFlow.

The boundary contract

The World Store and World Graph coordinate through a single mechanical rule:

  • The Graph is a view over the Store. A Node in the World Graph corresponds to one or more rows in a partition in the World Store. The mapping is the catalog. The Graph is rebuilt from the Store on engine start; the Store never references the Graph.

The substrate boundary is a module boundary, not a process boundary — ArcFlow remains the one in-process engine — but the architectural separation is real, and lake:// is its visible expression. The same boundary lets the engine swap residency tiers, change replication policy, and evolve its durability story without disturbing the typed entity layer or any code that consumes it.
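The one-way dependency in this contract can be shown with a toy model. Everything in the sketch is hypothetical (the partition name, the catalog shape, the `rebuild_graph` helper); it only illustrates that the node view is derived from Store rows through the catalog, and that the Store holds no reference back to the Graph.

```python
from collections import defaultdict

# Store side: partitions of plain rows. Nothing here refers to graph objects.
store = {
    "lake://demo/players": [
        {"player_id": 7, "name": "Ada", "team": "Alpha"},
        {"player_id": 9, "name": "Lin", "team": "Beta"},
        {"player_id": 7, "name": "Ada", "team": "Alpha"},  # second row, same node
    ],
}

# Catalog: label -> (partition, identity column). This is the mapping.
catalog = {"Player": ("lake://demo/players", "player_id")}


def rebuild_graph(store: dict, catalog: dict) -> dict:
    """Derive the node view from the Store, as would happen on engine start."""
    nodes = defaultdict(list)
    for label, (partition, id_column) in catalog.items():
        for row_index, row in enumerate(store[partition]):
            # A node is identified by (label, id) and points at its rows.
            nodes[(label, row[id_column])].append((partition, row_index))
    return dict(nodes)


graph = rebuild_graph(store, catalog)
```

Discarding `graph` and calling `rebuild_graph` again yields the same view, which is the sense in which the Graph is rebuilt from the Store rather than persisted alongside it.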

Why this matters for agents

An agent that needs a heavy analytical scan, such as "every detection in zone 4 between 08:00 and 09:00 of the 2024 season", submits it to the engine, which plans the scan against the Store directly. The catalog resolves lake:// to concrete parquet files, so the result is bounded by disk bandwidth, not graph traversal. The same agent can pivot to a typed-entity question, such as "the player who recorded those detections", and the catalog binds the Store-resident rows to Graph-resident identity without re-materialising the bytes.

The agent writes one Cypher pattern. The engine decides which layer of itself answers which part of it.
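A rough sketch of that per-class decision follows. It is an illustration under stated assumptions, not the ArcFlow planner: the `CATALOG` dict, the `plan` function, and the use of a `player_id` column as the join key are all hypothetical (the source expresses the relationship as an OBSERVED_IN edge).

```python
# Hypothetical catalog: which labels are virtual (Store-backed) and which
# are held in memory by the engine.
CATALOG = {"Frame": "virtual", "Player": "in_engine"}

players = [
    {"player_id": 1, "name": "Ada", "team": "Alpha"},
    {"player_id": 2, "name": "Lin", "team": "Beta"},
]
frames = [
    {"player_id": 1, "season": 2024, "x": 61.0},
    {"player_id": 1, "season": 2024, "x": 12.0},
    {"player_id": 2, "season": 2024, "x": 75.0},
]


def plan(labels: list) -> dict:
    """Pick an execution shape per node class from the catalog."""
    return {
        label: "lake_scan" if CATALOG[label] == "virtual" else "memory_probe"
        for label in labels
    }


def observations_by_player(team: str, season: int, min_x: float) -> dict:
    """Probe the in-engine class, scan the virtual class, then hash-join."""
    wanted = {p["player_id"]: p["name"] for p in players if p["team"] == team}
    counts = {}
    for f in frames:
        if f["season"] == season and f["x"] > min_x and f["player_id"] in wanted:
            name = wanted[f["player_id"]]
            counts[name] = counts.get(name, 0) + 1
    return counts


steps = plan(["Player", "Frame"])
result = observations_by_player("Alpha", 2024, 50.0)
```

The caller asks one question; the split into a probe and a scan happens behind `plan`, based only on catalog metadata.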

Partition-key column exposure

Hive-style partition keys in lake:// URIs are exposed as plain typed properties on every virtual-label node. The path layout is the schema for those columns:

lake://prod/trades/year=2026/region=eu/file.parquet
                   └──┬──┘   └──┬───┘
                      │         │
                      │         └─→  Trade.region (String)
                      └───────────→  Trade.year   (Int)

CREATE NODE LABEL Trade VIRTUAL FROM PARTITION 'lake://prod/trades/'

MATCH (t:Trade)
WHERE t.year = 2026 AND t.region = 'eu'
RETURN count(*)

The planner translates the WHERE clause into a directory predicate and intersects it with the manifest before opening any file. Other partitions are pruned at the directory walk — never read, never decoded. Inside the partitions that survive, parquet row-group statistics drive a second pruning pass. Both passes report through result.io_stats.partitions_pruned and result.io_stats.row_groups_pruned.

The Store does not require an explicit PARTITION KEY declaration; the engine infers partition columns from the path on first scan and records them in the catalog. Subsequent DDL operations against the same virtual label reuse the inferred schema.
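Path-based inference and directory-level pruning can be sketched as follows. This is a simplified model rather than the engine's code: `infer_partition_keys` and `prune` are hypothetical helpers, type inference is reduced to "all digits means Int, otherwise String", and each file stands in for one partition.

```python
def infer_partition_keys(path: str) -> dict:
    """Read key=value path segments; digits-only values become int."""
    keys = {}
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        if sep and key and value:
            is_int = value.isascii() and value.isdigit()
            keys[key] = int(value) if is_int else value
    return keys


def prune(paths: list, **predicate: object) -> tuple:
    """Keep paths whose partition keys equal every predicate value."""
    kept, pruned = [], 0
    for path in paths:
        keys = infer_partition_keys(path)
        if all(keys.get(k) == v for k, v in predicate.items()):
            kept.append(path)
        else:
            pruned += 1  # never opened, never decoded
    return kept, pruned


paths = [
    "lake://prod/trades/year=2026/region=eu/file.parquet",
    "lake://prod/trades/year=2026/region=us/file.parquet",
    "lake://prod/trades/year=2025/region=eu/file.parquet",
]

inferred = infer_partition_keys(paths[0])
kept, pruned = prune(paths, year=2026, region="eu")
```

Here the predicate is answered from path strings alone; only the files in `kept` would go on to the row-group statistics pass described above.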

See also

  • World Graph — the typed entity layer that sits on top of the Store.
  • Perception Lake — the append-only observation landing zone (a reserved layer between Store and Graph for sensor-grade ingest).
  • Persistence & WAL — how Store writes become durable.
  • Snapshots — how a reader sees a consistent point-in-time view across the Store.