© 2026 OZ. All rights reserved.


v0.7 → v0.8 Lakehouse Fast-Path

The 0.8.0 cut introduces the Lakehouse fast-path — a way to ingest high-cardinality immutable rows (frames, telemetry samples, event-stream rows) without materialising them into engine RAM. This guide is for v0.7.x consumers who hit the substrate cliff and want to migrate.

You do not need to migrate if:

  • Every class you ingest is mutable (charting, entity-resolution merges, derived state).
  • Every class you ingest is low-cardinality (≤ ~50K rows total — players, plays, devices, agents).
  • Your bulk_create_* calls fit comfortably in memory.

The legacy bulk_create_* ingest path stays. The crate-root modules (mvcc, dense_store, column_store, csr) remain as canonical re-exports of their worldgraph::* counterparts; v0.7.x-pinned consumers continue to work without code change.

You should migrate if:

  • You ingest immutable observation rows — anything that arrived once and never changes.
  • The class is high-cardinality — frames per game, samples per device, events per stream.
  • Under v0.7 ingest, engine RAM grows faster than the on-disk data volume justifies.

The fast-path is what the substrate rewrite was opened to enable.

Step 1: Classify your node classes

For each class in your schema, apply the mechanical decision rule (see World Graph for the full R1–R3 boundary):

  1. Is the class mutable? Yes → keep as Owned (bulk_create_*).
  2. Is the class an immutable observation row? Yes → migrate to Virtual.
  3. Edges are always Owned regardless of endpoint classification.

A worked example from a sports-tracking workload:

| Class | Cardinality | Mutability | Read pattern | Today | After migration |
|---|---|---|---|---|---|
| Player | ~95 / season | mutable (roster, injury) | property + traversal | bulk_create_nodes | unchanged |
| Play | ~176 / game | mutable (charting) | property + traversal | bulk_create_nodes | unchanged |
| Charting | per source | mutable | property + traversal | bulk_create_nodes | unchanged |
| Frame | ~1M / game | immutable | columnar predicate scan | bulk_create_nodes | VIRTUAL FROM PARTITION |
| Telemetry | ~1M / game | immutable | columnar predicate scan | bulk_create_nodes | VIRTUAL FROM PARTITION |
| TRACKED (edge) | high-card | append-only | CSR traversal | bulk_create_relationships | unchanged (edges are always Owned) |

If a class produces an ambiguous classification (mutable, high-cardinality, and read by traversal; or immutable, low-cardinality, and written multiple times), the R1–R3 rules cannot resolve it in isolation. Treat that as a stop condition: surface the class for review, pick one axis as dominant, and document the tradeoff.
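The decision rule and its stop condition can be sketched in Python. This is an illustration only, not ArcFlow code: `ClassProfile`, `Storage`, and `classify` are hypothetical names, and the boolean flags are assumptions standing in for the judgments you make during schema review.

```python
from dataclasses import dataclass
from enum import Enum


class Storage(Enum):
    OWNED = "Owned"
    VIRTUAL = "Virtual"
    REVIEW = "needs review"  # the stop condition: a human decides


@dataclass(frozen=True)
class ClassProfile:
    """Hypothetical description of one class in your schema."""
    name: str
    mutable: bool
    high_cardinality: bool
    read_by_traversal: bool = False
    written_multiple_times: bool = False
    is_edge: bool = False


def classify(p: ClassProfile) -> Storage:
    # Rule 3: edges are always Owned, whatever their endpoints are.
    if p.is_edge:
        return Storage.OWNED
    # The two ambiguous shapes named in the text stop the mechanical rule.
    if p.mutable and p.high_cardinality and p.read_by_traversal:
        return Storage.REVIEW
    if not p.mutable and not p.high_cardinality and p.written_multiple_times:
        return Storage.REVIEW
    # Rule 1: mutable stays Owned. Rule 2: immutable observation rows go Virtual.
    return Storage.OWNED if p.mutable else Storage.VIRTUAL


player = ClassProfile("Player", mutable=True, high_cardinality=False, read_by_traversal=True)
frame = ClassProfile("Frame", mutable=False, high_cardinality=True)
tracked = ClassProfile("TRACKED", mutable=False, high_cardinality=True, is_edge=True)

for profile in (player, frame, tracked):
    print(profile.name, "->", classify(profile).value)
```

Returning a distinct `REVIEW` value, rather than guessing, keeps the ambiguous cases visible so the tradeoff gets documented instead of silently resolved.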

Step 2: Author the lake:// mount config

The substrate addresses Lake partitions through the lake:// URI scheme. A registration takes the form:

lake://<mount>/<table>/{var}=<glob>[/{var}=<glob>]…/<file-glob>.parquet

For the worked-example schema above:

| Class | Partition pattern |
|---|---|
| Frame | lake://nfl/tracks/{season}/{week}/{game_key}.parquet |
| Telemetry | lake://sensors/temperature/{year}/{month}/{day}/{sensor_id}.parquet |

The mount (nfl, sensors) is configured at workspace open time and binds the URI's authority to a backing storage location (a local directory, an S3 bucket, a GCS bucket, an Iceberg catalog endpoint). Template variables in braces are recognised as Hive-partitioned columns and used by the engine for partition pruning at query time.
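To make the role of the template variables concrete, the sketch below turns a partition pattern into a regular expression, extracts the partition values from concrete file URIs, and keeps only the files whose values match a filter. It illustrates the idea of partition pruning and is not the engine's implementation; `compile_pattern`, `prune`, and the sample URIs are hypothetical.

```python
import re

PATTERN = "lake://nfl/tracks/{season}/{week}/{game_key}.parquet"


def compile_pattern(pattern: str) -> "re.Pattern[str]":
    """Turn each {var} into a named group that matches one path segment."""
    # re.split with a capture group alternates: literal, var, literal, var, ...
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>[^/]+)"
        for i, part in enumerate(parts)
    )
    return re.compile(regex)


def prune(pattern: str, uris: list, **wanted) -> list:
    """Keep only the URIs whose partition values equal the wanted values."""
    rx = compile_pattern(pattern)
    kept = []
    for uri in uris:
        m = rx.fullmatch(uri)
        if m and all(m.group(k) == str(v) for k, v in wanted.items()):
            kept.append(uri)
    return kept


uris = [
    "lake://nfl/tracks/2025/07/G123.parquet",
    "lake://nfl/tracks/2025/08/G200.parquet",
    "lake://nfl/tracks/2024/07/G001.parquet",
]
print(prune(PATTERN, uris, season=2025, week="07"))
```

The point of the exercise: a predicate on `season` or `week` can discard whole files from their paths alone, before any Parquet data is opened.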

The full Virtual Labels Over Parquet cookbook walks through a runnable example end-to-end.

Step 3: Register the virtual label

Two paths, same effect.

Via DDL

CREATE NODE LABEL Frame (
  entity_id STRING,
  ts        TIMESTAMP,
  x         DOUBLE,
  y         DOUBLE,
  speed     DOUBLE
) VIRTUAL FROM PARTITION 'lake://nfl/tracks/{season}/{week}/{game_key}.parquet';

The DDL parser validates the typed schema against the Parquet files' schema. A VirtualLabelEntry { label, partition_pattern, schema_ref, resolver_kind } row is committed to the catalog manifest at <workspace>/canonical/manifest_<epoch>.json. The manifest commit is atomic (write-tmp + fsync + atomic_rename with two-file protocol; F_FULLFSYNC on macOS, fdatasync on Linux).
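The write-tmp + fsync + rename sequence is a general durability technique and can be sketched in Python. This is not the engine's code. It assumes the two-file protocol means "write the epoch's manifest file, then repoint a CURRENT file at it", and it uses plain `os.fsync` rather than the platform-specific calls named above. `atomic_write` and `commit_manifest` are hypothetical helper names.

```python
import json
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write data so readers see either the old file or the new one, never a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory so the rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push file contents to stable storage
        os.replace(tmp, path)     # atomic rename over the target
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    if os.name == "posix":
        # Sync the directory entry so the rename itself survives a crash.
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)


def commit_manifest(canonical_dir: str, epoch: int, manifest: dict) -> None:
    """First rename: the epoch's manifest. Second rename: the CURRENT pointer."""
    body = json.dumps(manifest, indent=2).encode("utf-8")
    atomic_write(os.path.join(canonical_dir, f"manifest_{epoch}.json"), body)
    atomic_write(os.path.join(canonical_dir, "CURRENT"), str(epoch).encode("ascii"))
```

Ordering matters in this sketch: the pointer moves only after the manifest it names is durable, so a crash between the two renames leaves the previous epoch current.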

Via Python FFI

from arcflow import ArcFlow
 
db = ArcFlow("/path/to/workspace")
epoch = db.register_virtual_partition(
    label="Frame",
    partition="lake://nfl/tracks/{season}/{week}/{game_key}.parquet",
)

The C ABI counterpart is arcflow_register_virtual_partition(session, label, partition) -> i64.

Step 4: Stop ingesting Virtual classes through bulk_create_*

Once a class is registered as Virtual, its rows live in the Lakehouse partitions. The bulk-ingest path no longer applies to that class. New observation rows arrive as new partitions in the Lake; the manifest version advances; the graph picks up the new partition on its next manifest read.

If your existing pipeline still calls bulk_create_nodes against a class you've moved to Virtual, the call is rejected with a classification error at the schema layer and writes nothing. Remove the bulk_create_* calls; replace them with whatever writes the Parquet files.
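As a rough picture of what replaces the bulk-ingest call, the sketch below groups incoming observation rows by their partition keys and computes the target file for each group. It stops short of writing Parquet; that step belongs to whichever Parquet writer your pipeline already uses. `group_by_partition`, `PARTITION_KEYS`, and the sample rows are hypothetical.

```python
from collections import defaultdict

PATTERN = "lake://nfl/tracks/{season}/{week}/{game_key}.parquet"
PARTITION_KEYS = ("season", "week", "game_key")


def group_by_partition(rows, pattern=PATTERN, keys=PARTITION_KEYS):
    """Map each target partition URI to the rows that belong in it."""
    groups = defaultdict(list)
    for row in rows:
        # Fill the template variables from the row's own partition columns.
        target = pattern.format(**{k: row[k] for k in keys})
        groups[target].append(row)
    return dict(groups)


rows = [
    {"season": 2025, "week": "07", "game_key": "G123", "entity_id": "Unit-01", "speed": 4.0},
    {"season": 2025, "week": "07", "game_key": "G123", "entity_id": "Unit-02", "speed": 5.5},
    {"season": 2025, "week": "08", "game_key": "G200", "entity_id": "Unit-01", "speed": 3.1},
]

for target, batch in group_by_partition(rows).items():
    # A real pipeline would hand `batch` to a Parquet writer here.
    print(target, len(batch))
```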

Step 5: Verify the workspace is on the fast-path

Two checks confirm the migration landed.

Catalog inspection — list every virtual label registered against the workspace:

CALL db.constraints() YIELD name, kind, target
WHERE kind = 'VIRTUAL_LABEL'
RETURN name, target;

Each row is a label/partition-pattern pair. The target is the lake:// URI.

Manifest reading — every committed epoch's manifest survives on disk:

WORKSPACE=/path/to/workspace   # set to your workspace root
ls "$WORKSPACE"/canonical/manifest_*.json
jq '.virtual_labels' "$WORKSPACE/canonical/manifest_$(cat "$WORKSPACE/canonical/CURRENT").json"

The virtual_labels array enumerates every Virtual class with its partition pattern and resolver kind. Atomic-commit guarantees the manifest is never half-written; the CURRENT pointer is the two-rename target.
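The same manifest check can be scripted, for example as a post-migration assertion in CI. The sketch below makes two assumptions: the CURRENT file holds the bare epoch used in the manifest filename, as the shell commands above imply, and each `virtual_labels` entry is a JSON object keyed by the VirtualLabelEntry field names. `current_virtual_labels` and `is_registered` are hypothetical helpers.

```python
import json
from pathlib import Path


def current_virtual_labels(workspace) -> list:
    """Follow the CURRENT pointer and return the manifest's virtual_labels array."""
    canonical = Path(workspace) / "canonical"
    epoch = (canonical / "CURRENT").read_text().strip()
    manifest = json.loads((canonical / f"manifest_{epoch}.json").read_text())
    return manifest.get("virtual_labels", [])


def is_registered(workspace, label: str, partition_pattern: str) -> bool:
    """True if the label is present with exactly the expected partition pattern."""
    for entry in current_virtual_labels(workspace):
        if entry.get("label") == label:
            return entry.get("partition_pattern") == partition_pattern
    return False
```

A call such as `is_registered("/path/to/workspace", "Frame", "lake://nfl/tracks/{season}/{week}/{game_key}.parquet")` then gives a yes/no answer for one class.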

Step 6: Query against virtual labels

The intent at the query surface is that virtual labels are indistinguishable from Owned labels:

MATCH (f:Frame {entity_id: 'Unit-01'})
WHERE f.ts >= datetime('2026-03-14T08:00:00')
  AND f.ts <  datetime('2026-03-14T09:00:00')
RETURN f.ts, f.x, f.y, f.speed
ORDER BY f.ts;

The planner-side rewriter for MATCH (:VirtualLabel ...) patterns — which decomposes the pattern into a manifest-pruned, predicate-pushed Parquet scan — is the next wave of substrate work. Until it lands, queries against virtual labels return a typed QueryError::VirtualLabelNotYetQueryable. The registration path described above is real bytes on disk now; the read path is wired but gated.

Plan for the rollout accordingly: the migration produces correct catalog state today, and the queries that depend on it light up when the rewriter ships. Until then, downstream reads against Virtual rows go through the Parquet files directly (the partitions remain Lakehouse-shaped; any Arrow / Parquet / Iceberg tool can read them in parallel with the migration).
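One way to structure downstream readers during the gated period is to try the query engine first and fall back to a direct lake read only on the gating error. Such a reader switches over without a code change once the rewriter ships. How the Python binding surfaces QueryError::VirtualLabelNotYetQueryable is not documented here, so the exception class below is a hypothetical stand-in, as are `read_with_fallback` and both stub readers.

```python
from typing import Callable, Iterable, Mapping

Row = Mapping[str, object]


class VirtualLabelNotYetQueryable(Exception):
    """Hypothetical stand-in for QueryError::VirtualLabelNotYetQueryable."""


def read_with_fallback(
    query_engine: Callable[[], Iterable[Row]],
    read_lake: Callable[[], Iterable[Row]],
) -> list:
    """Prefer the graph query; use the direct Parquet read only while it is gated."""
    try:
        return list(query_engine())
    except VirtualLabelNotYetQueryable:
        return list(read_lake())


def gated_query():
    # Stub: behaves like the engine before the rewriter lands.
    raise VirtualLabelNotYetQueryable("Frame")


def lake_scan():
    # Stub: stands in for a direct read of the Parquet partitions.
    return [{"entity_id": "Unit-01", "speed": 4.0}]


rows = read_with_fallback(gated_query, lake_scan)
print(rows)
```

Catching only the gating error matters: any other query failure should still propagate rather than be masked by the fallback.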

A note on overlay tables: correcting a Virtual row

Virtual rows are immutable by contract. Corrections happen via overlay tables: an Owned class that the Query Engine joins at read time. Pattern:

-- The Virtual class
CREATE NODE LABEL Frame (...) VIRTUAL FROM PARTITION '...';
 
-- The Owned overlay class — small, mutable
CREATE NODE LABEL FrameCorrection (
  frame_id    STRING,    -- the entity_id of the Frame being corrected
  field_name  STRING,
  new_value   ANY,
  authored_at TIMESTAMP,
  authored_by STRING
);

A reader query joins both: take the Frame row from the Parquet partition; if a FrameCorrection exists for that frame_id + field_name, the overlay wins. This is the discipline :CAUSED_BY edges layer on top of — see Causal Edges for the full pattern.
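The "overlay wins" rule can be illustrated with plain Python dictionaries standing in for Frame rows and FrameCorrection rows. This sketches the merge logic only, not the Query Engine's join. Following the DDL comment, `frame_id` is matched against `entity_id`. Letting the most recent `authored_at` win when several corrections target the same field is an assumption, and `apply_overlay` is a hypothetical name.

```python
def apply_overlay(frames, corrections):
    """Return new rows where a matching correction replaces the Parquet value."""
    # Assumption: among corrections for the same (frame_id, field_name),
    # the latest authored_at wins. ISO-8601 strings sort chronologically.
    latest = {}
    for c in sorted(corrections, key=lambda c: c["authored_at"]):
        latest[(c["frame_id"], c["field_name"])] = c["new_value"]

    merged = []
    for frame in frames:
        row = dict(frame)  # copy: the Virtual row itself is never mutated
        for field in frame:
            key = (frame["entity_id"], field)
            if key in latest:
                row[field] = latest[key]
        merged.append(row)
    return merged


frames = [
    {"entity_id": "Unit-01", "ts": "2026-03-14T08:00:00", "speed": 4.0},
    {"entity_id": "Unit-02", "ts": "2026-03-14T08:00:00", "speed": 5.0},
]
corrections = [
    {"frame_id": "Unit-01", "field_name": "speed", "new_value": 4.2,
     "authored_at": "2026-03-15T10:00:00", "authored_by": "reviewer-a"},
    {"frame_id": "Unit-01", "field_name": "speed", "new_value": 4.5,
     "authored_at": "2026-03-16T09:00:00", "authored_by": "reviewer-b"},
]

for row in apply_overlay(frames, corrections):
    print(row)
```

Because the merge copies each row, the original observation and every correction both remain available, which is what keeps the correction history auditable.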

Worked example: project-merlin

The project-merlin NFL stress harness ran this exact migration as the v0.8.0 day-zero plan: 22 entities × 5 plays × 1M frames per game × hundreds of games. The transition document at project-merlin/SHIP-v0.8-TRANSITION.md (in the project-merlin repo) is a consumer-side worked example that can be referenced verbatim. It covers:

  • The before-and-after schema (Frame moved from Owned to Virtual; Player + Play stayed Owned).
  • The partition layout authored against the merlin-nfl-2025/canonical/ Iceberg-shaped tree.
  • The mount config + registration sequence.
  • The verification steps that confirmed the catalog scan opens in well under 100 ms over 280+ partitions.

If you are migrating a similar shape — high-cardinality observation rows on the side of a small mutable entity model — that document is the closest worked precedent available.

See also

  • Virtual Labels Over Parquet — the runnable cookbook recipe.
  • World Graph — the conceptual layer and R1–R3 boundary.
  • Perception Lake — the sibling immutable-observation layer.
  • World Graph Substrate — the engine-architecture deep-dive.
  • Causal Edges — the discipline for overlay-table corrections.
  • CHANGELOG — the v0.8.0 release notes.