© 2026 OZ. All rights reserved.


Virtual computed columns

A virtual computed column is a derived property declared on a virtual label. The substrate registers the expression in catalog metadata, the Smart Reader evaluates it at row-decode time against the decoded RecordBatch, and the value surfaces in Node.properties alongside the columns that physically live in the parquet partition.

The row data never grows. The parquet files on disk are unchanged. The derived property exists only as a value flowing through the scan.

Why

There's a class of queries where the predicate is on a derived quantity, not a column the source partition stores. Operational world models are the canonical case:

For every agent in the fleet on every observation tick, return its position relative to its assigned target at the moment the target becomes active.

position_relative_to_target = agent_position - target_position is a function of two columns the parquet files already store, but the set of useful "relative-to-X" derivations is unbounded: relative to the target, the nearest obstacle, the mission origin, the next waypoint, the centroid of the swarm, the last known peer position. Materializing each one at ingest adds another stored column per derivation and forces a schema change every time an analyst asks for a new relative frame. Computing each one per row at query time, naively, defeats predicate pushdown.

Virtual computed columns pick a third option: declare once at the catalog level, evaluate at scan time, push predicates through.

The DDL surface

CREATE NODE LABEL FrameRelToTarget VIRTUAL FROM PARTITION
  'lake://fleet/telemetry/{mission}/{day}/{shard}'
  COMPUTE
    position_relative_to_target = agent_position - target_position,
    distance_to_target = sqrt(
        (agent_position[0] - target_position[0])^2 +
        (agent_position[1] - target_position[1])^2 +
        (agent_position[2] - target_position[2])^2
    );

The COMPUTE clause sits after the partition pattern. Each entry is a named expression. The names become first-class property keys on the virtual label — indistinguishable from parquet-resident columns at the Cypher surface.

The expression language references:

  • Parquet-resident columns by name (agent_position, target_position).
  • Partition-key variables by name (mission, day, shard). These come through as typed Ints / Strings via the same lossless coercion path that surfaces partition keys on a non-COMPUTE virtual label.
  • Other computed columns declared earlier in the same clause — evaluation is topologically ordered.

Arithmetic, array indexing (position[0]), math functions (sqrt, abs, floor, ceil, pow), and the standard comparison operators are supported. The IR is Arrow-integrated; expressions evaluate column-at-a-time against the decoded RecordBatch.
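To make column-at-a-time evaluation concrete, here is a minimal pure-Python sketch. It is not the engine's Arrow-integrated IR: the batch is modelled as a plain dict of lists, and the helpers `sub_vec`, `norm`, and `evaluate` are hypothetical names introduced for illustration. Unlike the DDL above, `distance_to_target` is derived here from `position_relative_to_target` to show an entry referencing a column declared earlier in the same block.

```python
import math

# Hypothetical decoded batch: column name -> list of values (one entry per row).
batch = {
    "agent_position": [(1.0, 2.0, 2.0), (4.0, 0.0, 3.0)],
    "target_position": [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
}

def sub_vec(a_col, b_col):
    """Element-wise vector subtraction over two whole columns."""
    return [tuple(x - y for x, y in zip(a, b)) for a, b in zip(a_col, b_col)]

def norm(v_col):
    """Euclidean norm of every vector in a column."""
    return [math.sqrt(sum(x * x for x in v)) for v in v_col]

# Each entry: (name, input column names, function over whole columns).
# Order matters: an entry may only reference columns that already exist.
compute = [
    ("position_relative_to_target", ("agent_position", "target_position"), sub_vec),
    ("distance_to_target", ("position_relative_to_target",), norm),
]

def evaluate(batch, compute):
    """Return a new batch with the computed columns appended."""
    out = dict(batch)  # shallow copy: the input batch gains no new keys
    for name, inputs, fn in compute:
        missing = [c for c in inputs if c not in out]
        if missing:
            raise KeyError(f"{name} references undeclared columns: {missing}")
        out[name] = fn(*(out[c] for c in inputs))
    return out

result = evaluate(batch, compute)
print(result["distance_to_target"])  # [3.0, 3.0]
```

Each function receives whole columns rather than single rows, which is the property that lets a real implementation hand the work to vectorised kernels.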

Querying

The query surface is exactly the surface of a parquet-resident column:

MATCH (f:FrameRelToTarget)
WHERE f.distance_to_target < 5.0
  AND f.mission = 'survey-NW-quadrant' AND f.day = '2026-03-14'
RETURN f.agent_id, f.distance_to_target
ORDER BY f.distance_to_target
LIMIT 10

The planner is aware that distance_to_target is computed. The predicate f.distance_to_target < 5.0 is pushable when the substrate has enough column statistics on the inputs (agent_position, target_position) to prove a row group can be skipped before evaluating. Partition + row-group pruning collapses the candidate set first; the per-row arithmetic runs only on what survives.

For the query above, run at quarter scale against roughly 311M frames:

~311M rows total
  → partition prune (mission='survey-NW-quadrant', day='2026-03-14') → ~1M rows
  → row-group prune on target_position stats                          → ~25 rows
  → evaluate distance_to_target + filter < 5.0                       → final answer

The total cost is O(25 rows × eval + pruning) instead of O(311M rows × eval).
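One way a row group could be proven skippable from input statistics alone is interval arithmetic on per-axis min/max values. The sketch below is an illustration under that assumption, not the planner's actual algorithm; the statistics layout and the names `axis_gap` and `can_skip` are hypothetical.

```python
import math

def axis_gap(a_min, a_max, t_min, t_max):
    """Smallest possible |a - t| for a in [a_min, a_max] and t in [t_min, t_max]."""
    if a_max < t_min:
        return t_min - a_max
    if t_max < a_min:
        return a_min - t_max
    return 0.0  # ranges overlap, so the gap can be zero

def can_skip(stats, threshold):
    """True when no row in the group can satisfy distance_to_target < threshold."""
    lower_bound = math.sqrt(sum(
        axis_gap(*stats["agent_position"][i], *stats["target_position"][i]) ** 2
        for i in range(3)
    ))
    return lower_bound >= threshold

# Hypothetical per-row-group statistics: one (min, max) pair per axis.
row_groups = [
    {"agent_position": [(0, 10), (0, 10), (0, 1)],
     "target_position": [(100, 120), (0, 10), (0, 1)]},  # 90 units apart on x
    {"agent_position": [(0, 10), (0, 10), (0, 1)],
     "target_position": [(8, 20), (0, 10), (0, 1)]},     # x ranges overlap
]

# Only groups that cannot be ruled out reach per-row evaluation.
survivors = [i for i, rg in enumerate(row_groups) if not can_skip(rg, 5.0)]
print(survivors)  # [1]
```

The bound is conservative: a surviving group may still contain no matching rows, but a skipped group is guaranteed to contain none, so the filter result is unchanged.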

How it composes

Computed columns only earn their keep because three already-shipped pieces compose with them:

| Layer | Substrate | What it contributes |
| --- | --- | --- |
| Planner | WHERE-pushdown into virtual-label scans | partition + row-group pruning collapses the candidate set before any per-row evaluation |
| Smart Reader | partition-key column exposure | expressions reference mission / day / frame_idx directly as typed Ints / Strings |
| Index | HIDX hybrid index | embedding-aware expressions (THIS.embedding · peer.embedding) hit the registered index |

This is the same shape as the virtual-label registration itself: the user declares a typed surface; the engine decides which layer of itself answers which part of the query.

What ships in metadata

Registration commits a ComputedColumn entry alongside the existing VirtualLabelEntry in the catalog manifest at <workspace>/canonical/manifest_<epoch>.json. Each entry carries the column name, the typed return shape, the dependency list (which parquet-resident columns + partition keys + earlier computed columns it references), and the Arrow-compatible expression IR.
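As a rough picture of what such an entry might hold, the sketch below models the four pieces of information listed above as a Python dataclass and serialises it to JSON. Every field name, the type string, and the shape of the expression IR are assumptions made for illustration; the real manifest schema is not shown on this page.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class ComputedColumn:
    # All field names are illustrative guesses, not the real manifest schema.
    name: str
    return_type: str
    depends_on: list = field(default_factory=list)
    expression_ir: dict = field(default_factory=dict)

entry = ComputedColumn(
    name="position_relative_to_target",
    return_type="list<float64>",  # hypothetical type spelling
    depends_on=["agent_position", "target_position"],
    expression_ir={
        "op": "subtract",
        "args": [{"column": "agent_position"}, {"column": "target_position"}],
    },
)

# Serialise the entry the way it might sit inside a JSON manifest.
serialized = json.dumps(asdict(entry), indent=2, sort_keys=True)
print(serialized)
```

Keeping the dependency list explicit, rather than re-deriving it from the IR, is what would let a planner decide which input columns to read without parsing the expression.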

The manifest is written with the same write-tmp + fsync + atomic_rename two-file protocol that backs the base VirtualLabelEntry commit: atomic, crash-safe, and ordered by a monotonic epoch.
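The general write-tmp, fsync, rename pattern can be sketched with the Python standard library as follows. This shows the generic technique only, not ArcFlow's implementation; the function name `commit_manifest`, the `.tmp` suffix, and the payload are hypothetical, and treating the temporary and final files as the "two files" is an assumption.

```python
import json
import os
import tempfile

def commit_manifest(directory, epoch, payload):
    """Write manifest_<epoch>.json so readers never observe a partial file."""
    final_path = os.path.join(directory, f"manifest_{epoch}.json")
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(payload, f)
        f.flush()
        os.fsync(f.fileno())  # force file contents to disk before the rename
    # Rename within one directory; atomic on POSIX filesystems.
    os.replace(tmp_path, final_path)
    # Best-effort fsync of the directory so the rename itself is durable.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    return final_path

with tempfile.TemporaryDirectory() as workspace:
    path = commit_manifest(workspace, 7, {"computed_columns": ["distance_to_target"]})
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)
    leftovers = [n for n in os.listdir(workspace) if n.endswith(".tmp")]

print(os.path.basename(path), loaded, leftovers)
```

A crash before the rename leaves only a stray temporary file and the previous epoch's manifest intact; a crash after it leaves the new manifest complete.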

What stays materialized vs computed

The mechanical rule:

| Property | Parquet-resident | Computed |
| --- | --- | --- |
| Storage cost | one column per file | zero; the value flows through the scan |
| Schema-change cost | adding a column rewrites the partition | adding a column edits the catalog only |
| Read pattern | column-pruned scan | column-pruned scan over inputs + per-row arithmetic |
| Mutability | append-only by partition rewrite | redeclare via ALTER (planned); no row movement |
| Useful for | values present in the source | derived projections, relative coordinates, embedding-aware distances, learned-function outputs (via downstream NN wave) |

The two coexist on the same virtual label. A frame's agent_position is parquet-resident; its distance_to_target is computed; the Cypher query treats both the same way.

Worked example: Python SDK

from arcflow import ArcFlow

# Open (or create) the workspace directory that holds the catalog.
db = ArcFlow("./workspace")

# Register the virtual label with its two computed columns.
db.execute("""
    CREATE NODE LABEL FrameRelToTarget VIRTUAL FROM PARTITION
      'lake://fleet/telemetry/{mission}/{day}/{shard}'
      COMPUTE
        position_relative_to_target = agent_position - target_position,
        distance_to_target = sqrt(
            (agent_position[0] - target_position[0]) ^ 2 +
            (agent_position[1] - target_position[1]) ^ 2 +
            (agent_position[2] - target_position[2]) ^ 2
        )
""")

# Predicate on a computed column, pushed through to the Smart Reader.
result = db.execute("""
    MATCH (f:FrameRelToTarget)
    WHERE f.mission = 'survey-NW-quadrant' AND f.day = '2026-03-14'
      AND f.distance_to_target < 5.0
    RETURN f.agent_id, f.distance_to_target
    ORDER BY f.distance_to_target
    LIMIT 10
""")

# Computed and parquet-resident values are read from each row the same way.
for row in result:
    print(row["agent_id"], row["distance_to_target"])

The result rows are indistinguishable from those of a query against a non-COMPUTE virtual label. The decoded RecordBatch carries the computed column alongside the parquet-resident ones; the Cypher result mapper doesn't distinguish between them.

What you give up

  • No incremental refresh. A computed column is always re-evaluated on read. There's no materialized cache; if you need that, the right shape is a downstream pipeline that emits a parquet column and a non-COMPUTE virtual label over the result.
  • The expression language is a strict subset of Cypher. Functions available inside COMPUTE are the Arrow-evaluable set — arithmetic, math, array indexing, comparison. Graph traversals, path patterns, and per-row Cypher procedures are not callable from inside a COMPUTE expression. They remain callable in the surrounding query.
  • Dependency-cycle declarations are rejected at registration. A topological sort runs over the COMPUTE block; cyclic references surface as a typed registration error.
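The cycle check in the last bullet can be illustrated with the standard-library `graphlib` module. This is a sketch of the general technique, not the engine's registration code; `ComputeCycleError` and `evaluation_order` are hypothetical names, and the input is a simplified name-to-dependencies mapping rather than a parsed COMPUTE block.

```python
from graphlib import CycleError, TopologicalSorter

class ComputeCycleError(ValueError):
    """Hypothetical typed error for a cyclic COMPUTE block."""

def evaluation_order(dependencies, stored_columns):
    """Return computed-column names in an order that satisfies their dependencies."""
    # Only edges between computed columns matter; stored columns are always available.
    graph = {
        name: {d for d in deps if d not in stored_columns}
        for name, deps in dependencies.items()
    }
    unknown = {d for deps in graph.values() for d in deps} - set(graph)
    if unknown:
        raise KeyError(f"unknown columns referenced: {sorted(unknown)}")
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        cycle = " -> ".join(exc.args[1])  # args[1] holds the nodes of the cycle
        raise ComputeCycleError(f"cyclic COMPUTE references: {cycle}") from exc

stored = {"agent_position", "target_position"}

ok = {
    "distance_to_target": ["position_relative_to_target"],
    "position_relative_to_target": ["agent_position", "target_position"],
}
print(evaluation_order(ok, stored))
# ['position_relative_to_target', 'distance_to_target']

bad = {"a": ["b"], "b": ["a"]}
try:
    evaluation_order(bad, stored)
except ComputeCycleError as err:
    print("rejected:", err)
```

Running the sort at registration time means a bad declaration fails once, up front, instead of surfacing on the first scan that touches the label.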

Pattern stack

Computed columns are structurally a sibling of two other "derived property without materialization" stories the engine is shipping:

  • PropertyValue::Tensor — tensor-typed properties carrying shaped numerical payloads at the node level. Same operating principle (typed-derived-property surfaces uniformly through the Cypher result mapper); different physical substrate (in-engine bytes vs. parquet-scan-time evaluation).
  • NodeModel → predicted property — a registered learned function emits a property at the right moment. Where computed columns evaluate at scan time against parquet, NodeModel evaluates at observation time against an in-engine tensor.

All three close the same gap differently: keep the typed-property contract at the query surface stable; pick the evaluation moment that makes the workload cheap.

See also

  • World Store layer — the substrate the COMPUTE expressions run against
  • Virtual labels cookbook — the registration-and-query walkthrough this page builds on
  • Smart Reader — the format-aware reader that evaluates the expressions
  • CREATE NODE LABEL — the full DDL syntax for virtual + computed declarations