ArcFlow
Company
Managed Services
Markets
  • News
  • LOG IN
  • GET STARTED

OZ brings Visual Intelligence to physical venues, a managed edge layer that lets real-world environments see, understand, and act in real time.

Talk to us

ArcFlow

  • World Models
  • Sensors

Managed Services

  • OZ VI Venue 1
  • Case Studies

Markets

  • Sports
  • Broadcasting
  • Robotics

Company

  • About
  • Technology
  • Careers
  • Contact

Ready to see it live?

Talk to the OZ team about deploying at your venues, from a single pilot match to a full regional rollout.

Schedule a deployment review

© 2026 OZ. All rights reserved.

LinkedIn
ArcFlow Docs
Start
  • Quickstart
  • Installation
  • Bindings
  • Platforms
  • Get Started
  • Cookbook
Concepts
  • World Model
  • Graph Model
  • Evidence Model
  • Observations
  • Confidence & Provenance
  • Proof Artifacts & Gates
  • SQL vs GQL
  • Graph Patterns
  • Parameters
  • Query Results
  • Persistence & WAL
  • Snapshot-Pinned Reads
  • Error Handling
  • Execution Models
  • Causal Edges
  • Adapter Discipline
  • Time Decay
  • Layers
  • 1. World Store
  • 1a. World Store · Smart Reader
  • 2. Perception Lake
  • 3. World Graph
  • 4. Query Engine
  • 5. Live Surface
  • 6. Event Bus
  • 7. Behavior Engine
  • 8. Algorithm Library
  • Virtual Computed Columns
  • Threading Model
  • Typed ID Contract
WorldCypher
  • Overview
  • Execution Options
  • Statements
  • MATCH
  • WHERE
  • RETURN
  • OPTIONAL MATCH
  • CREATE
  • SET
  • MERGE
  • DELETE
  • REMOVE
  • Composition
  • WITH
  • UNION
  • UNWIND
  • CASE
  • Schema
  • Schema Overview
  • Indexes
  • Constraints
  • Functions
  • Built-in Functions
  • Aggregations
  • Procedures
  • Shortest Path
  • EXPLAIN
  • PROFILE
  • Temporal Queriesfacet
  • Spatial Queriesfacet
  • Algorithmsfacet
  • Triggers
Capabilities
  • Live Queries
  • Vector Search
  • Trusted RAG
  • Spatial Knowledge
  • Temporal
  • Behavior Graphs
  • Graph Algorithms
  • Skills
  • CREATE SKILL
  • PROCESS NODE
  • REPROCESS EDGES
  • Sync
  • Programs
  • GPU Acceleration
  • Agent-Native
  • MCP Server
  • Event Sourcing
  • Intent Relay
  • Event Bus
Use Cases
  • Agent Tooling
  • Trusted RAG
  • Knowledge Management
  • Behavior Graphs
  • Autonomous Systems
  • Physical AI
  • Digital Twins
  • Robotics & Perception
  • Sports Analytics
  • Grounded Neural Objects
  • Fraud Detection
Walkthroughs
    Guides
  • Agent Integration
  • Building a World Model
  • Modeling a Social Graph
  • Build a RAG Pipeline
  • Using Skills
  • Behavior Graphs
  • Swarm & Multi-Agent
  • Fleet Coordination
  • Migrate from Cypher / Neo4j
  • From SQL to GQL
  • Filesystem Workspace
  • Data Quality
  • Code Intelligence
  • Scale Patterns
  • v0.7 → v0.8 Lakehouse Fast-Path
  • Tutorials
  • Knowledge Graph
  • Entity Linking
  • Vector Search
  • Graph Algorithms
  • Recipes
  • CRUD
  • Multi-MATCH
  • MERGE (Upsert)
  • Full-Text Search
  • Batch Projection
  • Multi-Source Observation
  • Sports Analytics
Operations
  • CLI
  • REPL Commands
  • Snapshot & Restore
  • Filesystem Projection
  • Plugin Management
  • Agent Governance
  • Server Modes & PG Wire
  • Persistence (ops)
  • Import & Export
  • Deployment
  • Deployment Modes
  • Daemon (UDS)
  • Why not Docker
  • Architecture
  • Engine Architecture
  • Cloud Architecture
  • Sync Protocol (Deep Dive)
  • World Graph Substrate (Preview)
Reference
  • TypeScript API
  • Glossary
  • Naming & Domain Map
  • Data Types
  • Operators
  • Error Codes
  • GQL Reference
  • Known Issues
  • Versioning
  • Licensing
  • Conformance
  • GQL Conformance
  • openCypher TCK
  • Extension Regressions
GQL Reference
    Conformance
  • Conformance Dashboard
  • openCypher TCK Results
  • Extension Regressions
  • Features
  • MATCH Basic
  • CREATE Nodes Edges
  • SET REMOVE Properties
  • DELETE Detach DELETE
  • RETURN WITH WHERE
  • Order BY Limit Skip
  • Order BY Nulls First Last
  • UNWIND
  • Aggregate Functions
  • OPTIONAL MATCH
  • Variable Length Paths
  • Label OR AND NOT Expressions
  • Label Wildcard
  • Quantified Path Sugar
  • Path Modes Walk Trail Simple Acyclic
  • Shortest Path Variants
  • IS Labeled Predicate
  • Element ID Function
  • IS Type Predicate
  • Binary Literals
  • Line Comments Solidus
  • Line Comments Minus
  • GQLSTATUS Result Codes
  • GQL Error Code Mapping
  • Transaction Control Syntax
  • SET Session
  • Conditional Execution WHEN THEN ELSE
  • RETURN NEXT Pipeline
  • Primary Key Constraint
  • Unique Constraint
  • Deterministic MERGE Via PK
  • Undirected Edge MATCH
  • Cast Type Conversion
  • GQL Directories
  • Multiple Labels Per Node
  • GQL Flagger
  • NEXT Linear Composition
  • Cardinality Function
  • INT64 BIGINT Type Names
  • FLOAT64 Double Type Names
  • Log10 Log2 Functions
  • Trim Leading Trailing Both
  • FILTER Clause
  • LET Statement
  • Group BY Explicit
  • EXCEPT SET Operations
  • INTERSECT SET Operations
  • ALL Different Predicate
  • Same Predicate
  • Property Exists Function
  • Path Variable Binding
  • USE Graph Clause
  • FOR IN List
  • Typed Temporal Literals
  • Session SET Value Params
  • Typed List Annotations
  • arcflow.cosine() function
  • arcflow.embed() function
  • arcflow.similar() procedure
  • arcflow.graphrag() procedure
  • ArcFlow Extensions
  • LIVE Queries
  • Triggered Write-Back Views
  • Evidence Algebra
  • Relationship Skills
  • AI Function Namespace
  • Graph Embedding Algorithms
  • ASOF JOIN
  • Durable Workflows
  • Incremental Z-Set Engine
  • GPU GraphBLAS
  • Triggers
  • HNSW Vector Index
  • Extensions Moat

Typed ID Contract

ArcFlow property values are strictly typed. The equality test n.nfl_id = 58384 matches only Int(58384) values; String("58384") does not coerce automatically.

One-sentence rule: Pick one physical type per identifier kind (INT or STRING) at ingest; apply it consistently across every source that feeds the graph. The engine never coerces at the equality layer — silent empty results are preferable to silent type promotion.

The trap#

When the same logical identifier (player ID, game key, season, partition key) is stored as different physical types in different data sources, a single Cypher predicate can return different results across handles:

# Source A: parquet Int64 column → PropertyValue::Int(58384)
db_a.execute("MATCH (p:Player {nfl_id: 58384}) RETURN p")    # → match
db_a.execute("MATCH (p:Player {nfl_id: '58384'}) RETURN p")  # → empty
 
# Source B: parquet Utf8 column → PropertyValue::String("58384")
db_b.execute("MATCH (p:Player {nfl_id: 58384}) RETURN p")    # → empty
db_b.execute("MATCH (p:Player {nfl_id: '58384'}) RETURN p")  # → match

Two handles, identical Cypher, opposite answers. The collision is silent — an empty result reads as "no such player" rather than "type mismatch." Production code defending against this writes manual casts everywhere, or maintains a string-vs-int adapter layer. Neither scales.

The fix lives at the ingest boundary.

Supported patterns#

Pick one per identifier kind and apply it consistently across every source that feeds the graph:

PatternWhen to useTradeoff
Int everywhereNative numeric IDs (game_key, play_id, season, integer primary keys)Smaller storage; faster equality; can't carry leading zeros or mixed-case
String everywhereComposite IDs (UUIDs, slugs, mixed-case, hash digests, semver strings)Larger property storage; lexicographic comparison only
Coerce at ingestWhen sources are heterogeneous (parquet Int64 from one feed, Utf8 from another)One transform step at the ingest boundary; predictable downstream

The third pattern is what closes most real-world collisions. The canonical choice per identifier kind is recorded somewhere (pyproject.toml constants, a schema-defining CREATE NODE LABEL, an ops runbook); ingest pipelines coerce on the way in.

Coerce-at-ingest example#

When the parquet upstream gives you Int64 but your schema says nfl_id is STRING:

import pyarrow as pa
import pyarrow.compute as pc
 
raw = pq.read_table("players.parquet")
# Coerce the nfl_id column to string at the ingest boundary.
typed = raw.set_column(
    raw.schema.get_field_index("nfl_id"),
    "nfl_id",
    pc.cast(raw.column("nfl_id"), pa.string()),
)
db.bulk_create_nodes_from_arrow("Player", typed)

Reverse direction (Utf8 → Int64) works the same way with pc.cast(col, pa.int64()). The transform is one row's worth of work per source row, applied once; downstream queries pay no per-row cost.

Query-time workarounds (pay per-row)#

When ingest can't be changed — heterogeneous upstream sources you don't own, retroactive schema choice, a migration window — parameterized union works:

// Match both Int and String forms in one query
MATCH (p:Player)
WHERE p.nfl_id = $id OR p.nfl_id = toString($id)
RETURN p

Or via prepared statements:

stmt = db.prepare("""
    MATCH (p:Player {nfl_id: $id_int}) RETURN p
    UNION
    MATCH (p:Player {nfl_id: $id_str}) RETURN p
""")
result = stmt.execute(id_int=58384, id_str="58384")

Both patterns add row-time work — the planner evaluates the predicate against every node in the label scan. For hot paths, the ingest-side fix is preferred; the workarounds above are for the migration window where ingest can't change fast enough.

Why the engine doesn't auto-coerce#

Three reasons the substrate keeps the strict-typing rule:

  1. Predictability. Int(58384) = String("58384") is structurally false in most type systems (Rust, TypeScript, Python with strict-typing on). Coercing at equality would make ArcFlow inconsistent with the typed-value model that powers vector search, range queries, and the rest of the predicate library.

  2. Performance. Auto-coerce at filter eval would force per-row type-checking + conversion on every equality predicate, regressing the hot path for the common case where both sides match. The hot path stays tight; the migration burden stays at the ingest boundary where the data is already being shaped.

  3. Silent wrongness. Coercion at the equality layer can mask genuine type mismatches — Int(2024) accidentally compared to String("season_2024") would coerce one way (extract digits?) or the other (stringify?), with neither being clearly right. Surfacing the mismatch as an empty result lets the caller fix the upstream collision rather than carrying the wrong assumption forward.

A future engine extension might add an opt-in EQUALS_COERCE mode for primary-key columns under a deliberate per-label policy. Until then, ingest discipline + the workarounds above cover the case. The default stays strict; opting into coercion stays explicit.

Diagnostic — finding the collision in your graph#

When a query returns empty unexpectedly, check the actual property type for the rows you expected to match:

MATCH (p:Player)
RETURN p.nfl_id, apoc.meta.type(p.nfl_id) AS type
LIMIT 5

If the type column shows STRING for some rows and INTEGER for others, you've found the collision. From there:

  1. Choose the canonical type for that identifier kind (INT for native numeric IDs; STRING for composite or leading-zero IDs).
  2. Coerce the offending rows — either re-ingest after fixing the upstream pipeline, or run a one-time SET p.nfl_id = toString(p.nfl_id) (or toInteger(...)) update across the affected partition.
  3. Document the choice somewhere durable — pyproject constants, a schema doc, a CREATE NODE LABEL annotation. Future ingest sources reference the canonical type.

See also#

  • Properties and types — the typed-value model that powers the strict-equality rule.
  • Bulk ingest patterns — coerce-at-ingest worked examples.
  • Graph Model — how nodes carry property values + the typed-value catalog.
  • Threading Model — sibling contract page; describes ArcFlow's concurrency invariants.
← PreviousThreading ModelNext →Overview