© 2026 OZ. All rights reserved.


GPU Acceleration

ArcFlow provides three distinct execution innovations for high-performance world model queries:

  • ArcFlow Graph Kernel — executes graph algorithms as a single parallel pass across all nodes simultaneously, not as sequential edge traversals
  • ArcFlow Adaptive Dispatch — routes every operation to the fastest available hardware at runtime, cost-model driven, zero configuration
  • ArcFlow GPU Index — a pointer-free spatial index designed for direct GPU traversal without transformation

The developer writes one query. These three layers work together to pick the fastest path.

CALL algo.pageRank()

ArcFlow Graph Kernel

Most graph databases walk the graph one edge at a time: visit a node, follow an edge, visit the next. That is sequential, cache-hostile, and does not map to GPU hardware.

The ArcFlow Graph Kernel processes algorithms differently. The world model is held as a compact parallel structure — every algorithm executes as a single pass across all nodes simultaneously, not a recursive walk. PageRank, BFS, connected components, community detection, triangle counting — each runs as one parallel operation. This maps directly to GPU thread blocks and enables the speedups below.

The same kernel runs on CPU when no GPU is present. The parallel structure is inherently more efficient than pointer-chasing traversal on any hardware.
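
To make the contrast with edge-by-edge traversal concrete, here is a minimal Python sketch of PageRank written as repeated whole-graph passes over flat edge arrays. It is not ArcFlow's kernel: the function name, the `src`/`dst` array layout, and the damping and iteration defaults are assumptions chosen for illustration.

```python
def pagerank(num_nodes, src, dst, damping=0.85, iterations=20):
    """PageRank over flat edge arrays; each iteration is one pass over the whole graph."""
    # Out-degree per node, computed in one sweep over the edge array.
    out_degree = [0] * num_nodes
    for s in src:
        out_degree[s] += 1

    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iterations):
        # Rank held by nodes without out-edges is spread evenly over all nodes.
        dangling = sum(rank[n] for n in range(num_nodes) if out_degree[n] == 0)
        base = (1.0 - damping) / num_nodes + damping * dangling / num_nodes
        new_rank = [base] * num_nodes
        # Every edge contributes independently of the others, so this loop has
        # no traversal order; a GPU version could assign one thread per edge.
        for s, d in zip(src, dst):
            new_rank[d] += damping * rank[s] / out_degree[s]
        rank = new_rank
    return rank


# Hypothetical input: a directed 3-cycle 0 -> 1 -> 2 -> 0.
cycle_ranks = pagerank(3, src=[0, 1, 2], dst=[1, 2, 0])
```

Because no step depends on the order in which edges are visited, the same loop body can be spread across CPU cores or GPU threads, which is the property the paragraph above relies on.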


ArcFlow Adaptive Dispatch

ArcFlow Adaptive Dispatch measures available hardware at startup and routes each operation to the fastest available backend based on a live cost model:

  1. Small graphs (< 200 nodes) — CPU path (dispatch overhead exceeds GPU benefit at this scale)
  2. Apple Silicon (macOS / iOS) — Metal GPU. Unified memory means no CPU→GPU copy overhead — CPU and GPU read the same physical memory.
  3. NVIDIA GPU (Linux / Windows) — CUDA GPU. Dynamic driver loading — no compile-time GPU dependency, same binary runs everywhere.
  4. CPU fallback — ArcFlow's parallel CPU implementation when no GPU is present.

The cost model accounts for kernel launch overhead, memory bandwidth per device, and algorithm parallelism characteristics. There are no hardcoded thresholds — routing adapts to the actual hardware measured at runtime.
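
The idea of cost-model routing can be sketched in a few lines of Python. This is a simplified illustration, not ArcFlow's model: the `Backend` fields, the `estimate_us` formula, and all numbers are assumptions. A real model would also weigh how parallel the algorithm is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    launch_overhead_us: float      # fixed cost to start work on this device
    bandwidth_bytes_per_us: float  # measured memory bandwidth
    ops_per_us: float              # measured parallel throughput


def estimate_us(backend, bytes_touched, ops):
    """Estimated run time: fixed overhead plus the slower of memory and compute."""
    memory_us = bytes_touched / backend.bandwidth_bytes_per_us
    compute_us = ops / backend.ops_per_us
    return backend.launch_overhead_us + max(memory_us, compute_us)


def choose_backend(backends, bytes_touched, ops):
    """Pick the backend with the lowest estimated run time."""
    return min(backends, key=lambda b: estimate_us(b, bytes_touched, ops))


# Hypothetical measurements taken at startup.
cpu = Backend("cpu", launch_overhead_us=1.0, bandwidth_bytes_per_us=50_000.0, ops_per_us=10_000.0)
gpu = Backend("gpu", launch_overhead_us=50.0, bandwidth_bytes_per_us=400_000.0, ops_per_us=1_000_000.0)

small = choose_backend([cpu, gpu], bytes_touched=8_000, ops=2_000)
large = choose_backend([cpu, gpu], bytes_touched=80_000_000, ops=20_000_000)
```

With these made-up figures the small workload stays on the CPU because the GPU launch overhead dominates, while the large workload moves to the GPU.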

Zero configuration. The same GQL query runs identically on a laptop, a workstation, or a GPU cluster.


Measuring on your hardware

Performance depends on host CPU/GPU/memory characteristics and graph shape. Rather than quote per-host numbers that decay, ArcFlow ships a benchmark harness so you can measure on the hardware you'll actually deploy on:

# From the ozinc/arcflow repo:
cargo bench --bench algo                # CPU + GPU comparisons across algorithms
cargo run --bin metal_baseline          # Apple Silicon Metal-specific baseline

GPU speedup is most pronounced for algorithms with high parallelism (community detection, triangle counting). For simpler traversals the CPU path is already fast — GPU dispatch adds value when the graph is large and the algorithm is inherently parallel.


ArcFlow GPU Index

Spatial queries require a different execution path from graph algorithms — a spatial index, not graph traversal. ArcFlow Adaptive Dispatch routes spatial queries across four lanes based on candidate count and GPU transfer cost:

| Lane | When | Typical use |
|---|---|---|
| CpuLive | ≤ 500 candidates | LIVE queries, real-time tracking |
| CpuBatch | > 500 candidates (CPU faster than transfer) | Analytics, replay |
| GpuLocal | > 50K candidates, fits single GPU | High-density spatial |
| GpuMulti | Exceeds single GPU memory | Stadium-scale entity tracking |
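
The lane boundaries in the table can be read as a small decision function. The sketch below is illustrative only: it uses the table's candidate counts as fixed cutoffs, whereas the text describes routing that also weighs GPU transfer cost. The function name and its parameters are assumptions.

```python
CPU_LIVE_MAX = 500       # from the table: CpuLive handles <= 500 candidates
GPU_LOCAL_MIN = 50_000   # from the table: GpuLocal starts above 50K candidates


def choose_lane(candidates, index_bytes, gpu_memory_bytes):
    """Pick a spatial lane; pass gpu_memory_bytes=0 when no GPU is present."""
    if candidates <= CPU_LIVE_MAX:
        return "CpuLive"
    if gpu_memory_bytes == 0 or candidates <= GPU_LOCAL_MIN:
        return "CpuBatch"
    if index_bytes <= gpu_memory_bytes:
        return "GpuLocal"
    return "GpuMulti"
```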

The ArcFlow GPU Index is the structure that makes GpuLocal and GpuMulti lanes possible. It is a pointer-free spatial index designed specifically for GPU traversal — transferring directly to GPU memory without transformation. Traditional pointer-based spatial indexes contain virtual memory addresses that are meaningless to GPU threads; the ArcFlow GPU Index eliminates this boundary entirely.
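
What "pointer-free" means in practice can be shown with a small Python sketch: a 2D k-d tree whose child links are integer offsets into flat arrays rather than memory addresses. The choice of a k-d tree, the array names, and both function names are assumptions made for illustration; the actual layout of the ArcFlow GPU Index is not described here.

```python
def build_flat_kdtree(points):
    """Return (xs, ys, left, right, root): a 2D k-d tree stored in flat arrays.

    Children are array indices (-1 means "no child"), so the arrays can be
    copied into another address space unchanged.
    """
    xs, ys, left, right = [], [], [], []

    def build(pts, depth):
        if not pts:
            return -1
        axis = depth % 2
        pts = sorted(pts, key=lambda p: p[axis])
        mid = len(pts) // 2
        idx = len(xs)
        xs.append(pts[mid][0])
        ys.append(pts[mid][1])
        left.append(-1)
        right.append(-1)
        left[idx] = build(pts[:mid], depth + 1)
        right[idx] = build(pts[mid + 1:], depth + 1)
        return idx

    root = build(list(points), 0)
    return xs, ys, left, right, root


def nearest(tree, qx, qy):
    """Iterative nearest-neighbour search driven by an explicit stack of indices."""
    xs, ys, left, right, root = tree
    best_idx, best_d2 = -1, float("inf")
    stack = [(root, 0)]
    while stack:
        idx, depth = stack.pop()
        if idx == -1:
            continue
        d2 = (xs[idx] - qx) ** 2 + (ys[idx] - qy) ** 2
        if d2 < best_d2:
            best_idx, best_d2 = idx, d2
        diff = (qx - xs[idx]) if depth % 2 == 0 else (qy - ys[idx])
        near, far = (left[idx], right[idx]) if diff < 0 else (right[idx], left[idx])
        # Queue the far side only if the splitting line is closer than the best hit so far.
        if diff * diff < best_d2:
            stack.append((far, depth + 1))
        stack.append((near, depth + 1))  # pushed last, so it is explored first
    if best_idx == -1:
        return None
    return (xs[best_idx], ys[best_idx]), best_d2
```

The search uses no recursion and no object references, only integer indices and arithmetic, which is the kind of access pattern that can run unchanged inside a GPU thread.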

Multi-GPU Partitioning

Spatial data is partitioned across GPU devices by a stable hash of node_id. Queries spanning partition boundaries fan out to all relevant GPUs and merge results. Devices connected via high-bandwidth GPU interconnects form peer islands — within an island, work-stealing happens without PCIe transfer cost.

-- Same spatial query — Adaptive Dispatch routes to GpuMulti when warranted
CALL algo.nearestNodes($center, 'Entity', 100) YIELD node, distance RETURN node.name
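
The partition, fan-out, and merge steps described above can be sketched in Python. This is a CPU-only illustration under stated assumptions: CRC32 stands in for whatever stable hash ArcFlow uses, each list stands in for one GPU, nodes are `(node_id, x, y)` tuples, and peer islands and work-stealing are not modeled.

```python
import heapq
import zlib


def partition_of(node_id, num_devices):
    """Stable partition: the same node_id maps to the same device on every run."""
    return zlib.crc32(str(node_id).encode("utf-8")) % num_devices


def partition_nodes(nodes, num_devices):
    """nodes: iterable of (node_id, x, y). Returns one list of nodes per device."""
    parts = [[] for _ in range(num_devices)]
    for node in nodes:
        parts[partition_of(node[0], num_devices)].append(node)
    return parts


def nearest_k(parts, cx, cy, k):
    """Fan out to every partition, take each local top-k, then merge the results."""
    def dist2(node):
        return (node[1] - cx) ** 2 + (node[2] - cy) ** 2

    local_hits = []
    for part in parts:  # each iteration stands in for one device
        local_hits.extend(heapq.nsmallest(k, part, key=dist2))
    return heapq.nsmallest(k, local_hits, key=dist2)
```

Taking the top k from each partition before merging is enough to get the correct global top k, because every global winner is also a winner within its own partition.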

Instanced Geometry

For scenes with thousands of identical geometry instances (seats, sensors, obstacles), the ArcFlow GPU Index is shared — one allocation serves all instances. Queries transform coordinates into instance-local space rather than rebuilding the index per instance.
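
A minimal Python sketch of the instance-local idea follows. It assumes 2D geometry, an instance transform of "rotate by an angle, then translate", and a plain list as a stand-in for the shared index; all names are hypothetical.

```python
import math

# One shared set of points in instance-local coordinates (stand-in for the shared index).
SHARED_LOCAL_POINTS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]


def world_to_local(px, py, instance):
    """Invert an instance transform given as (tx, ty, angle): rotate, then translate."""
    tx, ty, angle = instance
    dx, dy = px - tx, py - ty
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Undo the rotation by rotating the offset by -angle.
    return dx * cos_a + dy * sin_a, -dx * sin_a + dy * cos_a


def nearest_local_point(px, py, instance):
    """Query the shared points with a world-space position for one instance."""
    lx, ly = world_to_local(px, py, instance)
    return min(SHARED_LOCAL_POINTS, key=lambda p: (p[0] - lx) ** 2 + (p[1] - ly) ** 2)
```

Each query pays for one small coordinate transform, and the shared points are never copied or rebuilt per instance.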


Metal GPU (Apple Silicon)

On macOS and iOS, ArcFlow Adaptive Dispatch routes to Metal compute shaders. Apple Silicon's unified memory means the ArcFlow Graph Kernel and GPU Index operate in the same physical memory as the CPU — zero copy overhead.

If Metal is unavailable or the graph is below the dispatch threshold, routing falls back to CPU transparently.

Per-family kernel selection

ArcFlow automatically selects the Metal Shading Language primitives best suited to your Apple GPU family. The selection happens inside the same in-process call, with no separate code path, no configuration, and no cross-architecture abstraction layer:

| Apple GPU family | Primitive routed |
|---|---|
| Apple7 (A14 / M1) and newer | Pipeline-state caching for sub-frame cold start; transient buffer-heap allocations |
| Apple8 (A15 / M2) and newer | simd_sum / simd_min / simd_max cross-lane reductions for graph aggregates |
| Apple9 (A17 / M3) and newer | Native atomic_float for scatter-accumulate kernels |
| M3 family and newer | simdgroup_matrix tile operations for dense linear-algebra inner loops |

arcflow status --json reports the detected GPU family and the selected primitive set. The same binary runs across every Apple device generation; the dispatch decision is made per host.

CPU-side integration

Vector and dense-numeric paths route through Apple's Accelerate framework (AMX / SME on Apple Silicon, NEON elsewhere) for batched dot products and reductions — including the brute-force fallback path in vector search. CPU work runs on QOS_CLASS_USER_INITIATED for foreground-quality scheduling alongside the GPU loop.


CUDA GPU (Linux / Windows)

On NVIDIA hardware, Adaptive Dispatch routes to CUDA via dynamic driver loading. There is no compile-time GPU dependency: the driver is discovered at runtime, and if CUDA is unavailable the same binary falls back to CPU. CUDA dispatch covers graph algorithms, vector search, and spatial operations at different workload scales.


GPU Introspection

CALL db.gpuStatus()

Returns one row per CUDA device. Use this to check availability and load before submitting large GPU workloads.

CALL db.gpuStatus() YIELD device_id, inflight, sm_count, vram_mib, status
| Column | Type | Description |
|---|---|---|
| device_id | int | CUDA device index |
| inflight | int | Currently executing GPU kernels |
| sm_count | int | Streaming multiprocessor count |
| vram_mib | int | Device VRAM in MiB |
| status | string | "available" (inflight < 8) or "saturated" |

Returns {device_id: "N/A", status: "no CUDA devices"} when no CUDA hardware is present.
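
A client could use these rows to pick a device before submitting work. The Python sketch below assumes the rows arrive as dictionaries keyed by the column names above; how a given ArcFlow binding actually returns rows is not specified here, and `pick_device` is a hypothetical helper.

```python
def pick_device(rows):
    """Return the device_id of the least-loaded available device, or None."""
    candidates = [
        row for row in rows
        # The "no CUDA devices" row carries device_id "N/A", so it is skipped here.
        if isinstance(row.get("device_id"), int) and row.get("status") == "available"
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda row: row["inflight"])["device_id"]
```

Returning None covers both the saturated case and the case where no CUDA hardware is present, so the caller can fall back to CPU or wait.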

On Apple Silicon (macOS / iOS), db.gpuStatus() currently enumerates CUDA devices only, so it returns the "no CUDA devices" shape even when the Metal GPU is fully active. To confirm Metal presence on Mac, inspect db.capabilities() instead:

CALL db.capabilities()
  YIELD capability, value
  WHERE capability IN ['gpu_spmv_semirings', 'gpu_spgemm', 'gpu_deterministic_f64']
  RETURN capability, value

Non-zero gpu_spmv_semirings and a populated gpu_spgemm value indicate Metal kernels are linked and dispatchable. The gpu_backend field reports the backend chosen for the current workload (Adaptive Dispatch keeps small graphs on CPU), not Metal availability.

CALL db.capabilities()

Returns the engine capability surface, including GPU presence, GPU family flags, and the spgemm/dispatch wiring status. Use this to check whether GPU acceleration is available before submitting large workloads.

CALL db.capabilities()
  YIELD gpu_status, gpu_spgemm, gpu_family
| Column | Description |
|---|---|
| gpu_status | "available", "saturated", or "no CUDA devices" |
| gpu_spgemm | Whether GPU sparse-matrix dispatch is wired for this build |
| gpu_family | Apple GPU family identifier when running on Metal |
| validated | Whether the kernel has been validated on this hardware |
| cuda_min_cc | Minimum CUDA compute capability required (e.g. "9.0") or "none" |

CALL arcflow.spatial.dispatch_stats()

Observability for ArcFlow GPU Index routing decisions.

CALL arcflow.spatial.dispatch_stats()
  YIELD lane_chosen, estimated_candidates, actual_candidates,
        prefilter_us, rtree_us, gpu_transfer_us, kernel_us, total_us

gpu_transfer_us and kernel_us are non-zero only when a GPU lane was chosen.
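
One way to read a row of these statistics is to derive a couple of ratios from it. The Python sketch below assumes a row is a dictionary keyed by the yielded column names and that the `_us` suffix means microseconds; the helper name and the derived metrics are illustrative, not part of ArcFlow.

```python
def summarize_dispatch(row):
    """Derive simple ratios from one dispatch_stats row."""
    gpu_us = row["gpu_transfer_us"] + row["kernel_us"]
    # Relative gap between the planner's candidate estimate and the real count.
    estimate_error = (
        abs(row["estimated_candidates"] - row["actual_candidates"])
        / max(row["actual_candidates"], 1)
    )
    transfer_share = row["gpu_transfer_us"] / row["total_us"] if row["total_us"] else 0.0
    return {
        "used_gpu": gpu_us > 0,
        "transfer_share": transfer_share,
        "estimate_error": estimate_error,
    }
```

A high transfer share on a GPU lane, or a large estimate error, suggests the routing decision for that query is worth a closer look.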


See Also

  • Graph Algorithms — full algorithm catalog with signatures and output schemas
  • Algorithms Reference — GQL syntax for all 27 procedures
  • Architecture — how Graph Kernel, Adaptive Dispatch, and GPU Index share memory
  • Spatial Queries — GPU-dispatched spatial queries