GPU Acceleration

ArcFlow provides three distinct execution innovations for high-performance world model queries:

  • ArcFlow Graph Kernel — executes graph algorithms as a single parallel pass across all nodes simultaneously, not as sequential edge traversals
  • ArcFlow Adaptive Dispatch — routes every operation to the fastest available hardware at runtime, cost-model driven, zero configuration
  • ArcFlow GPU Index — a pointer-free spatial index designed for direct GPU traversal without transformation

The developer writes one query. These three layers work together to pick the fastest path.

CALL algo.pageRank()

ArcFlow Graph Kernel#

Most graph databases walk the graph one edge at a time: visit a node, follow an edge, visit the next. That is sequential, cache-hostile, and does not map to GPU hardware.

The ArcFlow Graph Kernel processes algorithms differently. The world model is held as a compact parallel structure — every algorithm executes as a single pass across all nodes simultaneously, not a recursive walk. PageRank, BFS, connected components, community detection, triangle counting — each runs as one parallel operation. This maps directly to GPU thread blocks and enables the speedups below.

The same kernel runs on CPU when no GPU is present. The parallel structure is inherently more efficient than pointer-chasing traversal on any hardware.
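
Because every algorithm is a single parallel pass, each one is exposed as a single procedure call rather than a traversal you script yourself. A minimal sketch: only the `algo.pageRank()` procedure name is taken from this page, and the `YIELD` columns (`node`, `score`) are assumptions about its output schema:

```gql
-- One call, one parallel pass over the whole world model
-- (YIELD columns are illustrative, not a documented signature)
CALL algo.pageRank() YIELD node, score
RETURN node.name AS name, score
ORDER BY score DESC
LIMIT 10
```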


ArcFlow Adaptive Dispatch#

ArcFlow Adaptive Dispatch measures available hardware at startup and routes each operation to the fastest available backend based on a live cost model:

  1. Small graphs (< 200 nodes) — CPU path (dispatch overhead exceeds GPU benefit at this scale)
  2. Apple Silicon (macOS / iOS) — Metal GPU. Unified memory means no CPU→GPU copy overhead — CPU and GPU read the same physical memory.
  3. NVIDIA GPU (Linux / Windows) — CUDA GPU. Dynamic driver loading — no compile-time GPU dependency, same binary runs everywhere.
  4. CPU fallback — ArcFlow's parallel CPU implementation when no GPU is present.

The cost model accounts for kernel launch overhead, memory bandwidth per device, and algorithm parallelism characteristics. There are no hardcoded thresholds — routing adapts to the actual hardware measured at runtime.

Zero configuration. The same GQL query runs identically on a laptop, a workstation, or a GPU cluster.


Benchmark Results#

Performance measured against ArcFlow's own CPU path on the same hardware:

| Algorithm | CPU | GPU Speedup |
| --- | --- | --- |
| PageRank | 154M nodes/sec | 2.4x |
| BFS Frontier | 6.3M edges/sec | 3.5x |
| Vector Distance | 25K queries/sec | 4.2x |
| Triangle Count | 943K nodes/sec | 19.8x |
| Community Detection | 185K nodes/sec | 29.6x |

Speedup is most pronounced for algorithms with high parallelism (community detection, triangle counting). For simpler traversals, the CPU path is already fast — GPU dispatch adds value when the graph is large and the algorithm is inherently parallel.
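
To exercise the high-parallelism end of the table, a community-detection call might look like the following. This is a hedged sketch: the `algo.leiden` procedure name is borrowed from the `"leiden"` example in the `dbms.gpuThresholds()` registry later on this page, and the `YIELD` columns are assumptions:

```gql
-- Community detection: the strongest GPU speedup case in the table above
-- (procedure name and YIELD columns are illustrative)
CALL algo.leiden() YIELD node, community
RETURN community, count(*) AS members
ORDER BY members DESC
LIMIT 5
```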


ArcFlow GPU Index#

Spatial queries require a different execution path from graph algorithms — a spatial index, not graph traversal. ArcFlow Adaptive Dispatch routes spatial queries across four lanes based on candidate count and GPU transfer cost:

| Lane | When | Typical use |
| --- | --- | --- |
| CpuLive | ≤ 500 candidates | LIVE queries, real-time tracking |
| CpuBatch | > 500 candidates (CPU faster than transfer) | Analytics, replay |
| GpuLocal | > 50K candidates, fits single GPU | High-density spatial |
| GpuMulti | Exceeds single GPU memory | Stadium-scale entity tracking |
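
For example, a real-time nearest-neighbor lookup over a small candidate set would stay on the CpuLive lane, while the same shape of query over stadium-scale data would route to a GPU lane. A sketch reusing the `algo.nearestNodes` call shown below in Multi-GPU Partitioning, with only the result count reduced:

```gql
-- Few candidates: expected to stay on the CpuLive lane
CALL algo.nearestNodes($center, 'Entity', 10) YIELD node, distance
RETURN node.name, distance
```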

The ArcFlow GPU Index is the structure that makes GpuLocal and GpuMulti lanes possible. It is a pointer-free spatial index designed specifically for GPU traversal — transferring directly to GPU memory without transformation. Traditional pointer-based spatial indexes contain virtual memory addresses that are meaningless to GPU threads; the ArcFlow GPU Index eliminates this boundary entirely.

Multi-GPU Partitioning#

Spatial data is partitioned across GPU devices by a stable hash of node_id. Queries spanning partition boundaries fan out to all relevant GPUs and merge results. Devices connected via high-bandwidth GPU interconnects form peer islands — within an island, work-stealing happens without PCIe transfer cost.

-- Same spatial query — Adaptive Dispatch routes to GpuMulti when warranted
CALL algo.nearestNodes($center, 'Entity', 100) YIELD node, distance RETURN node.name

Instanced Geometry#

For scenes with thousands of identical geometry instances (seats, sensors, obstacles), the ArcFlow GPU Index is shared — one allocation serves all instances. Queries transform coordinates into instance-local space rather than rebuilding the index per instance.


Metal GPU (Apple Silicon)#

On macOS and iOS, ArcFlow Adaptive Dispatch routes to Metal compute shaders. Apple Silicon's unified memory means the ArcFlow Graph Kernel and GPU Index operate in the same physical memory as the CPU — zero copy overhead.

PageRank on 10K nodes — CPU: 0.6ms / GPU: 0.25ms (2.4x)

If Metal is unavailable or the graph is below the dispatch threshold, routing falls back to CPU transparently.


CUDA GPU (Linux / Windows)#

On NVIDIA hardware, Adaptive Dispatch routes to CUDA via dynamic driver loading. There is no compile-time GPU dependency; the driver is discovered at runtime, and if CUDA is unavailable, the same binary falls back to CPU. The CUDA path covers graph algorithms, vector search, and spatial operations at their respective workload scales.


GPU Introspection#

CALL db.gpuStatus()#

Returns one row per CUDA device. Use this to check availability and load before submitting large GPU workloads.

CALL db.gpuStatus() YIELD device_id, inflight, sm_count, vram_mib, status
| Column | Type | Description |
| --- | --- | --- |
| device_id | int | CUDA device index |
| inflight | int | Currently executing GPU kernels |
| sm_count | int | Streaming multiprocessor count |
| vram_mib | int | Device VRAM in MiB |
| status | string | "available" (inflight < 8) or "saturated" |

Returns {device_id: "N/A", status: "no CUDA devices"} when no CUDA hardware is present.
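
A typical guard before submitting a large workload filters for available devices. A sketch using only the documented columns, assuming `WHERE` may follow `YIELD` in a procedure call as in standard GQL:

```gql
-- Only proceed if at least one device is not saturated
CALL db.gpuStatus()
  YIELD device_id, inflight, status
WHERE status = 'available'
RETURN device_id, inflight
```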

CALL dbms.gpuThresholds()#

Returns the Adaptive Dispatch registry — minimum requirements for each algorithm before GPU routing is considered.

CALL dbms.gpuThresholds()
  YIELD algorithm, min_input_size, bytes_per_element, validated, cuda_min_cc
| Column | Description |
| --- | --- |
| algorithm | Algorithm name (e.g. "pageRank", "leiden") |
| min_input_size | Minimum node count before GPU dispatch is considered |
| bytes_per_element | Per-node memory estimate for transfer cost calculation |
| validated | Whether the kernel has been validated on this hardware |
| cuda_min_cc | Minimum CUDA compute capability required (e.g. "9.0") or "none" |
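
The registry can be filtered like any other procedure result, for example to list only the kernels validated on the current hardware. A sketch using only the columns documented above:

```gql
-- Kernels eligible for GPU routing on this machine
CALL dbms.gpuThresholds()
  YIELD algorithm, min_input_size, validated, cuda_min_cc
WHERE validated
RETURN algorithm, min_input_size, cuda_min_cc
```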

CALL arcflow.spatial.dispatch_stats()#

Observability for ArcFlow GPU Index routing decisions.

CALL arcflow.spatial.dispatch_stats()
  YIELD lane_chosen, estimated_candidates, actual_candidates,
        prefilter_us, rtree_us, gpu_transfer_us, kernel_us, total_us

gpu_transfer_us and kernel_us are non-zero only when a GPU lane was chosen.
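
To audit only the decisions that actually took a GPU lane, filter on lane_chosen. A sketch using the documented `YIELD` columns; the list-membership syntax is an assumption about GQL:

```gql
-- Recent spatial dispatches that went to a GPU lane
CALL arcflow.spatial.dispatch_stats()
  YIELD lane_chosen, actual_candidates, gpu_transfer_us, kernel_us, total_us
WHERE lane_chosen IN ['GpuLocal', 'GpuMulti']
RETURN lane_chosen, actual_candidates, total_us
```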


See Also#

  • Graph Algorithms — full algorithm catalog with signatures and output schemas
  • Algorithms Reference — GQL syntax for all 27 procedures
  • Architecture — how Graph Kernel, Adaptive Dispatch, and GPU Index share memory
  • Spatial Queries — GPU-dispatched spatial queries