Spatial Queries
Space is a first-class dimension of the world model — not a numeric column and not a join between two systems.
ArcFlow's R*-tree index makes spatial position a native query axis alongside graph relationships and temporal state. K-nearest, radius, bounding-box, and frustum queries compose with graph traversal in a single GQL statement — "find the 5 nearest warehouses that have inventory for SKU X" is one query, not two. Position updates trigger live geofence queries through incremental propagation. No approximation, no separate spatial service, no infrastructure gap between where things are and what they connect to.
-- Supply chain: spatial filter → graph traversal → property filter — one query
CALL algo.nearestNodes(order.location, 'Warehouse', 5)
YIELD node AS wh, distance
MATCH (wh)-[:SUPPLIES]->(item:Item {sku: $sku})
WHERE item.stock > 0
RETURN wh.name, distance, item.stock
ORDER BY distanceWhy R*-tree#
Most spatial indexes in the graph database world are built for the wrong problem:
| Index type | Designed for | Exact? |
|---|---|---|
| HNSW (vector similarity) | Embedding nearest-neighbor — approximate by design | No |
| Random-walk sampling | Graph connectivity estimation — misses answers | No |
| GIST R-tree | 2D geometry containment | Yes |
| R*-tree (ArcFlow) | 2D/3D spatial containment, KNN, proximity | Yes |
HNSW finds "approximately similar" embeddings. For exact containment, bounding-box overlap, and dynamic 2D/3D boundaries it produces false positives and has no structural advantage over a spatial tree. Random-walk sampling is explicitly approximate — controlled by a sampleRate parameter that trades correctness for speed.
The R*-tree uses forced reinsertion on node overflow — instead of immediately splitting a full node, it removes ~30% of entries and reinserts them. This stochastic restructuring minimises bounding-box overlap and dead space, giving 2–5× faster queries than a basic R-tree on clustered data.
Spatial Primitives#
Point Types#
-- 2D cartesian (pitch, factory floor, map tile)
CREATE (p:Player {name: 'Alice', position: point({x: 34.2, y: 67.1})})
-- 3D cartesian (venue world space, robot arm, drone)
CREATE (r:Robot {name: 'AGV-7', position: point({x: 12.5, y: 8.3, z: 0.0})})
-- Geographic (WGS84)
CREATE (s:Sensor {name: 'Cam-01', location: point({latitude: 51.5074, longitude: -0.1278})})Flat coordinate properties (x, y, z) are also indexed automatically. Point and Point3d values are stored in a SIMD-optimized columnar layout — axes are contiguous in memory, enabling AVX2 to evaluate 4 distance comparisons per instruction. This eliminates the ~6.9× cache-miss penalty of pointer-chasing through heap-allocated node structs.
K-Nearest Neighbor#
CALL algo.nearestNodes(point({x: 10.0, y: 10.0}), 'Sensor', 5)
YIELD node, distance
RETURN node.name, distanceRecent engine benchmarks show spatial k-nearest throughput around 471K ops/s on the corrected benchmark surface. The index is dynamic — inserts, updates, and deletes are O(log N) with no rebuild.
Radius Search#
-- All players within 20 meters
MATCH (p:Player)
WHERE distance(p.position, point({x: 10.0, y: 10.0})) < 20.0
RETURN p.name, p.positionThe query compiler detects distance() predicates and pushes them into the R*-tree (SpatialIndexScan plan node) rather than scanning all nodes. A coarse grid pre-filter reduces R*-tree candidate sets by ~95% on typical datasets.
Bounding Box#
-- Entities within a rectangular zone
MATCH (e:Entity)
WHERE e.position.x >= 0.0 AND e.position.x <= 50.0
AND e.position.y >= 0.0 AND e.position.y <= 30.0
RETURN e.nameFrustum / Visibility Queries#
Objects within a camera or sensor frustum use 6-plane containment:
-- Entities visible from a camera frustum
CALL algo.objectsInFrustum($frustum)
YIELD node, distance
RETURN node.name, distance
ORDER BY distance
-- Nearest object visible to any of N cameras
CALL algo.nearestVisible($position, 'Camera', 10)
YIELD node, distance
RETURN node.name, distance
-- Line-of-sight raycast
CALL spatial.raycast(
point({x: 0, y: 0, z: 2}), -- origin
point({x: 1, y: 0, z: 0}), -- direction
100.0 -- max distance
) YIELD hit, distance
RETURN hit.name, distancePerformance: objectsInFrustum over 50 entities, 10 frusta: < 2ms.
Spatial + Graph in One Query#
The R*-tree and the graph share unified in-process memory. No serialization. No network hop between a spatial service and a graph database. Fused spatial+graph query plans execute as a DAG — the spatial prefilter and graph traversal run concurrently, and the result merge fires as soon as both branches complete. No intermediate materialization.
Supply Chain#
-- Nearest warehouses with inventory → shortest delivery route
CALL algo.nearestNodes(order.location, 'Warehouse', 5)
YIELD node AS wh, distance AS crow_flies
MATCH (wh)-[:SUPPLIES]->(item:Item {sku: $sku})
WHERE item.stock > 0
MATCH path = shortestPath((wh)-[:ROAD*]->(dest:Depot {id: $depot}))
RETURN wh.name, item.stock, length(path) AS road_hops
ORDER BY road_hopsIoT / Fleet#
-- Vehicles within 500m, with valid license, maintained within 30 days
CALL algo.nearestNodes($zone_center, 'Vehicle', 50)
YIELD node AS v, distance
WHERE distance < 500
MATCH (v)-[:DRIVEN_BY]->(d:Driver)-[:HOLDS]->(l:License)
WHERE l.expires > timestamp()
AND v.last_service_date > timestamp() - 2592000000
RETURN v.plate, d.name, distanceLive Production (50fps)#
-- Offside detection: spatial filter → graph → rule check
CALL algo.nearestNodes(ball.position, 'Player', 22)
YIELD node AS player, distance
MATCH (player)-[:PLAYS_FOR]->(team:Team {side: 'attacking'})
WHERE player.position.x > last_defender.position.x
RETURN player.name, player.positionEpidemiology#
-- People within 10m of patient zero → contact graph → 48h temporal filter
CALL algo.nearestNodes(patient_zero.location, 'Person', 1000)
YIELD node AS contact, distance
WHERE distance < 10.0
MATCH path = (patient_zero)-[:CONTACT*1..3]->(contact)
WHERE ALL(r IN relationships(path) WHERE r.timestamp > timestamp() - 172800000)
RETURN contact.name, length(path) AS degrees, distanceLive Geofencing#
Spatial standing queries fire through incremental propagation from the nodes whose spatial properties changed — the engine does not recompute all standing queries on every mutation.
-- Named geofence: tracks all players entering zone alpha
CREATE LIVE VIEW zone_alpha_occupants AS
MATCH (p:Player)
WHERE p.position.x >= 30 AND p.position.x <= 70
AND p.position.y >= 30 AND p.position.y <= 70
RETURN p.name, p.position
-- Read current occupants
MATCH (row) FROM VIEW zone_alpha_occupants RETURN row.name
-- Anonymous live geofence: radius-based
LIVE MATCH (p:Player)
WHERE distance(p.position, point({x: 50, y: 50})) < 25.0
RETURN p.name, p.positionIn TypeScript, subscribe to geofence enter/exit events via the live query API:
const lq = db.subscribe(
"MATCH (p:Player) WHERE distance(p.position, $center) < $radius RETURN p.name, p.position",
(event) => {
for (const row of event.added) console.log('entered:', row.name)
for (const row of event.removed) console.log('left:', row.name)
},
{ pollIntervalMs: 20 }
)
// Stop when done
lq.cancel()event.added contains entities that entered the zone since the last tick. event.removed contains entities that left. GeofenceEnter and GeofenceExit transitions are edge-triggered — a query that stays inside the zone does not fire on every tick, only on the boundary crossing.
Three trigger priorities control batching latency:
| Priority | Latency | Use case |
|---|---|---|
Immediate | 1-tick | Offside detection, goal-line technology |
Fast | 2–3 ticks | Zone entry/exit alerts |
Background | 10+ ticks | Heatmaps, session analytics |
Coordinate Frame Validation#
ArcFlow validates coordinate system metadata at ingest. Mixing coordinate frames — WGS84 positions in a Z-up cartesian graph — raises a hard error rather than silently producing wrong results.
CALL db.spatialMetadata()
YIELD crs, meters_per_unit, up_axis, handedness, calibration_version| Field | Default | Description |
|---|---|---|
crs | "cartesian" | Coordinate reference system ("cartesian", "WGS84") |
meters_per_unit | 1.0 | Scale — 1.0 for meters, 0.01 for centimeters |
up_axis | "Z" | Z-up (venue/robotics default) or Y-up (DCC/game engines) |
handedness | "right" | "right" or "left" |
calibration_version | 0 | Increment on venue re-calibration to invalidate stale queries |
Errors raised on mismatch:
CrsMismatch— inserting WGS84 point into a Cartesian graphUpAxisMismatch— Y-up data into a Z-up storeUnitScaleMismatch— centimeter data into a meter-unit storeStaleCalibration— data carries an older calibration version than the store
USD World Model#
ArcFlow ingests USD stage geometry in bulk and makes it immediately queryable via GQL. 500K prims load in under 1 second using Sort-Tile-Recursive spatial packing.
import { open } from 'arcflow'
const db = open('./scene')
// Ingest USD prim hierarchy — creates graph nodes with spatial index
db.ingestUsdPrims(primRecords)
// Query the ingested geometry via GQL
const nearby = db.query(
"MATCH (p:Mesh) WHERE distance(p.position, $cam) < 500 RETURN p.name, p.position",
{ cam: JSON.stringify([120.0, 45.0, 30.0]) }
)Coordinate frame (metersPerUnit, upAxis, CRS) is validated against the store's SpatialMetadata before the first insert. A mismatch on any field raises a hard error — no partial ingestion.
Instanced geometry (chairs, seats, sensors in stadium models) shares a single spatial index in memory. 10,000 instances of the same prim geometry consume one index allocation, not 10,000. Queries into instanced geometry transform the query coordinates into prim-local space rather than rebuilding or copying the shared index.
Multi-GPU Dispatch#
Spatial queries are automatically routed to the appropriate execution lane based on a cost model that accounts for query shape, candidate count, and GPU transfer overhead:
| Lane | When | Candidates |
|---|---|---|
CpuLive | Live queries, time-critical | ≤ 500 |
CpuBatch | Analytics, replay, background | > 500 (CPU faster than GPU transfer) |
GpuLocal | High-density spatial | > 50K, fits single GPU |
GpuMulti | Large-scale spatial | Doesn't fit one GPU |
For multi-GPU setups, nodes are partitioned across GPUs by a stable hash of node_id. Queries spanning GPU boundaries fan out to all relevant GPUs and merge results. GPUs connected via NVLink or Infinity Fabric form peer islands that share work without PCIe transfer cost.
The ArcFlow GPU Index is a pointer-free spatial index designed for direct GPU traversal. It transfers to GPU memory without transformation — no fixup, no reconstruction. Pointer-based tree structures contain virtual memory addresses that are meaningless to GPU threads; the ArcFlow GPU Index is built as a parallel representation from the ground up, eliminating this boundary entirely.
Observability#
CALL arcflow.spatial.dispatch_stats()
YIELD lane_chosen, estimated_candidates, actual_candidates,
prefilter_us, rtree_us, gpu_transfer_us, kernel_us, total_us| Field | Description |
|---|---|
lane_chosen | Execution lane: CpuLive, CpuBatch, GpuLocal, GpuMulti |
estimated_candidates | Candidates estimated by coarse grid prefilter |
actual_candidates | Candidates after R*-tree filtering |
prefilter_us | Coarse grid filter time (μs) |
rtree_us | R*-tree traversal time (μs) |
gpu_transfer_us | Host→device transfer time (μs; 0 for CPU lanes) |
kernel_us | GPU kernel execution time (μs; 0 for CPU lanes) |
total_us | End-to-end query latency (μs) |
For live spatial trigger metrics:
CALL arcflow.spatial.trigger_stats()
YIELD query_name, node_id, predicate_type, evaluation_us, firedMulti-Clock Sensor Fusion#
Different sensors report at different rates. ArcFlow's temporal clock system handles this natively:
CALL db.clock('camera_01', 'sensor', 33) -- 30fps
CALL db.clock('lidar_01', 'sensor', 100) -- 10Hz
CALL db.clock('gps_01', 'sensor', 1000) -- 1Hz
MATCH (r:Robot) RETURN r.position, r.orientation, r._clock_domainUse Cases#
Digital Twin (Venue / Building / Factory)#
CREATE (v:Venue {name: 'Stadium'})
CREATE (c:Camera {name: 'Cam-01', position: point({x: 0, y: 50, z: 15})})
CREATE (z:Zone {name: 'Pitch'})
-- Real-time tracking
MATCH (p:Player {jersey: 7})
SET p.position = point({x: 34.2, y: 67.1, z: 0.0})
-- Who's near the ball?
CALL algo.nearestNodes(ball.position, 'Player', 5)
YIELD node, distance
RETURN node.name, distanceWarehouse Robotics#
-- Assign nearest idle robot to a pick location
CALL algo.nearestNodes($pick_location, 'Robot', 1)
YIELD node AS nearest
SET nearest.task = 'pick'
SET nearest.target = $pick_location
-- Collision avoidance
MATCH (a:Robot), (b:Robot)
WHERE a <> b AND distance(a.position, b.position) < 2.0
RETURN a.name, b.name, distance(a.position, b.position)Performance Reference#
| Operation | Entities | Result |
|---|---|---|
| K-nearest (k=10) | benchmark surface | ~471K ops/s |
| Radius search | benchmark surface | ~5.8K ops/s |
| Frustum query (10 frusta) | 50 | < 2ms |
| Line crossing check | 50 tracks | < 0.5ms |
| Zone occupancy (50fps, 500 entities) | 500 | < 50ms per 10s window |
| Live geofence trigger latency | — | incremental, edge-triggered |
| USD bulk ingest | 500K prims | < 1 second |
| Spatial join (5-NN + 2-hop graph) | 11K | ≥ 500/s |
Compared to Other Approaches#
Richer Geometry Tooling#
Some GIS-oriented stacks have richer geometry types such as LineString, Polygon, and MultiPolygon. ArcFlow's advantage is that spatial predicates compose directly with graph traversal in one engine. If you need "find entities near X, then traverse their relationships," ArcFlow eliminates the two-system join.
Approximate spatial in graph databases#
Most graph databases reach for vector similarity indexes (HNSW) or random-walk sampling when asked to do spatial work. Both are approximate by design — they trade correctness for speed in a way that is appropriate for embedding search but wrong for geometric queries. Offside detection, collision avoidance, and delivery routing require exact answers. A false positive in proximity detection is not a search quality issue — it is a safety failure.
ROS2 / Nav2#
ArcFlow complements Nav2: Nav2 handles real-time path planning with hard latency guarantees; ArcFlow provides the persistent spatial knowledge layer — mission planning, post-incident replay, spatial-temporal reasoning. The multi-clock domain system maps directly to ROS2's multi-rate sensor model.
See Also#
- Spatial Reference — full spatial query syntax, R*-tree, frustum, and raycast
- Robotics & Perception — sensor fusion pipeline and ROS2 integration patterns
- Autonomous Systems — multi-agent fleet coordination on a shared spatial world model
- Temporal Queries — combine spatial position with
AS OFfor historical trajectory queries - Digital Twins — live spatial replica of physical environments