Graph Algorithms
27 graph algorithms built into the world model — no projection lifecycle, no separate analytics service, no plugin installation. Every algorithm operates over the same confidence-scored, temporally-versioned graph that your queries run against. Centrality rankings respect observation class. Community detection propagates across fact confidence. Contradiction detection surfaces conflicting beliefs in the world model. Six algorithms run incrementally via LIVE CALL, maintaining results as standing queries that update with every mutation.
154M PageRank nodes/sec on CPU. 29.6x GPU speedup on community detection. 19.8x on triangle count. GPU acceleration is automatic when hardware is available: same query, same syntax — ArcFlow Adaptive Dispatch routes to the fastest backend at runtime (see GPU Acceleration).
Centrality#
PageRank#
Iterative eigenvector centrality. 20 iterations, damping factor 0.85. 154M nodes/sec on CPU, 2.4x faster on GPU.
CALL algo.pageRank()| Column | Type | Description |
|---|---|---|
| nodeId | integer | Internal node ID |
| name | string | Node label or name property |
| score | float | PageRank score (sums to 1.0) |
-- PageRank on a social network
CREATE (a:Person {name: 'Alice'})-[:FOLLOWS]->(b:Person {name: 'Bob'})
CREATE (b)-[:FOLLOWS]->(c:Person {name: 'Carol'})
CREATE (c)-[:FOLLOWS]->(a)
CALL algo.pageRank()| nodeId | name | score |
|--------|-------|----------|
| 1 | Alice | 0.333333 |
| 2 | Bob | 0.333333 |
| 3 | Carol | 0.333333 |
Confidence-Weighted PageRank#
PageRank weighted by edge confidence values. Higher-confidence edges contribute more to rank propagation.
CALL algo.confidencePageRank()Betweenness Centrality#
Measures how often a node lies on shortest paths between other nodes. Identifies bridges and brokers in the graph.
CALL algo.betweenness()| Column | Type | Description |
|---|---|---|
| nodeId | integer | Internal node ID |
| name | string | Node label or name property |
| betweenness | float | Betweenness centrality score |
CREATE (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person {name: 'Bob'})
CREATE (b)-[:KNOWS]->(c:Person {name: 'Carol'})
CREATE (b)-[:KNOWS]->(d:Person {name: 'Dave'})
CALL algo.betweenness()| nodeId | name | betweenness |
|--------|------|-------------|
| 2 | Bob | 4.000000 |
| 1 | Alice| 0.000000 |
| 3 | Carol| 0.000000 |
| 4 | Dave | 0.000000 |
Closeness Centrality#
Inverse of the average shortest path distance from a node to all reachable nodes. Higher values indicate more central position.
CALL algo.closeness()| Column | Type | Description |
|---|---|---|
| nodeId | integer | Internal node ID |
| name | string | Node label or name property |
| closeness | float | Closeness centrality score |
Degree Centrality#
Count of relationships per node, optionally filtered by direction.
CALL algo.degreeCentrality()Community Detection#
Louvain#
Hierarchical community detection via modularity optimization. 29.6x GPU speedup on large graphs.
CALL algo.louvain()| Column | Type | Description |
|---|---|---|
| nodeId | integer | Internal node ID |
| name | string | Node label or name property |
| community | integer | Community assignment ID |
-- Detect communities in a social network
CREATE (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person {name: 'Bob'})
CREATE (a)-[:KNOWS]->(c:Person {name: 'Carol'})
CREATE (d:Person {name: 'Dave'})-[:KNOWS]->(e:Person {name: 'Eve'})
CALL algo.louvain()| nodeId | name | community |
|--------|-------|-----------|
| 1 | Alice | 0 |
| 2 | Bob | 0 |
| 3 | Carol | 0 |
| 4 | Dave | 1 |
| 5 | Eve | 1 |
Leiden#
Refined community detection that guarantees well-connected communities. Improvement over Louvain that avoids poorly connected intermediate communities.
CALL algo.leiden()Returns the same schema as Louvain (nodeId, name, community).
Connected Components (WCC)#
Weakly connected components. Identifies disconnected subgraphs.
CALL algo.connectedComponents()| Column | Type | Description |
|---|---|---|
| nodeId | integer | Internal node ID |
| name | string | Node label or name property |
| component | integer | Component assignment ID |
Community Detection (Label Propagation)#
Fast community detection using label propagation. Near-linear time complexity.
CALL algo.communityDetection()Returns nodeId, name, and community columns. 29.6x GPU speedup on large graphs.
K-Core Decomposition#
Computes the k-core number for each node. A k-core is a maximal subgraph where every node has degree >= k.
CALL algo.kCore()| Column | Type | Description |
|---|---|---|
| nodeId | integer | Internal node ID |
| name | string | Node label or name property |
| core | integer | Core number |
Path Finding#
BFS Shortest Path#
Unweighted shortest path via breadth-first search. 6.3M edges/sec on CPU, 3.5x faster on GPU.
MATCH p = shortestPath((a:Person {name: 'Alice'})-[*..10]->(b:Person {name: 'Dave'}))
RETURN pVariable-length traversal with depth bounds:
MATCH (a:Person {name: 'Alice'})-[:KNOWS*1..5]->(b)
RETURN b.nameDijkstra Shortest Path#
Weighted shortest path with optimized memory reuse. Pass the property key to use as edge weight.
CALL algo.dijkstra(startNodeId, endNodeId, 'weight')| Column | Type | Description |
|---|---|---|
| path | string | Ordered node IDs along the shortest path |
| distance | float | Total weighted path cost |
A* Pathfinding#
Heuristic-guided shortest path. Faster than Dijkstra when a distance estimate is available — the heuristic guides search toward the goal. Pass a node property to use as the heuristic estimate.
CALL algo.astar(startNodeId, endNodeId, 'travel_time', 'estimated_distance')| Parameter | Description |
|---|---|
| start | Source node ID |
| end | Target node ID |
| weight_key | Edge property to use as cost |
| heuristic_key | Node property to use as heuristic estimate |
Confidence-Weighted Shortest Path#
Shortest path weighted by edge confidence. Finds the most trusted route between two nodes.
CALL algo.confidencePath(1, 2)| Column | Type | Description |
|---|---|---|
| path | string | Ordered node IDs in the path |
| confidence | float | Aggregate path confidence |
All-Pairs Shortest Path#
Computes shortest path distances between all node pairs. GPU-accelerated for large graphs.
CALL algo.allPairsShortestPath()| Column | Type | Description |
|---|---|---|
| source | integer | Source node ID |
| target | integer | Target node ID |
| distance | integer | Shortest path distance |
Similarity#
Jaccard Similarity#
Available through node similarity computation. Measures overlap of neighbor sets between node pairs.
Node Similarity#
Pairwise similarity between all nodes based on shared neighbors.
CALL algo.similarNodes()Clustering Coefficient#
Local clustering coefficient for each node. Measures how close a node's neighbors are to forming a clique.
CALL algo.clusteringCoefficient()| Column | Type | Description |
|---|---|---|
| nodeId | integer | Internal node ID |
| name | string | Node label or name property |
| coefficient | float | Local clustering coefficient (0.0 to 1.0) |
GPU-accelerated: dispatches to CUDA/Metal when graph exceeds 50 nodes.
Graph Metrics#
Triangle Count#
Counts the number of triangles in the graph. 19.8x GPU speedup.
CALL algo.triangleCount()| Column | Type | Description |
|---|---|---|
| triangles | integer | Total triangle count |
Density#
Graph density: ratio of actual edges to possible edges.
CALL algo.density()| Column | Type | Description |
|---|---|---|
| density | float | Graph density (0.0 to 1.0) |
Diameter#
Longest shortest path in the graph.
CALL algo.diameter()| Column | Type | Description |
|---|---|---|
| diameter | integer | Graph diameter |
| source_id | integer | Source node of the longest path |
| target_id | integer | Target node of the longest path |
Biconnected Components#
Biconnected components — maximal subgraphs with no articulation points. O(V+E) time complexity.
CALL algo.biconnectedComponents()| Column | Type | Description |
|---|---|---|
| nodeId | integer | Internal node ID |
| component_id | integer | Biconnected component assignment |
Embedding Algorithms#
Node2Vec#
Generate node embeddings via structural random walk. Captures structural position in the graph.
CALL algo.node2vec(128, 80, 10)| Parameter | Description |
|---|---|
| dimensions | Embedding size (e.g., 128) |
| walk_length | Steps per random walk (e.g., 80) |
| num_walks | Walks per node (e.g., 10) |
Sets node.embedding on every node. Use with vector search for structural similarity.
Struct2Vec#
Structural equivalence embeddings — nodes with similar local structure get similar vectors, regardless of where they are in the graph.
CALL algo.struc2vec(64)GraphSAGE#
Inductive node embedding via neighborhood aggregation. Handles unseen nodes (no retraining needed).
CALL algo.graphSAGE(128)Stale Embedding Detection#
Find nodes whose embeddings are out of date (property changed after last embedding).
CALL algo.staleEmbeddings()Returns (nodeId, name) for nodes needing re-embedding.
Embedding Classification#
Classify nodes by similarity to labeled examples using a vector index.
CALL algo.classify($embedding, 'index_name', {k: 10})Returns (nodeId, label, confidence).
Knowledge Graph Algorithms#
Entity Resolution#
Find nodes that refer to the same real-world entity despite different IDs or properties.
CALL algo.entityResolution()Returns (nodeId1, nodeId2, similarity) pairs above the resolution threshold.
Fact Contradiction Detection#
Find pairs of facts in the graph that contradict each other based on predicates and confidence.
CALL algo.factContradiction()Returns (nodeId1, nodeId2, contradiction_type, confidence).
Relationship Strength#
Score how strong a relationship is based on frequency, recency, and mutual connections.
CALL algo.relationshipStrength()Returns (fromId, toId, strength) for all relationships.
Compounding Score#
Composite confidence score that propagates through chains of facts.
CALL algo.compoundingScore()Entity Freshness#
How recently each entity was observed or corroborated. Low freshness = stale facts.
CALL algo.entityFreshness()Returns (nodeId, name, freshness, last_observed_at).
Semantic Deduplication#
Find and score near-duplicate nodes using embedding similarity.
CALL algo.semanticDedup()GraphRAG Pipeline#
GraphRAG#
Full retrieval-augmented generation pipeline: query decomposition, graph traversal, evidence collection with confidence scoring.
CALL algo.graphRAG('What connections exist between Alice and the project?')GraphRAG Context#
Retrieves structured context from the graph for a natural language query, with token budgeting for LLM context windows.
CALL algo.graphRAGContext('Tell me about Alice')Trusted GraphRAG#
Confidence-filtered GraphRAG — only returns evidence above the trust threshold. Facts outrank inferences outrank predictions.
CALL algo.graphRAGTrusted('high-confidence relationships for Alice')Incremental Algorithms (LIVE CALL)#
Six algorithms can run as standing queries via LIVE CALL. Instead of recomputing from scratch on every mutation, they maintain state incrementally. Cost per update is O(affected neighborhood), not O(V+E).
-- Register incremental PageRank -- updates automatically on graph mutations
LIVE CALL algo.pageRank({ damping: 0.85 })
-- Register incremental community detection
LIVE CALL algo.connectedComponents()
LIVE CALL algo.louvain()
-- Register incremental pathfinding and metrics
LIVE CALL algo.bfs({ source: 42 })
LIVE CALL algo.triangleCount()
LIVE CALL algo.clusteringCoefficient()Inspecting live algorithms#
-- List all running incremental algorithms
CALL db.liveAlgorithms()
-- Check a specific algorithm's memory and timing
CALL db.viewStats('pageRank')Built on the Live Delta Stack and Z-set algebra. Every mutation produces a delta (weighted +1/-1) that propagates through the algorithm's state. Results are mathematically identical to a full recompute.
Algorithm Performance Summary#
Benchmarked on a MacBook Air M4 (10-core, 24GB) with 10K-node graphs, single-threaded CPU baseline. GPU speedups measured with CUDA on equivalent hardware. These are laptop numbers — server hardware with more cores and memory will scale further.
| Algorithm | CPU Throughput | GPU Speedup | GPU Dispatch Threshold |
|---|---|---|---|
| PageRank | 154M nodes/sec | 2.4x | 500 nodes |
| BFS Shortest Path | 6.3M edges/sec | 3.5x | 100 nodes |
| Triangle Count | 943K nodes/sec | 19.8x | 100 nodes |
| Community Detection | 185K nodes/sec | 29.6x | 200 nodes |
| Louvain | — | 29.6x | 200 nodes |
| Vector Search | 25K queries/sec @ 128d | 4.2x | 100 vectors |
| All-Pairs Shortest Path | — | GPU-accelerated | 500 nodes |
| Clustering Coefficient | — | GPU-accelerated | 100 nodes |
| Node Similarity | — | GPU-accelerated | 100 nodes |
| Leiden | — | CUDA 9.0+ (H100+) only | 10M nodes |
GPU dispatch is handled by ArcFlow Adaptive Dispatch — which measures graph size at query time and routes to the fastest available backend automatically. No configuration required.
When multiple CUDA devices are present, Adaptive Dispatch selects the least-loaded device and gates dispatch on a per-algorithm cost model. Use CALL db.gpuStatus() to inspect per-device load and CALL dbms.gpuThresholds() to see the per-algorithm dispatch registry.
The parallel execution itself runs through the ArcFlow Graph Kernel — a single-pass parallel structure that maps directly to GPU thread blocks rather than walking the graph one edge at a time.
Six algorithms also support LIVE CALL for incremental maintenance -- see the Incremental Algorithms section above.
See Also#
- Vector Search — vector similarity search, hybrid search, and embedding management
- Live Queries —
LIVE CALLfor incremental algorithm maintenance as standing queries - GPU Acceleration — automatic CUDA/Metal dispatch, multi-GPU routing
- Spatial Queries —
algo.nearestNodes(), frustum queries, ArcFlow Spatial Index - Trusted RAG —
algo.graphRAG(),algo.graphRAGTrusted()in pipeline context