# Vector Search
Embeddings are first-class properties of the world model — semantic dimensions queryable alongside graph relationships, confidence scores, and temporal state.
No dedicated vector database. No separate similarity service. An embedding lives on a node the same way a position, a confidence score, or an observation class does — queryable with the same GQL that traverses the graph. Combine semantic similarity with graph structure and temporal snapshots in one query, one process. 25,000 queries/sec at 128d. GPU acceleration delivers 4.2x additional speedup when available.
```
-- Vector search + graph traversal in one query, one process
CALL algo.vectorSearch('movie_embeddings', [0.12, -0.34, 0.56, ...], 5)
```

## Create a Vector Index
```
CREATE VECTOR INDEX movie_embeddings
FOR (m:Movie)
ON (m.embedding)
OPTIONS {dimensions: 128, similarity: 'cosine'}
```

| Option | Values | Default | Description |
|---|---|---|---|
| dimensions | 1–4096 | required | Vector dimensionality |
| similarity | 'cosine', 'euclidean', 'dotProduct' | 'cosine' | Distance function |
## Similarity Functions
- **Cosine** — angle between vectors, normalized. Best for text embeddings, where magnitude is irrelevant.
- **Euclidean** — L2 distance. Best for spatial data and embeddings where absolute position matters.
- **Dot product** — raw inner product. Best for pre-normalized vectors, where higher values mean greater similarity.
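The three functions can be sketched in plain Python (a conceptual illustration of the math, not ArcFlow's implementation):

```python
import math

def cosine(a, b):
    # Angle-based: magnitude is normalized out, only direction matters.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def euclidean(a, b):
    # L2 distance: lower means closer; absolute position matters.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product(a, b):
    # Raw inner product: assumes pre-normalized vectors; higher means more similar.
    return sum(x * y for x, y in zip(a, b))

a, b = [1.0, 0.0], [0.6, 0.8]   # b is unit-length: sqrt(0.36 + 0.64) == 1
print(round(cosine(a, b), 2))       # 0.6
print(round(euclidean(a, b), 2))    # 0.89
print(round(dot_product(a, b), 2))  # 0.6 — matches cosine because both vectors are unit-length
```

Note that cosine and dot product coincide exactly when the inputs are already unit-length, which is why dot product is the cheaper choice for pre-normalized embeddings.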
```
-- Euclidean distance for spatial embeddings
CREATE VECTOR INDEX spatial_vec
FOR (p:Point)
ON (p.coords)
OPTIONS {dimensions: 3, similarity: 'euclidean'}

-- Dot product for pre-normalized embeddings
CREATE VECTOR INDEX doc_vec
FOR (d:Document)
ON (d.embedding)
OPTIONS {dimensions: 768, similarity: 'dotProduct'}
```

## Search
### K-Nearest Neighbor Search
```
CALL algo.vectorSearch('movie_embeddings', [0.1, 0.2, 0.3, ...], 10)
```

| Parameter | Type | Description |
|---|---|---|
| index name | string | Name of the vector index |
| query vector | float[] | Query vector (must match index dimensions) |
| k | integer | Number of nearest neighbors to return |
Returns rows with `nodeId`, a similarity `score`, and all node properties.
```
-- Find 5 movies most similar to a query embedding
CALL algo.vectorSearch('movie_embeddings', [0.12, -0.34, 0.56, ...], 5)
```

| nodeId | name | score |
|--------|-------------------|----------|
| 42 | The Matrix | 0.952341 |
| 17 | Blade Runner | 0.891205 |
| 23 | Ghost in the Shell| 0.847193 |
| 8 | Tron | 0.812045 |
| 31 | Ex Machina | 0.798412 |
### Similar Nodes
Find structurally similar nodes based on their vector embeddings:

```
CALL algo.similarNodes()
```

### Hybrid Search
Combine graph traversal with vector similarity in a single query. Graph structure constrains the candidate set, then vector similarity ranks results.
```
CALL algo.hybridSearch()
```

Hybrid search uses graph adjacency to narrow the candidate set before running vector comparisons — faster than a brute-force vector scan on large graphs.
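The narrowing strategy can be sketched in plain Python (a conceptual illustration with toy data, not ArcFlow's internals): the graph step selects candidates, and the vector step scores only those candidates.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy graph: project name -> contained document ids
edges = {"ArcFlow": ["d1", "d2"], "Other": ["d3"]}
embeddings = {
    "d1": [0.1, 0.2, 0.3],
    "d2": [0.15, 0.22, 0.28],
    "d3": [0.9, -0.4, 0.1],  # outside the project: never even compared
}

def hybrid_search(project, query, k):
    # 1. Graph step: adjacency constrains the candidate set.
    candidates = edges[project]
    # 2. Vector step: rank only the surviving candidates by similarity.
    scored = [(doc, cosine(embeddings[doc], query)) for doc in candidates]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

print(hybrid_search("ArcFlow", [0.1, 0.2, 0.3], 2))
```

Because `d3` is filtered out by the traversal before any distance is computed, the cost of the vector step scales with the candidate set, not with the total number of indexed vectors.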
```
-- Example: find documents similar to a query, but only within a specific project
CREATE (p:Project {name: 'ArcFlow'})
CREATE (d1:Document {name: 'Architecture', embedding: [0.1, 0.2, 0.3]})
CREATE (d2:Document {name: 'API Guide', embedding: [0.15, 0.22, 0.28]})
CREATE (p)-[:CONTAINS]->(d1)
CREATE (p)-[:CONTAINS]->(d2)

-- Traverse the graph to the project's documents (the candidate set for vector ranking)
MATCH (p:Project {name: 'ArcFlow'})-[:CONTAINS]->(d:Document)
RETURN d.name
```

## Vector Index Management
List all vector indexes:
```
CALL db.indexes()
```

## Performance
| Dimension | Throughput | Latency (p50) |
|---|---|---|
| 128d | 25,000 queries/sec | 0.04ms |
| 768d | 4,600 queries/sec | 0.22ms |
GPU acceleration delivers a 4.2x speedup on vector search when available. On CUDA hardware with larger vector collections, ArcFlow Adaptive Dispatch routes queries to the GPU-accelerated nearest-neighbor path automatically, based on index size — same query, zero configuration.
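Size-based dispatch can be sketched as follows (the threshold value and function names here are hypothetical, for illustration only; ArcFlow's actual cutoff is internal and not documented here):

```python
# Hypothetical cutoff: route to the GPU once the index is large enough
# that kernel-launch overhead is amortized across the scan.
GPU_DISPATCH_THRESHOLD = 100_000  # number of indexed vectors (assumed value)

def choose_backend(num_vectors: int, gpu_available: bool) -> str:
    """Pick the execution path for a vector search, mirroring automatic dispatch."""
    if gpu_available and num_vectors >= GPU_DISPATCH_THRESHOLD:
        return "gpu"
    return "cpu"

print(choose_backend(1_000_000, gpu_available=True))   # gpu
print(choose_backend(50_000, gpu_available=True))      # cpu
print(choose_backend(1_000_000, gpu_available=False))  # cpu
```

The point of the sketch is that the decision is made per index from observable facts (index size, hardware), so the query text never changes.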
## Use Cases
### RAG Pipeline
ArcFlow's vector search integrates directly with the GraphRAG pipeline:
```
-- Index document embeddings
CREATE VECTOR INDEX doc_embeddings
FOR (d:Document)
ON (d.embedding)
OPTIONS {dimensions: 768, similarity: 'cosine'}

-- Ingest documents with embeddings
CREATE (d:Document {
  name: 'Architecture Overview',
  content: 'ArcFlow is a graph database...',
  embedding: [0.12, -0.34, ...]
})

-- Run GraphRAG with vector-backed retrieval
CALL algo.graphRAG('How does the storage engine work?')
```

### Semantic Search
```
-- Create embeddings from your ML pipeline, store directly in the graph
CREATE (n:Concept {
  name: 'machine learning',
  embedding: [0.45, 0.12, -0.33, ...]
})

-- Query with a new embedding
CALL algo.vectorSearch('concept_embeddings', [0.44, 0.13, -0.31, ...], 10)
```

### Recommendation Engine
```
-- Users and items with embeddings in the same vector space
CREATE VECTOR INDEX user_item_vec
FOR (n:Entity)
ON (n.embedding)
OPTIONS {dimensions: 128, similarity: 'dotProduct'}

-- Find items closest to a user's embedding
CALL algo.vectorSearch('user_item_vec', [0.2, 0.8, -0.1, ...], 20)
```

## See Also
- Graph Algorithms — `algo.node2vec()`, `algo.graphSAGE()`, `algo.similarNodes()` for embedding generation
- Trusted RAG — hybrid vector + graph retrieval with confidence scoring
- RAG Pipeline Guide — full GraphRAG pipeline implementation
- Algorithms Reference — `algo.vectorSearch()` and `algo.hybridSearch()` signatures