# Vector Search
Embeddings are first-class properties of the world model — semantic dimensions queryable alongside graph relationships, confidence scores, and temporal state.
No dedicated vector database. No separate similarity service. An embedding lives on a node the same way a position, a confidence score, or an observation class does — queryable with the same GQL that traverses the graph. Combine semantic similarity with graph structure and temporal snapshots in one query, one process. 25,000 queries/sec at 128d. GPU acceleration delivers 4.2x additional speedup when available.
```
-- Vector search + graph traversal in one query, one process
CALL algo.vectorSearch('movie_embeddings', [0.12, -0.34, 0.56, ...], 5)
```

## Create a Vector Index
```
CREATE VECTOR INDEX movie_embeddings
FOR (m:Movie)
ON (m.embedding)
OPTIONS {dimensions: 128, similarity: 'cosine'}
```

| Option | Values | Default | Description |
|---|---|---|---|
| dimensions | 1–4096 | required | Vector dimensionality |
| similarity | 'cosine', 'euclidean', 'dotProduct' | 'cosine' | Distance function |
## Similarity Functions
- **Cosine** — angle between vectors, normalized. Best for text embeddings, where magnitude is irrelevant.
- **Euclidean** — L2 distance. Best for spatial data and embeddings where absolute position matters.
- **Dot product** — raw inner product. Best for pre-normalized vectors, where higher values mean greater similarity.
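The three functions can be sketched in plain Python (a conceptual illustration of the math, not ArcFlow's implementation):

```python
import math

def cosine(a, b):
    # Angle-based: magnitude is normalized out, only direction matters.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def euclidean(a, b):
    # L2 distance: lower means closer; absolute position matters.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product(a, b):
    # Raw inner product: assumes pre-normalized vectors; higher means more similar.
    return sum(x * y for x, y in zip(a, b))

a, b = [1.0, 0.0], [0.6, 0.8]   # b is unit-length: sqrt(0.36 + 0.64) == 1
print(round(cosine(a, b), 2))       # 0.6
print(round(euclidean(a, b), 2))    # 0.89
print(round(dot_product(a, b), 2))  # 0.6 — matches cosine because both vectors are unit-length
```

Note that cosine and dot product coincide exactly when the inputs are already unit-length, which is why dot product is the cheaper choice for pre-normalized embeddings.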
```
-- Euclidean distance for spatial embeddings
CREATE VECTOR INDEX spatial_vec
FOR (p:Point)
ON (p.coords)
OPTIONS {dimensions: 3, similarity: 'euclidean'}

-- Dot product for pre-normalized embeddings
CREATE VECTOR INDEX doc_vec
FOR (d:Document)
ON (d.embedding)
OPTIONS {dimensions: 768, similarity: 'dotProduct'}
```

## Search
### K-Nearest Neighbor Search
```
CALL algo.vectorSearch('movie_embeddings', [0.1, 0.2, 0.3, ...], 10)
```

| Parameter | Type | Description |
|---|---|---|
| index name | string | Name of the vector index |
| query vector | float[] | Query vector (must match index dimensions) |
| k | integer | Number of nearest neighbors to return |
Returns rows with `nodeId`, a similarity `score`, and all node properties.
```
-- Find 5 movies most similar to a query embedding
CALL algo.vectorSearch('movie_embeddings', [0.12, -0.34, 0.56, ...], 5)
```

| nodeId | name | score |
|--------|-------------------|----------|
| 42 | The Matrix | 0.952341 |
| 17 | Blade Runner | 0.891205 |
| 23 | Ghost in the Shell| 0.847193 |
| 8 | Tron | 0.812045 |
| 31 | Ex Machina | 0.798412 |
### Similar Nodes
Find structurally similar nodes based on their vector embeddings:

```
CALL algo.similarNodes()
```

### Hybrid Search
Combine graph traversal with vector similarity in a single query. Graph structure constrains the candidate set, then vector similarity ranks results.
```
CALL algo.hybridSearch()
```

Hybrid search uses graph adjacency to narrow the candidate set before running vector comparisons — faster than a brute-force vector scan on large graphs.
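The narrowing strategy can be sketched in plain Python (a conceptual illustration with toy data, not ArcFlow's internals): the graph step selects candidates, and the vector step scores only those candidates.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy graph: project name -> contained document ids
edges = {"ArcFlow": ["d1", "d2"], "Other": ["d3"]}
embeddings = {
    "d1": [0.1, 0.2, 0.3],
    "d2": [0.15, 0.22, 0.28],
    "d3": [0.9, -0.4, 0.1],  # outside the project: never even compared
}

def hybrid_search(project, query, k):
    # 1. Graph step: adjacency constrains the candidate set.
    candidates = edges[project]
    # 2. Vector step: rank only the surviving candidates by similarity.
    scored = [(doc, cosine(embeddings[doc], query)) for doc in candidates]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

print(hybrid_search("ArcFlow", [0.1, 0.2, 0.3], 2))
```

Because `d3` is filtered out by the traversal before any distance is computed, the cost of the vector step scales with the candidate set, not with the total number of indexed vectors.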
```
-- Example: find documents similar to a query, but only within a specific project
CREATE (p:Project {name: 'ArcFlow'})
CREATE (d1:Document {name: 'Architecture', embedding: [0.1, 0.2, 0.3]})
CREATE (d2:Document {name: 'API Guide', embedding: [0.15, 0.22, 0.28]})
CREATE (p)-[:CONTAINS]->(d1)
CREATE (p)-[:CONTAINS]->(d2)

-- Traverse the graph to the project's documents (the candidate set for vector ranking)
MATCH (p:Project {name: 'ArcFlow'})-[:CONTAINS]->(d:Document)
RETURN d.name
```

## Vector Index Management
List all vector indexes:
```
CALL db.indexes()
```

## Performance
| Dimension | Throughput | Latency (p50) |
|---|---|---|
| 128d | 25,000 queries/sec | 0.04ms |
| 768d | 4,600 queries/sec | 0.22ms |
GPU acceleration delivers a 4.2x speedup on vector search when available. On CUDA hardware with larger vector collections, ArcFlow Adaptive Dispatch routes queries to the GPU-accelerated nearest-neighbor path automatically, based on index size — same query, zero configuration.
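Size-based dispatch can be sketched as follows (the threshold value and function names here are hypothetical, for illustration only; ArcFlow's actual cutoff is internal and not documented here):

```python
# Hypothetical cutoff: route to the GPU once the index is large enough
# that kernel-launch overhead is amortized across the scan.
GPU_DISPATCH_THRESHOLD = 100_000  # number of indexed vectors (assumed value)

def choose_backend(num_vectors: int, gpu_available: bool) -> str:
    """Pick the execution path for a vector search, mirroring automatic dispatch."""
    if gpu_available and num_vectors >= GPU_DISPATCH_THRESHOLD:
        return "gpu"
    return "cpu"

print(choose_backend(1_000_000, gpu_available=True))   # gpu
print(choose_backend(50_000, gpu_available=True))      # cpu
print(choose_backend(1_000_000, gpu_available=False))  # cpu
```

The point of the sketch is that the decision is made per index from observable facts (index size, hardware), so the query text never changes.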
## Use Cases
### RAG Pipeline
ArcFlow's vector search integrates directly with the GraphRAG pipeline:
```
-- Index document embeddings
CREATE VECTOR INDEX doc_embeddings
FOR (d:Document)
ON (d.embedding)
OPTIONS {dimensions: 768, similarity: 'cosine'}

-- Ingest documents with embeddings
CREATE (d:Document {
  name: 'Architecture Overview',
  content: 'ArcFlow is a graph database...',
  embedding: [0.12, -0.34, ...]
})

-- Run GraphRAG with vector-backed retrieval
CALL algo.graphRAG('How does the storage engine work?')
```

### Semantic Search
```
-- Create embeddings from your ML pipeline, store directly in the graph
CREATE (n:Concept {
  name: 'machine learning',
  embedding: [0.45, 0.12, -0.33, ...]
})

-- Query with a new embedding
CALL algo.vectorSearch('concept_embeddings', [0.44, 0.13, -0.31, ...], 10)
```

### Recommendation Engine
```
-- Users and items with embeddings in the same vector space
CREATE VECTOR INDEX user_item_vec
FOR (n:Entity)
ON (n.embedding)
OPTIONS {dimensions: 128, similarity: 'dotProduct'}

-- Find items closest to a user's embedding
CALL algo.vectorSearch('user_item_vec', [0.2, 0.8, -0.1, ...], 20)
```

## See Also
- Graph Algorithms — `algo.node2vec()`, `algo.graphSAGE()`, `algo.similarNodes()` for embedding generation
- Trusted RAG — hybrid vector + graph retrieval with confidence scoring
- RAG Pipeline Guide — full GraphRAG pipeline implementation
- Algorithms Reference — `algo.vectorSearch()` and `algo.hybridSearch()` signatures