Trusted RAG Pipeline

Every answer carries a confidence score, a provenance chain, and a classification of how the fact was derived — observed, inferred, or predicted. The ArcFlow Evidence Model makes every retrieved fact auditable by default.

The deep knowledge problem

LLMs are trained on vast corpora. They know facts. What they struggle with is traversal — questions where the answer requires following a chain of relationships that isn't directly encoded in their weights.

Ask an LLM to find the causal path between two entities separated by several degrees of connection, and it will confabulate or give up. The knowledge is somewhere in the training data; the multi-hop reasoning to connect it isn't reliably retrievable. This is the structural gap that no amount of fine-tuning closes: LLMs are lookup tables, not traversal engines.

Graph databases close that gap. They don't approximate paths — they compute them. A six-hop traversal that an LLM gets wrong takes milliseconds in a graph engine and returns an exact answer. The question isn't whether your model knows about the entities; it's whether it can connect them.
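
The contrast can be made concrete with a toy sketch (plain Python, hypothetical entity names): given an explicit edge list, breadth-first search computes a six-hop path exactly, hop by hop, instead of trying to recall it.

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Breadth-first search over an explicit edge list: the path is
    computed exactly, hop by hop, rather than recalled from weights."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adjacency.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Toy six-hop chain: the traversal is exact regardless of depth.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("f", "g")]
print(shortest_path(edges, "a", "g"))  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
```

A graph engine does the same thing at scale, with indexes and parallelism; the point is that the answer is derived from structure, not sampled from a distribution.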

ArcFlow's Trusted RAG is built on this insight: combine an LLM's language fluency with a graph engine's traversal capability, then add a layer the combination still needs — filtering by reliability, tracing every answer to its source, and distinguishing hard facts from statistical guesses.

Standard RAG retrieves text chunks and hopes the LLM gets it right: no reliability filter, no source trail, no line between hard facts and statistical guesses. ArcFlow's Trusted RAG scores every fact (0.0–1.0), records which sensor or extraction model produced it, and filters low-confidence facts out before the LLM ever sees them. The result: auditable answers with a verifiable evidence trail.

-- Trusted GraphRAG: only facts with confidence >= 0.8 reach the LLM
CALL algo.graphRAGTrusted('warranty Widget Pro', 0.8)
  YIELD node, score, confidence, observationClass
  RETURN node, score, confidence, observationClass
-- Result: "2-year warranty (confidence: 0.95, class: observed, source: manual_v3.pdf)"

The problem with standard RAG

User: "Which zones have active robots right now? Are they in hazard areas?"

Standard RAG:
  1. Vector search → find 10 nearest text chunks about zones and robots
  2. Stuff into prompt → hope the right chunk is current
  3. LLM generates answer → no way to verify

What can go wrong:
  - Returned chunks are from a stale snapshot (robot has moved)
  - Two chunks contradict each other (sensor disagreement, no resolution)
  - Answer is plausible but ungrounded (hallucination about Zone C)
  - No audit trail — which sensor said this? How confident?

Trusted RAG with ArcFlow

User: "Which zones have active robots right now? Are they in hazard areas?"

ArcFlow Trusted RAG:
  1. GraphRAG → traverse Sensor-DETECTED-Robot-OCCUPIES-Zone
  2. Filter by sensor.reliability >= 0.9, detection.confidence >= 0.85
  3. Check observation class → only "observed" facts from live sensors
  4. Compute composite trust per zone (sensor reliability × detection confidence)
  5. Return answer WITH evidence chain

Result: "Lab Alpha — 1 active robot, hazard 0.2
         (Cam-LabA: confidence 0.94, method: vision, sensor reliability: 0.95
          composite trust: 0.893, tier-2-confirmed)"

How it works

1. The world model is a knowledge graph

Every physical entity is a node. Every observation is a relationship with evidence attached:

MERGE (z:Zone {id: 'lab-a', name: 'Lab Alpha', hazard: 0.2})
MERGE (r:Robot {id: 'r1', name: 'Atlas-01', battery: 87, status: 'active'})
MERGE (s:Sensor {id: 'cam-1', name: 'Cam-LabA', type: 'camera', reliability: 0.95})
 
MERGE (r)-[:OCCUPIES {entered_seq: 1}]->(z)
MERGE (s)-[:DETECTED {confidence: 0.94, method: 'vision'}]->(r)

The graph is not a flat index of text chunks — it is the live state of the world. Every robot knows its zone. Every detection knows its sensor. Every relationship carries the evidence used to form it.

2. Multi-hop traversal reveals causal chains

-- Who detected what, via which sensor, with what confidence, in which zone?
MATCH (s:Sensor)-[d:DETECTED]->(r:Robot)-[:OCCUPIES]->(z:Zone)
RETURN s.name AS sensor, s.reliability AS sensor_trust,
       d.confidence AS detection_confidence, d.method AS method,
       r.name AS entity, r.battery AS battery,
       z.name AS zone, z.hazard AS zone_hazard
ORDER BY d.confidence DESC

A vector search cannot answer this. It returns chunks. GraphRAG traverses the actual causal chain: detection event → detected entity → physical location → hazard classification.

3. Filter by trust before the LLM sees it

-- Only high-confidence detections in hazard-rated zones reach the LLM
MATCH (s:Sensor)-[d:DETECTED]->(r:Robot)-[:OCCUPIES]->(z:Zone)
WHERE s.reliability >= 0.9
  AND d.confidence >= 0.85
  AND z.hazard > 0.0
RETURN z.name AS zone, z.hazard AS hazard_level,
       r.name AS robot, r.battery AS battery,
       s.reliability * d.confidence AS composite_trust
ORDER BY composite_trust DESC

The composite_trust value — sensor reliability × detection confidence — is a fact-level trust signal. Facts below threshold are excluded before the context window is assembled.
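
The same filtering step, sketched outside the database in plain Python. The fact dicts and the 0.8 threshold are illustrative; the trust formula mirrors the query's `s.reliability * d.confidence`.

```python
def composite_trust(sensor_reliability, detection_confidence):
    # Fact-level trust: product of source reliability and per-detection confidence.
    return sensor_reliability * detection_confidence

# Hypothetical rows, shaped like the traversal's output.
facts = [
    {"zone": "Lab Alpha",    "robot": "Atlas-01",   "sensor_reliability": 0.95, "confidence": 0.94},
    {"zone": "Assembly Bay", "robot": "BoltBot-01", "sensor_reliability": 0.93, "confidence": 0.91},
    {"zone": "Zone C",       "robot": "Atlas-02",   "sensor_reliability": 0.80, "confidence": 0.99},
]

THRESHOLD = 0.8  # illustrative cutoff, not an ArcFlow default
context = [f for f in facts
           if composite_trust(f["sensor_reliability"], f["confidence"]) >= THRESHOLD]
# Atlas-02's fact (0.80 * 0.99 = 0.792) is excluded before context assembly,
# even though its raw detection confidence (0.99) is the highest of the three.
```

Note the Atlas-02 case: a confident detection from an unreliable sensor still fails the composite cutoff, which is the reason to multiply rather than filter on either factor alone.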

4. Temporal awareness: what was true at a given checkpoint?

The world model accumulates observations over time. Every state change is versioned. Query "what did we know at sequence 500?" using the fact's metadata:

-- Current detections with high composite trust
MATCH (s:Sensor)-[d:DETECTED]->(r:Robot)-[:OCCUPIES]->(z:Zone)
WHERE d.confidence >= 0.88
RETURN z.name, r.name, d.confidence, d.method
 
-- Restrict to occupancy facts recorded at or after sequence 2 (by entered_seq)
MATCH (r:Robot)-[o:OCCUPIES]->(z:Zone)
WHERE o.entered_seq >= 2
RETURN r.name AS robot, z.name AS current_zone

See Temporal Queries for full time-travel and WAL-based sequence queries.


Observation classes

Every node in ArcFlow carries an observation class — the epistemological status of the fact:

Class       Meaning                                      Trust level   World model example
Observed    Directly witnessed by a sensor               Highest       "Cam-LabA detected Atlas-01 at position (18, 8.5)"
Inferred    Derived from reasoning over observed facts   Medium        "Atlas-01 is probably still in Lab Alpha (no exit event)"
Predicted   Model output, statistical projection         Lowest        "Based on task pattern, Atlas-01 will reach Zone C in ~4 minutes"

-- Only use hard sensor observations for safety-critical answers
MATCH (s:Sensor)-[d:DETECTED]->(r:Robot)
WHERE d.confidence >= 0.9
RETURN s.name, s.type, r.name, d.confidence, d.method
 
-- Allow inference for planning queries
MATCH (r:Robot)-[:OCCUPIES]->(z:Zone)
WHERE z.hazard > 0.1
RETURN r.name AS robot_in_hazard_zone, z.name AS zone, z.hazard AS hazard_level

Provenance trails

Every relationship carries provenance metadata — which sensor, which method, what confidence score:

MATCH (s:Sensor)-[d:DETECTED]->(r:Robot)
RETURN s.name,          -- "Cam-LabA"
       s.type,          -- "camera"
       s.reliability,   -- 0.95
       d.confidence,    -- 0.94
       d.method,        -- "vision"
       r.name           -- "Atlas-01"

When the LLM says "Atlas-01 is active in Lab Alpha," you can trace exactly which camera, which detection method, which confidence score, and which sequence number produced that claim.

Cross-sensor corroboration

Multiple sensors observing the same entity increase composite trust:

-- Find entities detected by more than one sensor
MATCH (s:Sensor)-[d:DETECTED]->(r:Robot)
WITH r.name AS entity, count(s) AS sensor_count,
     avg(d.confidence) AS avg_confidence,
     avg(s.reliability) AS avg_sensor_reliability
WHERE sensor_count >= 2
RETURN entity, sensor_count, avg_confidence, avg_sensor_reliability
ORDER BY avg_confidence DESC

Single-sensor detections go to the LLM flagged as tier-3-uncertain. Multi-sensor corroboration with high average confidence becomes tier-1-authoritative.
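
The tiering rule described above can be sketched as a small function. The tier names mirror the labels used on this page; the exact cutoffs (2 sensors, 0.9 average confidence) are illustrative, not ArcFlow defaults.

```python
def retrieval_tier(sensor_count, avg_confidence):
    """Corroborated, high-confidence facts rank highest; an uncorroborated
    detection stays tier-3 no matter how confident its single sensor is."""
    if sensor_count >= 2 and avg_confidence >= 0.9:
        return "tier-1-authoritative"
    if sensor_count >= 2:
        return "tier-2-confirmed"
    return "tier-3-uncertain"

print(retrieval_tier(3, 0.94))  # tier-1-authoritative
print(retrieval_tier(2, 0.86))  # tier-2-confirmed
print(retrieval_tier(1, 0.99))  # tier-3-uncertain: uncorroborated, however confident
```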


Confidence-weighted retrieval

Graph traversal respects confidence. Low-trust paths are deprioritized in the context assembly:

-- Tier world model facts by composite trust
MATCH (s:Sensor)-[d:DETECTED]->(r:Robot)-[:OCCUPIES]->(z:Zone)
WITH z.name AS zone,
     r.name AS robot,
     s.reliability * d.confidence AS composite_trust
RETURN zone, robot, composite_trust,
  CASE
    WHEN composite_trust >= 0.93 THEN 'tier-1-authoritative'
    WHEN composite_trust >= 0.85 THEN 'tier-2-confirmed'
    ELSE 'tier-3-uncertain'
  END AS retrieval_tier
ORDER BY composite_trust DESC

-- Graph algorithm: PageRank on the world model
-- High-PageRank entities are most connected — most observed, most influential
CALL algo.pageRank()
  YIELD nodeId, score
  RETURN nodeId, score ORDER BY score DESC LIMIT 10

The full pipeline

-- 1. Seed the world model with sensor observations
MERGE (z:Zone {id: 'assembly', name: 'Assembly Bay', hazard: 0.3})
MERGE (r:Robot {id: 'bolt-01', name: 'BoltBot-01', task: 'assembly', battery: 72, status: 'active'})
MERGE (s:Sensor {id: 'therm-1', name: 'ThermalCam-Bay', type: 'thermal', reliability: 0.93})
 
MERGE (r)-[:OCCUPIES {entered_seq: 10}]->(z)
MERGE (s)-[:DETECTED {confidence: 0.91, method: 'thermal'}]->(r)
 
-- 2. GraphRAG: retrieve context with trust filtering
CALL algo.graphRAGTrusted('active robots in hazard zones', 0.85)
  YIELD node, score, confidence, observationClass
  RETURN node, score, confidence, observationClass
 
-- 3. Traverse the full causal chain: sensor -> robot -> zone
MATCH (s:Sensor)-[d:DETECTED]->(r:Robot)-[:OCCUPIES]->(z:Zone)
WHERE s.reliability >= 0.9 AND d.confidence >= 0.85
RETURN s.name, d.method, r.name, r.task, z.name, z.hazard,
       s.reliability * d.confidence AS composite_trust
ORDER BY composite_trust DESC
 
-- 4. Aggregate trust by zone for LLM context assembly
MATCH (s:Sensor)-[d:DETECTED]->(r:Robot)-[:OCCUPIES]->(z:Zone)
WITH z.name AS zone, count(r) AS robot_count,
     avg(s.reliability * d.confidence) AS zone_trust
WHERE zone_trust >= 0.85
RETURN zone, robot_count, zone_trust
ORDER BY zone_trust DESC
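
Between the query results above and the prompt sits a context-assembly step. A minimal sketch of that step: the fact dicts stand in for rows returned by the traversal, and no client API is assumed — this shows only the sort-filter-format logic.

```python
# Hypothetical rows, shaped like the causal-chain traversal's output.
facts = [
    {"sensor": "ThermalCam-Bay", "method": "thermal", "robot": "BoltBot-01",
     "zone": "Assembly Bay", "hazard": 0.3, "trust": 0.8463},
    {"sensor": "Cam-LabA", "method": "vision", "robot": "Atlas-01",
     "zone": "Lab Alpha", "hazard": 0.2, "trust": 0.893},
]

def assemble_context(facts, threshold=0.8):
    """Drop sub-threshold facts, order the rest by composite trust, and
    render one evidence-bearing line per fact for the LLM prompt."""
    kept = sorted((f for f in facts if f["trust"] >= threshold),
                  key=lambda f: f["trust"], reverse=True)
    lines = [f"{f['robot']} in {f['zone']} (hazard {f['hazard']}) -- "
             f"{f['sensor']}/{f['method']}, trust {f['trust']:.3f}"
             for f in kept]
    return "\n".join(lines)

print(assemble_context(facts))
```

Each line carries its own evidence (sensor, method, trust), so the LLM's answer can cite it verbatim and remain traceable.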

Compared to other approaches

RAG / LLM orchestration frameworks?

Your vector store retriever becomes algo.vectorSearch(). Your graph retriever becomes algo.graphRAG(). ArcFlow adds confidence filtering, provenance, and temporal awareness — your RAG pipeline distinguishes facts from guesses, traces every answer to its source, and detects stale information via sequence-based queries. Trade-off: RAG frameworks have a larger connector ecosystem for third-party vector stores; ArcFlow's advantage is the integrated graph+vector+temporal stack with zero network hops.

Dedicated vector databases?

These are purpose-built for vector similarity. ArcFlow is a knowledge graph with vector search built in — you get similarity search AND graph traversal, temporal queries, confidence scoring, and relationship-aware retrieval in one in-process database. Trade-off: dedicated vector databases offer higher-throughput vector indexing at scale (millions of vectors); ArcFlow's strength is combining vector similarity with graph structure and provenance in a single query.

Traditional graph databases + GraphRAG?

GraphRAG on a traditional graph database typically requires a separate analytics product, and offers no confidence scoring, observation classes, or temporal awareness. ArcFlow's Trusted RAG is built in, with provenance on every edge. Trade-off: traditional graph databases have a mature distributed clustering story; ArcFlow is single-process, with in-process performance (2.7M nodes/sec creation; PageRank at 154M nodes/sec).


Vision

ArcFlow provides a trust-aware knowledge infrastructure where AI systems can reason about the reliability of their own knowledge.

Confidence calibration. Confidence scores set at ingestion time can be automatically calibrated — measuring how often detections at a given confidence level turn out to be accurate, then adjusting scores to match observed precision. A 0.9 confidence should mean "correct 90% of the time," verified empirically.
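
A minimal sketch of that calibration loop, assuming a hypothetical audit log of (stated confidence, was-correct) pairs; bucket boundaries and bin count are illustrative.

```python
def calibrate(detections, bins=10):
    """Bucket detections by stated confidence and measure observed
    precision per bucket, keyed by the bucket's lower edge."""
    buckets = [[] for _ in range(bins)]
    for conf, correct in detections:
        idx = min(int(conf * bins), bins - 1)
        buckets[idx].append(correct)
    return {round(i / bins, 2): sum(b) / len(b)
            for i, b in enumerate(buckets) if b}

# A well-calibrated source: facts stated at 0.9 confidence
# turn out to be right 9 times out of 10.
log = [(0.9, True)] * 9 + [(0.9, False)]
print(calibrate(log))  # {0.9: 0.9}
```

The correction step would then remap stated scores toward the measured precision of their bucket, so a 0.9 really means "correct 90% of the time."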

Automated fact verification. Cross-referencing new sensor observations against existing graph structure to detect contradictions before they enter the world model. When two sensors disagree on a robot's location, the system flags the conflict with both observations and their confidence scores, rather than silently overwriting.
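
The conflict check can be sketched in a few lines, with hypothetical observation dicts standing in for incoming sensor events:

```python
def find_conflicts(observations):
    """Flag disagreements before they enter the world model: two sensors
    placing the same robot in different zones is surfaced as a conflict,
    with both observations attached, rather than silently overwritten."""
    by_robot = {}
    conflicts = []
    for obs in observations:
        prior = by_robot.setdefault(obs["robot"], obs)
        if prior is not obs and prior["zone"] != obs["zone"]:
            conflicts.append((prior, obs))
    return conflicts

obs = [
    {"robot": "Atlas-01", "zone": "Lab Alpha", "sensor": "Cam-LabA", "confidence": 0.94},
    {"robot": "Atlas-01", "zone": "Zone C",    "sensor": "Lidar-C",  "confidence": 0.71},
]
print(len(find_conflicts(obs)))  # 1 conflict, carrying both confidence scores
```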

Decay and freshness modeling. Observations lose reliability as time passes. A robot detection from 10 minutes ago is less trusted than one from 10 seconds ago. Automatic confidence decay based on observation age and entity volatility ensures that temporal staleness is reflected in retrieval ranking without manual curation.
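
One natural decay model is exponential, parameterized by a half-life per entity type; a minimal sketch (the half-life values are illustrative, and would be tuned per entity volatility — a moving robot decays fast, a fixed zone boundary barely at all):

```python
def decayed_confidence(confidence, age_seconds, half_life_seconds):
    """Confidence halves every `half_life_seconds` of observation age."""
    return confidence * 0.5 ** (age_seconds / half_life_seconds)

# A robot detection with a 10-minute half-life:
print(round(decayed_confidence(0.94, 10, 600), 3))   # 10 s old: ~0.929
print(round(decayed_confidence(0.94, 600, 600), 3))  # 10 min old: 0.47
```

Ranking retrieval by decayed rather than stored confidence makes staleness a first-class signal without manual curation.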

Knowledge graph construction from unstructured data. Extending the skill pipeline to automatically extract entities, relationships, and confidence scores from logs, transcripts, and sensor streams — building the world model incrementally with full provenance from raw events to structured facts.

See Also

  • Vector Search — vector similarity search, hybrid search, and embedding index management
  • Graph Algorithms — algo.graphRAG(), algo.graphRAGTrusted(), algo.graphRAGContext()
  • RAG Pipeline Guide — step-by-step implementation with confidence filtering
  • Confidence & Provenance — observation classes, scoring, and provenance edges
  • Knowledge Management — persistent world model knowledge graph patterns
  • Temporal Queries — sequence-based time travel and historical state queries