Filesystem Workspace
ArcFlow stores everything on the local filesystem. No database server. No daemon. No cloud account. You point it at a directory and it persists your graph as files you can inspect, back up, and version-control.
Quick Start: One Command
Initialize a workspace in your project, and every command in that tree automatically persists:
cd my-project
arcflow workspace init
# => Initialized workspace at my-project/.arcflow
# Now every query auto-persists. No --data-dir flag needed.
arcflow query "CREATE (n:Person {name: 'Alice', age: 30})" --json
arcflow query "MATCH (n:Person) RETURN n.name, n.age" --json
# => {"rows":[{"name":"Alice","age":"30"}],"count":1}
# Works from subdirectories too — ArcFlow walks up to find .arcflow/
cd src
arcflow query "MATCH (n) RETURN count(n)" --json
# => {"rows":[{"count(n)":"1"}],"count":1}
ArcFlow finds .arcflow/config.yaml by walking up the directory tree (like git finds .git/), reads the data_dir setting, and persists there automatically.
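The walk-up lookup can be sketched in a few lines of Python. This is an illustration of the discovery rule described here, not ArcFlow's actual implementation; the function name `find_workspace` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_workspace(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` that holds .arcflow/config.yaml.

    Hypothetical helper: mirrors the git-style upward search described in the
    text, checking `start` first and then each parent up to the filesystem root.
    """
    current = start.resolve()
    for candidate in (current, *current.parents):
        if (candidate / ".arcflow" / "config.yaml").is_file():
            return candidate
    return None  # no workspace found anywhere up the tree
```

Calling `find_workspace(Path.cwd())` from `my-project/src` would return `my-project` once the workspace has been initialized there.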
Three persistence modes
| Mode | How | When to use |
|---|---|---|
| Auto-discover | arcflow workspace init once, then just arcflow query "..." | Projects. Recommended. |
| Explicit directory | arcflow query "..." --data-dir ./mydata | Scripts, CI/CD, one-off analysis |
| In-memory | arcflow query "..." (no init, no flag) | Experimentation. Nothing persists. |
--data-dir always takes priority over auto-discovery. If neither is set and no .arcflow/ is found, the graph is in-memory only.
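The precedence rule reads naturally as a three-way decision. The sketch below only restates that rule in Python; `resolve_persistence` and its arguments are hypothetical names, and the real CLI may structure this differently.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_persistence(
    explicit_data_dir: Optional[str],
    discovered_data_dir: Optional[str],
) -> Tuple[str, Optional[Path]]:
    """Pick a persistence mode using the documented priority order.

    explicit_data_dir:   value of --data-dir, or None if the flag was not given.
    discovered_data_dir: data_dir read from a discovered .arcflow/config.yaml,
                         or None if no workspace was found.
    """
    if explicit_data_dir is not None:
        return ("explicit", Path(explicit_data_dir))  # --data-dir always wins
    if discovered_data_dir is not None:
        return ("auto-discover", Path(discovered_data_dir))
    return ("in-memory", None)  # nothing persists
```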
How Persistence Works
What gets created
When you use --data-dir, ArcFlow creates two files in that directory:
mydata/
├── worldcypher.snapshot.json # Full graph state (human-readable JSON)
└── worldcypher.wal # Write-ahead log (binary, crash recovery)
worldcypher.snapshot.json -- The full graph serialized as JSON. Created after the first mutation. Updated every 50 mutations, on :checkpoint, and on REPL exit. You can inspect it directly:
{
  "nodes": [
    {
      "id": 1,
      "labels": ["Person"],
      "properties": {"name": {"String": "Alice"}, "age": {"Int": 30}},
      "confidence": 1.0,
      "observation_class": "Observed"
    }
  ],
  "relationships": [
    {"id": 0, "start": 1, "end": 2, "rel_type": "KNOWS", "properties": {}}
  ]
}
worldcypher.wal -- Binary write-ahead log with CRC32 checksums. Provides crash recovery between snapshots. You never read this directly.
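Because the snapshot is plain JSON, a short script can summarize it without going through ArcFlow. The sketch below works on the sample shown here and assumes every property value is wrapped in a single-key type tag like `{"String": ...}` or `{"Int": ...}`; only those two tags appear in this document, so other value shapes are passed through untouched. The helper names are hypothetical.

```python
import json
from collections import Counter
from typing import Any, Dict

# Sample copied from the snapshot excerpt in this section.
SAMPLE_SNAPSHOT = """
{
  "nodes": [
    {"id": 1, "labels": ["Person"],
     "properties": {"name": {"String": "Alice"}, "age": {"Int": 30}},
     "confidence": 1.0, "observation_class": "Observed"}
  ],
  "relationships": [
    {"id": 0, "start": 1, "end": 2, "rel_type": "KNOWS", "properties": {}}
  ]
}
"""


def unwrap(value: Any) -> Any:
    """Turn a single-key type wrapper such as {"String": "Alice"} into "Alice".

    Assumption: property values use this wrapper shape; anything else is
    returned unchanged.
    """
    if isinstance(value, dict) and len(value) == 1:
        return next(iter(value.values()))
    return value


def summarize(snapshot: Dict[str, Any]) -> Dict[str, Any]:
    """Count nodes per label and flatten each node's properties."""
    label_counts: Counter = Counter()
    flat_nodes = {}
    for node in snapshot.get("nodes", []):
        label_counts.update(node["labels"])
        flat_nodes[node["id"]] = {k: unwrap(v) for k, v in node["properties"].items()}
    return {
        "labels": dict(label_counts),
        "nodes": flat_nodes,
        "relationships": len(snapshot.get("relationships", [])),
    }


summary = summarize(json.loads(SAMPLE_SNAPSHOT))
print(summary)
```

To run it against a real workspace, replace the sample with the contents of `worldcypher.snapshot.json` from your data directory.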
Startup sequence
When ArcFlow opens a data directory:
- Load worldcypher.snapshot.json if it exists
- Replay any WAL entries on top of the snapshot
- Graph is ready
When both the snapshot and the WAL are deleted, the graph starts empty. This is how you reset.
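Loading a snapshot and then replaying a checksummed log is a common recovery pattern, and a toy version makes the startup sequence concrete. The real worldcypher.wal is binary and its layout is not documented here, so everything about the record framing below (a length and CRC32 header followed by a JSON payload) is an assumption made purely for illustration.

```python
import json
import struct
import zlib
from typing import Dict

# Hypothetical record header: payload length, then CRC32 of the payload.
HEADER = struct.Struct("<II")


def encode_record(op: dict) -> bytes:
    """Frame one mutation as header + JSON payload (illustrative format only)."""
    payload = json.dumps(op).encode("utf-8")
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def replay(snapshot: Dict[str, dict], wal: bytes) -> Dict[str, dict]:
    """Apply WAL records on top of a snapshot, stopping at the first bad record."""
    graph = {node_id: dict(props) for node_id, props in snapshot.items()}
    offset = 0
    while offset + HEADER.size <= len(wal):
        length, checksum = HEADER.unpack_from(wal, offset)
        start = offset + HEADER.size
        payload = wal[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != checksum:
            break  # torn or corrupt tail, e.g. a crash mid-write
        op = json.loads(payload)
        graph[op["id"]] = op["properties"]
        offset = start + length
    return graph
```

The checksum is what lets recovery tell a complete record from one that was cut off by a crash: everything before the first failed check is applied, and the rest is ignored.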
What workspace init Creates
my-project/
└── .arcflow/
    ├── config.yaml   # Engine configuration
    ├── data/         # Graph data (snapshot + WAL)
    └── state/        # Internal state tracking
The generated config.yaml:
# ArcFlow workspace configuration
version: "0.1.0"
backend: cpu
data_dir: .arcflow/data
The data_dir setting tells ArcFlow where to persist. When you run any arcflow command anywhere inside the project tree, ArcFlow walks up from the current directory looking for .arcflow/config.yaml (like git finds .git/), reads data_dir, and persists there.
Version control: Add .arcflow/data/ to .gitignore. Keep .arcflow/config.yaml tracked so collaborators and agents auto-discover the workspace.
Persistence Modes in Detail
| Mode | Setup | Persistence | Use case |
|---|---|---|---|
| Workspace (recommended) | arcflow workspace init once | Auto-discovers .arcflow/ from any subdirectory | Projects, teams, agents |
| Explicit directory | --data-dir ./mydata on every command | Saves to that directory | Scripts, CI/CD |
| Interactive REPL | arcflow (auto-discovers) or arcflow --data-dir ./mydata | On exit + every 50 writes | Exploration, debugging |
| In-memory | No init, no flag, no .arcflow/ in tree | None | One-off queries |
The REPL supports additional persistence commands:
| Command | Description |
|---|---|
| :checkpoint | Force save snapshot + clear WAL now |
| :snapshot path.json | Export graph to a specific file |
| :restore path.json | Import graph from a file |
| :export json path.json | Export as JSON |
| :export graphml path.xml | Export as GraphML (for Gephi, yEd) |
Connecting Claude Code
Claude Code can interact with ArcFlow in four ways, from fastest to most compatible. Methods 1 and 1+ are the zero-friction ladder — bash and Unix tools work natively over typed memory. Methods 2 and 3 exist for environments without local shell access.
Method 1: CLI commands (fastest)
Claude Code already has Bash access. Once you arcflow workspace init in your project, Claude Code can run queries with no extra flags:
# Create data (auto-discovers .arcflow/ in project tree)
arcflow query "CREATE (f:File {path: 'src/lib.rs', loc: 500})" --json
# Query data
arcflow query "MATCH (n:File) RETURN n.path, n.loc ORDER BY n.loc DESC" --json
# Run algorithms
arcflow query "CALL algo.pageRank()" --json
Every command returns structured JSON:
{"rows":[{"path":"src/lib.rs","loc":"500"}],"count":1}
Claude Code reads the JSON directly. No tool calls. No protocol overhead.
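One detail worth noticing in the sample output is that numeric values arrive as strings ("500", not 500). A script that consumes the JSON may want to convert them. The sketch below parses the sample line shown here; `parse_rows` and its `int_columns` argument are hypothetical, and it assumes the `rows` key always holds a list of flat objects as in the examples.

```python
import json
from typing import Dict, Iterable, List


def parse_rows(output: str, int_columns: Iterable[str] = ()) -> List[Dict[str, object]]:
    """Parse `arcflow query ... --json` output and cast selected columns to int.

    Assumption: the output has the shape shown in this document, i.e. an
    object with a "rows" list whose values are strings.
    """
    result = json.loads(output)
    to_int = set(int_columns)
    return [
        {key: int(value) if key in to_int else value for key, value in row.items()}
        for row in result["rows"]
    ]


# Sample output copied from this section.
sample = '{"rows":[{"path":"src/lib.rs","loc":"500"}],"count":1}'
rows = parse_rows(sample, int_columns=["loc"])
print(rows)
```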
Discover the graph before querying:
# What labels and relationship types exist?
arcflow query "CALL db.schema()" --json
# Full engine context (capabilities, algorithms, observation classes)
arcflow agent-context synth --json
Query with parameters (safe from injection):
arcflow query "MATCH (n:Person {name: \$name}) RETURN n" --param name=Alice --data-dir .arcflow/data --json
Method 1+: Filesystem mount (the world model as files)
Target end-state. The mount surface (arcflow mount) ships with AFP-0003 and is substrate-pending. Today's filesystem-as-perception substrate ships at AFP-0001 / AFP-0002 (arcflow workspace init, the typed in-process API). The contract described below (read-only filesystem reads, typed writes) is doctrine today; the arcflow mount invocation lands when AFP-0003 cuts.
arcflow mount projects the workspace as a read-only filesystem tree. Every label is a directory; every node is a JSON file; every snapshot is a path. The agent reads with cat, navigates with ls, searches with grep or rg — no protocol, no tokens, no round-trip. Writes still go through the typed API (mount is read-only by design); discovery happens at filesystem speed.
# Mount the workspace at a path of your choice
arcflow mount ~/work/my-project/.arcflow ./world-fs
# Now the world model is browsable as files
ls ./world-fs/
# __snapshot.toml nodes/ edges/ labels/ streams/
# Read a single node
cat ./world-fs/nodes/Person/p1.json
# {"id":"p1","name":"Alice","age":30,"_confidence":0.97,"_observation_class":"observed"}
# Discover all labels in the world
ls ./world-fs/labels/
# Person Org Fact Frame Sensor Zone Robot
# Find every File node with loc > 500 — no Cypher, just bash
find ./world-fs/nodes/File -name '*.json' \
| xargs -I{} jq 'select(.loc > 500) | .path' {}
# Live tail a standing query's deltas as a path
tail -F ./world-fs/streams/fraud_threshold.jsonl
Why this matters for agents. LLM coding agents are extensively pre-trained on ls, find, grep, cat, jq; Unix tooling examples saturate the training distribution. Mounting the world model as a filesystem lets the agent apply mastery it already has, instead of learning a new query API. The same agent that writes find . -name '*.ts' | xargs grep TODO can write find ./world-fs/nodes/File -name '*.json' | xargs jq against the typed graph.
Layout (per the workspace projection contract):
./world-fs/
├── __snapshot.toml # Snapshot ID + provenance for this projection
├── nodes/<Label>/<id>.json # One file per node, typed properties + confidence + observation class
├── edges/<RelType>/<id>.json # One file per relationship
├── labels/<Label>/ # Listing of every node carrying the label
├── streams/<view-name>.jsonl # Tail-able delta stream for each LIVE VIEW
└── _row_count.txt # Quick counts per label (no scan required)
The bright line: filesystem reads, typed writes. The mount surface never accepts a write. Mutations always go through arcflow query (Method 1) or the FFI bindings. This keeps the typed-entity invariants intact while letting agents read at filesystem speed.
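Since the projection is ordinary JSON files, the find and jq pipeline shown earlier has a direct equivalent in any language with a JSON parser. The sketch below assumes the target layout (`nodes/<Label>/<id>.json`) and flattened properties as in the Person example; because `arcflow mount` has not shipped yet, it has not been run against a real mount. The function name is hypothetical, and the code only reads, in keeping with the read-only contract.

```python
import json
from pathlib import Path
from typing import List


def files_over_loc(mount_root: Path, min_loc: int) -> List[str]:
    """Return the `path` of every File node whose `loc` exceeds `min_loc`.

    Assumes one JSON file per node under <mount_root>/nodes/File/, each with
    top-level "path" and "loc" fields (an assumption based on the examples).
    """
    node_dir = mount_root / "nodes" / "File"
    if not node_dir.is_dir():
        return []
    matches = []
    for node_file in sorted(node_dir.glob("*.json")):
        node = json.loads(node_file.read_text())
        if node.get("loc", 0) > min_loc:
            matches.append(node["path"])
    return matches
```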
Method 2: MCP server (cloud chat interfaces only)
Claude Code, Cursor, Codex CLI: Method 1 (CLI binary) is faster — no protocol overhead, no config. Use MCP only if you're accessing ArcFlow from a cloud chat interface that has no local shell.
For cloud chat UIs (Claude.ai and similar) — interfaces with no local filesystem access — connect via the MCP server:
{
  "mcpServers": {
    "arcflow": {
      "command": "arcflow-mcp",
      "args": ["--data-dir", ".arcflow/data"]
    }
  }
}
Exposes 8 tools including get_schema, read_query, write_query, and graph_rag. The read_query tool rejects mutations; write_query rejects reads; read/write safety is enforced by the server.
Method 3: HTTP API (remote access)
Start ArcFlow as an HTTP server:
arcflow --http 8080 --data-dir .arcflow/data --api-key my-secret-key
Then query from any HTTP client:
curl -X POST http://localhost:8080/query \
-H "Authorization: Bearer my-secret-key" \
-d "MATCH (n:Person) RETURN n.name"
Endpoints:
| Method | Path | Description |
|---|---|---|
| GET | /health | Liveness probe |
| GET | /ready | Readiness with node/rel counts |
| GET | /status | Full engine status |
| POST | /query | Execute WorldCypher |
| GET | /query?q=MATCH... | Execute from URL parameter |
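The curl call above translates directly to Python's standard library. The sketch below only builds the request, mirroring the documented POST /query endpoint and Bearer header; the helper name is hypothetical, and the response format is not specified in this section, so parsing it is left out.

```python
import urllib.request


def build_query_request(base_url: str, api_key: str, query: str) -> urllib.request.Request:
    """Build a POST /query request carrying the query text as the raw body.

    Mirrors the curl example: Bearer token in the Authorization header,
    query string sent as the request body.
    """
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/query",
        data=query.encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}"},
        method="POST",
    )


request = build_query_request(
    "http://localhost:8080", "my-secret-key", "MATCH (n:Person) RETURN n.name"
)
```

With the server running, `urllib.request.urlopen(request)` sends it; treat the body of the response as opaque until you have checked what your ArcFlow version returns.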
Practical Example: Codebase World Model
Here's how Claude Code would build and query a world model of your codebase:
Step 1: Initialize
arcflow workspace init
Step 2: Build the graph
# Create crate nodes (no --data-dir needed — .arcflow/ auto-discovered)
arcflow query "CREATE (c:Crate {name: 'my-core', loc: 5500, tests: 120})" --json
arcflow query "CREATE (c:Crate {name: 'my-runtime', loc: 17000, tests: 467})" --json
arcflow query "CREATE (c:Crate {name: 'my-storage', loc: 2200, tests: 32})" --json
# Create dependency relationships
arcflow query "MATCH (a:Crate {name: 'my-runtime'}), (b:Crate {name: 'my-core'}) CREATE (a)-[:DEPENDS_ON]->(b)" --json
arcflow query "MATCH (a:Crate {name: 'my-runtime'}), (b:Crate {name: 'my-storage'}) CREATE (a)-[:DEPENDS_ON]->(b)" --json
Step 3: Query and analyze
# Which crate has the most code?
arcflow query "MATCH (c:Crate) RETURN c.name, c.loc ORDER BY c.loc DESC" --json
# What depends on my-core?
arcflow query "MATCH (a:Crate)-[:DEPENDS_ON]->(b:Crate {name: 'my-core'}) RETURN a.name" --json
# PageRank — which crate is most central?
arcflow query "CALL algo.pageRank()" --json
Step 4: Persist across sessions
The graph survives CLI restarts. Next time Claude Code opens your project, the world model is already there:
arcflow query "MATCH (c:Crate) RETURN count(c)" --json
# => {"rows":[{"count(c)":"3"}],"count":1}
REPL Commands Reference
Start the REPL with arcflow --data-dir .arcflow/data. Available commands:
| Command | Description |
|---|---|
| :help | Full command reference |
| :status | Engine status, query cache hit rate |
| :count | Node/relationship/skill counts |
| :schema | Full database schema (labels, properties) |
| :labels | All node labels |
| :types | All relationship types |
| :indexes | List indexes |
| :dump | Export all nodes as CREATE statements |
| :snapshot path.json | Export graph to file |
| :restore path.json | Import graph from file |
| :export json path.json | Export to JSON |
| :export graphml path.xml | Export to GraphML |
| :import csv file.csv Label | Bulk import CSV |
| :checkpoint | Force save snapshot + clear WAL |
| :clear | Delete all data |
| :demo | Load sample data (30 nodes) |
Diagnostic Commands
# Where does ArcFlow store data?
arcflow paths --json
# Is the workspace healthy?
arcflow doctor --json
# What can the engine do? (for AI agents)
arcflow agent-context synth --json
Multi-Agent Workspace
Multiple agents can share the same ArcFlow workspace. Each reads and writes to the same graph through the filesystem:
project/
└── .arcflow/
    └── data/
        └── worldcypher.snapshot.json   # Shared graph state
Agent A creates nodes. Agent B queries them. Agent C runs algorithms. The snapshot file is the coordination point. No message broker. No orchestration framework.
For concurrent writes, start the HTTP server and have agents query through it -- the server handles serialization:
arcflow --http 8080 --data-dir .arcflow/data
Tips
- Reset the graph: Delete worldcypher.snapshot.json and worldcypher.wal. Next query starts fresh.
- Back up the graph: In the REPL, run :checkpoint first so the snapshot includes any writes still held in the WAL, then copy worldcypher.snapshot.json. It's a self-contained JSON file.
- Version control: Add .arcflow/config.yaml to git. Add .arcflow/data/ to .gitignore.
- Inspect the data: cat .arcflow/data/worldcypher.snapshot.json | python3 -m json.tool
- Export for other tools: Use :export graphml graph.xml for Gephi, yEd, or Cytoscape.
See Also
- Agent-Native Database — filesystem workspace architecture and multi-agent coordination
- CLI — full command reference, flags, watch mode, and batch execution
- ArcFlow for Coding Agents — structured errors, dry-run validation, and schema discovery
- Swarm & Multi-Agent — sharing a workspace across multiple agents