Filesystem Workspace

ArcFlow stores everything on the local filesystem. No database server. No daemon. No cloud account. You point it at a directory and it persists your graph as files you can inspect, back up, and version-control.

Quick Start: One Command

Initialize a workspace in your project, and every command in that tree automatically persists:

cd my-project
arcflow workspace init
# => Initialized workspace at my-project/.arcflow
 
# Now every query auto-persists. No --data-dir flag needed.
arcflow query "CREATE (n:Person {name: 'Alice', age: 30})" --json
arcflow query "MATCH (n:Person) RETURN n.name, n.age" --json
# => {"rows":[{"name":"Alice","age":"30"}],"count":1}
 
# Works from subdirectories too — ArcFlow walks up to find .arcflow/
cd src
arcflow query "MATCH (n) RETURN count(n)" --json
# => {"rows":[{"count(n)":"1"}],"count":1}

ArcFlow finds .arcflow/config.yaml by walking up the directory tree (like git finds .git/), reads the data_dir setting, and persists there automatically.
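The upward search can be pictured with a short Python sketch. It is an illustration of the lookup described above, not ArcFlow's own code; the function name find_workspace_config is hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_workspace_config(start: Path) -> Optional[Path]:
    """Return the nearest .arcflow/config.yaml at or above `start`, or None."""
    current = start.resolve()
    # Check the starting directory first, then each parent up to the root.
    for directory in (current, *current.parents):
        candidate = directory / ".arcflow" / "config.yaml"
        if candidate.is_file():
            return candidate
    return None
```

Calling find_workspace_config(Path.cwd()) from my-project/src would return my-project/.arcflow/config.yaml, which is why queries run from subdirectories land in the same graph.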

Three persistence modes

| Mode | How | When to use |
|------|-----|-------------|
| Auto-discover | arcflow workspace init once, then just arcflow query "..." | Projects. Recommended. |
| Explicit directory | arcflow query "..." --data-dir ./mydata | Scripts, CI/CD, one-off analysis |
| In-memory | arcflow query "..." (no init, no flag) | Experimentation. Nothing persists. |

--data-dir always takes priority over auto-discovery. If neither is set and no .arcflow/ is found, the graph is in-memory only.
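The precedence rule is small enough to write down as a function. The sketch below is a hedged illustration in Python; resolve_persistence and its mode strings are hypothetical names, and the discovered directory is passed in rather than looked up.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_persistence(
    cli_data_dir: Optional[str],
    discovered_data_dir: Optional[Path],
) -> Tuple[str, Optional[Path]]:
    """Pick the persistence mode using the precedence described above."""
    if cli_data_dir is not None:
        # An explicit --data-dir always wins, even inside a workspace.
        return ("explicit", Path(cli_data_dir))
    if discovered_data_dir is not None:
        # No flag, but a .arcflow/ workspace was found higher in the tree.
        return ("auto-discover", discovered_data_dir)
    # Neither source is available: nothing is written to disk.
    return ("in-memory", None)
```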

How Persistence Works

What gets created

When you use --data-dir, ArcFlow creates two files in that directory:

mydata/
├── worldcypher.snapshot.json    # Full graph state (human-readable JSON)
└── worldcypher.wal              # Write-ahead log (binary, crash recovery)

worldcypher.snapshot.json -- The full graph serialized as JSON. Created after the first mutation. Updated every 50 mutations, on :checkpoint, and on REPL exit. You can inspect it directly:

{
  "nodes": [
    {
      "id": 1,
      "labels": ["Person"],
      "properties": {"name": {"String": "Alice"}, "age": {"Int": 30}},
      "confidence": 1.0,
      "observation_class": "Observed"
    }
  ],
  "relationships": [
    {"id": 0, "start": 1, "end": 2, "rel_type": "KNOWS", "properties": {}}
  ]
}

worldcypher.wal -- Binary write-ahead log with CRC32 checksums. Provides crash recovery between snapshots. You never read this directly.
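Because the snapshot is plain JSON, it can be read with any JSON library. The following Python sketch assumes every property value is a single-key wrapper such as {"Int": 30}, as in the sample above; the helper names unwrap and plain_nodes are hypothetical.

```python
import json
from typing import Any, Dict, List

SNAPSHOT_TEXT = """
{
  "nodes": [
    {"id": 1, "labels": ["Person"],
     "properties": {"name": {"String": "Alice"}, "age": {"Int": 30}},
     "confidence": 1.0, "observation_class": "Observed"}
  ],
  "relationships": []
}
"""


def unwrap(value: Dict[str, Any]) -> Any:
    """Turn a single-key typed wrapper such as {"Int": 30} into 30."""
    if len(value) != 1:
        raise ValueError(f"expected one type tag, got {list(value)}")
    return next(iter(value.values()))


def plain_nodes(snapshot: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return each node with its property values stripped of type tags."""
    return [
        {
            "id": node["id"],
            "labels": node["labels"],
            "properties": {k: unwrap(v) for k, v in node["properties"].items()},
        }
        for node in snapshot["nodes"]
    ]


nodes = plain_nodes(json.loads(SNAPSHOT_TEXT))
```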

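Startup sequence

When ArcFlow opens a data directory:

1. Load worldcypher.snapshot.json if it exists.
2. Replay any WAL entries on top of the snapshot.
3. The graph is ready.

To reset, delete both worldcypher.snapshot.json and worldcypher.wal; the graph then starts empty. If only the snapshot is deleted, any entries still in the WAL are replayed on the next start.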
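The load-then-replay order can be sketched in Python. The WAL's real binary layout is not documented here, so the record framing below (a length, a CRC32, then a JSON payload) is purely an assumption made for illustration, as are the function names and the toy rule that every record creates a node.

```python
import json
import struct
import zlib
from pathlib import Path
from typing import Any, Dict, Iterator

HEADER = struct.Struct("<II")  # hypothetical framing: payload length, CRC32


def encode_record(payload: bytes) -> bytes:
    """Frame one WAL record (illustrative format, not ArcFlow's)."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def read_records(raw: bytes) -> Iterator[bytes]:
    """Yield payloads until the data ends or a checksum fails."""
    offset = 0
    while offset + HEADER.size <= len(raw):
        length, checksum = HEADER.unpack_from(raw, offset)
        body_start = offset + HEADER.size
        payload = raw[body_start:body_start + length]
        if len(payload) < length or zlib.crc32(payload) != checksum:
            break  # torn or corrupt tail: stop replaying here
        yield payload
        offset = body_start + length


def open_graph(data_dir: Path) -> Dict[str, Any]:
    """Load the snapshot if present, then replay WAL entries on top."""
    graph: Dict[str, Any] = {"nodes": [], "relationships": []}
    snapshot = data_dir / "worldcypher.snapshot.json"
    if snapshot.exists():
        graph = json.loads(snapshot.read_text())
    wal = data_dir / "worldcypher.wal"
    if wal.exists():
        for payload in read_records(wal.read_bytes()):
            graph["nodes"].append(json.loads(payload))  # toy "create node" op
    return graph
```

Stopping at the first failed checksum is one common way to treat a partially written final record after a crash; whether ArcFlow does the same is not stated in this document.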

What `workspace init` Creates

my-project/
└── .arcflow/
    ├── config.yaml              # Engine configuration
    ├── data/                    # Graph data (snapshot + WAL)
    └── state/                   # Internal state tracking

The generated config.yaml:

# ArcFlow workspace configuration
version: "0.1.0"
backend: cpu
data_dir: .arcflow/data

The data_dir setting tells ArcFlow where to persist. When you run any arcflow command anywhere inside the project tree, ArcFlow walks up from the current directory looking for .arcflow/config.yaml (like git finds .git/), reads data_dir, and persists there.
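As a rough illustration, the generated config is simple enough to read without a YAML library. The Python sketch below handles only flat key: value lines like the ones shown above, and it assumes data_dir is resolved relative to the project root (the directory containing .arcflow/). Both function names are hypothetical.

```python
from pathlib import Path
from typing import Dict

CONFIG_TEXT = """\
# ArcFlow workspace configuration
version: "0.1.0"
backend: cpu
data_dir: .arcflow/data
"""


def parse_flat_config(text: str) -> Dict[str, str]:
    """Parse top-level `key: value` lines; comments and blanks are skipped."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def data_dir_for(config_path: Path, text: str) -> Path:
    """Resolve data_dir against the project root (assumed: parent of .arcflow/)."""
    project_root = config_path.parent.parent
    return project_root / parse_flat_config(text)["data_dir"]
```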

Version control: Add .arcflow/data/ to .gitignore. Keep .arcflow/config.yaml tracked so collaborators and agents auto-discover the workspace.

Persistence Modes in Detail

| Mode | Setup | Persistence | Best for |
|------|-------|-------------|----------|
| Workspace (recommended) | arcflow workspace init once | Auto-discovers .arcflow/ from any subdirectory | Projects, teams, agents |
| Explicit directory | --data-dir ./mydata on every command | Saves to that directory | Scripts, CI/CD |
| Interactive REPL | arcflow (auto-discovers) or arcflow --data-dir ./mydata | On exit + every 50 writes | Exploration, debugging |
| In-memory | No init, no flag, no .arcflow/ in tree | None | One-off queries |

The REPL supports additional persistence commands:

| Command | Description |
|---------|-------------|
| :checkpoint | Force save snapshot + clear WAL now |
| :snapshot path.json | Export graph to a specific file |
| :restore path.json | Import graph from a file |
| :export json path.json | Export as JSON |
| :export graphml path.xml | Export as GraphML (for Gephi, yEd) |

Connecting Claude Code

Claude Code can interact with ArcFlow in four ways, from fastest to most compatible. Methods 1 and 1+ are the zero-friction ladder: bash and Unix tools work natively over typed memory. Methods 2 and 3 exist for environments without local shell access.

Method 1: CLI commands (fastest)

Claude Code already has Bash access. Once you run arcflow workspace init in your project, Claude Code can run queries with no extra flags:

# Create data (auto-discovers .arcflow/ in project tree)
arcflow query "CREATE (f:File {path: 'src/lib.rs', loc: 500})" --json
 
# Query data
arcflow query "MATCH (n:File) RETURN n.path, n.loc ORDER BY n.loc DESC" --json
 
# Run algorithms
arcflow query "CALL algo.pageRank()" --json

Every command returns structured JSON:

{"rows":[{"path":"src/lib.rs","loc":"500"}],"count":1}

Claude Code reads the JSON directly. No tool calls. No protocol overhead.
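A script consuming this output may want real numbers rather than strings. In the sample above, loc arrives as "500"; whether every numeric value is always serialized as a string is an assumption here. A minimal Python sketch, with the hypothetical helper rows_with_ints:

```python
import json
from typing import Any, Dict, List

SAMPLE_OUTPUT = '{"rows":[{"path":"src/lib.rs","loc":"500"}],"count":1}'


def rows_with_ints(output: str, int_columns: List[str]) -> List[Dict[str, Any]]:
    """Decode CLI output and convert the named columns from str to int."""
    result = json.loads(output)
    rows = []
    for row in result["rows"]:
        converted = dict(row)
        for column in int_columns:
            converted[column] = int(converted[column])
        rows.append(converted)
    return rows


rows = rows_with_ints(SAMPLE_OUTPUT, ["loc"])
```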

Discover the graph before querying:

# What labels and relationship types exist?
arcflow query "CALL db.schema()" --json
 
# Full engine context (capabilities, algorithms, observation classes)
arcflow agent-context synth --json

Query with parameters (safe from injection):

arcflow query "MATCH (n:Person {name: \$name}) RETURN n" --param name=Alice --data-dir .arcflow/data --json
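When the query is launched from a program rather than typed into a shell, the same parameter binding can be built as an argument list. The Python sketch below uses only the flags shown above (--param and --json); the helper name build_query_argv is hypothetical, and passing --param more than once for several bindings is an assumption.

```python
from typing import Dict, List


def build_query_argv(query: str, params: Dict[str, str]) -> List[str]:
    """Build an argv list for `arcflow query` with --param bindings."""
    argv = ["arcflow", "query", query]
    for name, value in params.items():
        # Each binding is its own argument, so the value is never
        # spliced into the query text.
        argv += ["--param", f"{name}={value}"]
    argv.append("--json")
    return argv


argv = build_query_argv(
    "MATCH (n:Person {name: $name}) RETURN n",
    {"name": "Alice"},
)
# To execute: subprocess.run(argv, capture_output=True, text=True)
# No shell is involved, so $name needs no backslash escape.
```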

Method 1+: Filesystem mount (the world model as files)

Planned, not yet shipped. The mount surface (arcflow mount) arrives with AFP-0003. What ships today, under AFP-0001 and AFP-0002, is arcflow workspace init and the typed in-process API. The contract described below (read-only filesystem reads, typed writes) already holds as doctrine; the arcflow mount command itself lands when AFP-0003 is released.

arcflow mount projects the workspace as a read-only filesystem tree. Every label is a directory; every node is a JSON file; every snapshot is a path. The agent reads with cat, navigates with ls, searches with grep or rg — no protocol, no tokens, no round-trip. Writes still go through the typed API (mount is read-only by design); discovery happens at filesystem speed.

# Mount the workspace at a path of your choice
arcflow mount ~/work/my-project/.arcflow ./world-fs
 
# Now the world model is browsable as files
ls ./world-fs/
# __snapshot.toml  nodes/  edges/  labels/  streams/
 
# Read a single node
cat ./world-fs/nodes/Person/p1.json
# {"id":"p1","name":"Alice","age":30,"_confidence":0.97,"_observation_class":"observed"}
 
# Discover all labels in the world
ls ./world-fs/labels/
# Person  Org  Fact  Frame  Sensor  Zone  Robot
 
# Find every File node with loc > 500 — no Cypher, just bash
find ./world-fs/nodes/File -name '*.json' \
  | xargs -I{} jq 'select(.loc > 500) | .path' {}
 
# Live tail a standing query's deltas as a path
tail -F ./world-fs/streams/fraud_threshold.jsonl

Why this matters for agents. LLM coding agents are extensively pre-trained on ls, find, grep, cat, jq — Unix tooling examples saturate the training distribution. Mounting the world model as a filesystem lets the agent apply mastery it already has, instead of learning a new query API. The same agent that writes find . -name '*.ts' | xargs grep TODO can write find ./world-fs/nodes/File -name '*.json' | xargs jq against the typed graph.

Layout (per the workspace projection contract):

./world-fs/
├── __snapshot.toml              # Snapshot ID + provenance for this projection
├── nodes/<Label>/<id>.json      # One file per node, typed properties + confidence + observation class
├── edges/<RelType>/<id>.json    # One file per relationship
├── labels/<Label>/              # Listing of every node carrying the label
├── streams/<view-name>.jsonl    # Tail-able delta stream for each LIVE VIEW
└── _row_count.txt               # Quick counts per label (no scan required)

The bright line: filesystem reads, typed writes. The mount surface never accepts a write. Mutations always go through arcflow query (Method 1) or the FFI bindings. This keeps the typed-entity invariants intact while letting agents read at filesystem speed.
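Since the projection is ordinary files, the find-and-jq filter shown earlier can equally be written in a few lines of Python. This is a sketch against the planned layout, so it cannot be run against a real mount today; it assumes File nodes expose path and loc as top-level JSON fields, in the same way the Person example exposes name and age. The function name files_over_loc is hypothetical.

```python
import json
from pathlib import Path
from typing import List


def files_over_loc(mount_root: Path, min_loc: int) -> List[str]:
    """Return the `path` property of every File node with loc > min_loc."""
    matches = []
    for node_file in sorted((mount_root / "nodes" / "File").glob("*.json")):
        node = json.loads(node_file.read_text())
        if node.get("loc", 0) > min_loc:
            matches.append(node["path"])
    return matches
```

The function only reads, which matches the contract: anything it finds that needs changing would still go through arcflow query.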

Method 2: MCP server (cloud chat interfaces only)

Claude Code, Cursor, Codex CLI: Method 1 (CLI binary) is faster — no protocol overhead, no config. Use MCP only if you're accessing ArcFlow from a cloud chat interface that has no local shell.

For cloud chat UIs (Claude.ai and similar) — interfaces with no local filesystem access — connect via the MCP server:

{
  "mcpServers": {
    "arcflow": {
      "command": "arcflow-mcp",
      "args": ["--data-dir", ".arcflow/data"]
    }
  }
}

The server exposes 8 tools, including get_schema, read_query, write_query, and graph_rag. The read_query tool rejects mutations and write_query rejects reads, so read/write safety is enforced by the server.

Method 3: HTTP API (remote access)

Start ArcFlow as an HTTP server:

arcflow --http 8080 --data-dir .arcflow/data --api-key my-secret-key

Then query from any HTTP client:

curl -X POST http://localhost:8080/query \
  -H "Authorization: Bearer my-secret-key" \
  -d "MATCH (n:Person) RETURN n.name"

Endpoints:

| Method | Path | Description |
|--------|------|-------------|
| GET | /health | Liveness probe |
| GET | /ready | Readiness with node/rel counts |
| GET | /status | Full engine status |
| POST | /query | Execute WorldCypher |
| GET | /query?q=MATCH... | Execute from URL parameter |
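The curl call translates directly to Python's standard library. The sketch below only builds the request object, mirroring the URL, header, and raw-text body from the curl example; the helper name build_query_request is hypothetical, and the response format is not assumed.

```python
import urllib.request


def build_query_request(base_url: str, api_key: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a POST /query request with a Bearer token."""
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=query.encode("utf-8"),  # the query text is the raw request body
        headers={"Authorization": f"Bearer {api_key}"},
        method="POST",
    )


request = build_query_request(
    "http://localhost:8080",
    "my-secret-key",
    "MATCH (n:Person) RETURN n.name",
)
# To send it: urllib.request.urlopen(request).read()
```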

Practical Example: Codebase World Model

Here's how Claude Code would build and query a world model of your codebase:

Step 1: Initialize

arcflow workspace init

Step 2: Build the graph

# Create crate nodes (no --data-dir needed — .arcflow/ auto-discovered)
arcflow query "CREATE (c:Crate {name: 'my-core', loc: 5500, tests: 120})" --json
arcflow query "CREATE (c:Crate {name: 'my-runtime', loc: 17000, tests: 467})" --json
arcflow query "CREATE (c:Crate {name: 'my-storage', loc: 2200, tests: 32})" --json
 
# Create dependency relationships
arcflow query "MATCH (a:Crate {name: 'my-runtime'}), (b:Crate {name: 'my-core'}) CREATE (a)-[:DEPENDS_ON]->(b)" --json
arcflow query "MATCH (a:Crate {name: 'my-runtime'}), (b:Crate {name: 'my-storage'}) CREATE (a)-[:DEPENDS_ON]->(b)" --json

Step 3: Query and analyze

# Which crate has the most code?
arcflow query "MATCH (c:Crate) RETURN c.name, c.loc ORDER BY c.loc DESC" --json
 
# What depends on my-core?
arcflow query "MATCH (a:Crate)-[:DEPENDS_ON]->(b:Crate {name: 'my-core'}) RETURN a.name" --json
 
# PageRank — which crate is most central?
arcflow query "CALL algo.pageRank()" --json
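To get a feel for what a PageRank call reports on this graph, here is a standalone Python sketch of the algorithm run over the two DEPENDS_ON edges created in Step 2. It is not ArcFlow's implementation, and the parameters and output shape of algo.pageRank() are not documented here; the damping factor of 0.85 and the iteration count are conventional choices, not values taken from ArcFlow.

```python
from typing import Dict, List


def pagerank(
    edges: Dict[str, List[str]],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Power-iteration PageRank; dangling nodes spread their rank evenly."""
    nodes = sorted(set(edges) | {t for targets in edges.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is shared by everyone.
        dangling = sum(rank[node] for node in nodes if not edges.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for source, targets in edges.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# my-runtime depends on both my-core and my-storage, as in Step 2.
DEPENDS_ON = {"my-runtime": ["my-core", "my-storage"]}
ranks = pagerank(DEPENDS_ON)
```

In this sketch my-core and my-storage end up with equal scores, both higher than my-runtime, because rank flows along DEPENDS_ON edges toward the crates that others rely on.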

Step 4: Persist across sessions

The graph survives CLI restarts. Next time Claude Code opens your project, the world model is already there:

arcflow query "MATCH (c:Crate) RETURN count(c)" --json
# => {"rows":[{"count(c)":"3"}],"count":1}

REPL Commands Reference

Start the REPL with arcflow --data-dir .arcflow/data. Available commands:

| Command | Description |
|---------|-------------|
| :help | Full command reference |
| :status | Engine status, query cache hit rate |
| :count | Node/relationship/skill counts |
| :schema | Full database schema (labels, properties) |
| :labels | All node labels |
| :types | All relationship types |
| :indexes | List indexes |
| :dump | Export all nodes as CREATE statements |
| :snapshot path.json | Export graph to file |
| :restore path.json | Import graph from file |
| :export json path.json | Export to JSON |
| :export graphml path.xml | Export to GraphML |
| :import csv file.csv Label | Bulk import CSV |
| :checkpoint | Force save snapshot + clear WAL |
| :clear | Delete all data |
| :demo | Load sample data (30 nodes) |

Diagnostic Commands

# Where does ArcFlow store data?
arcflow paths --json
 
# Is the workspace healthy?
arcflow doctor --json
 
# What can the engine do? (for AI agents)
arcflow agent-context synth --json

Multi-Agent Workspace

Multiple agents can share the same ArcFlow workspace. Each reads and writes to the same graph through the filesystem:

project/
└── .arcflow/
    └── data/
        └── worldcypher.snapshot.json   # Shared graph state

Agent A creates nodes. Agent B queries them. Agent C runs algorithms. The snapshot file is the coordination point. No message broker. No orchestration framework.

For concurrent writes, start the HTTP server and have agents query through it -- the server handles serialization:

arcflow --http 8080 --data-dir .arcflow/data

Tips

Reset the graph: Delete worldcypher.snapshot.json and worldcypher.wal. Next query starts fresh.
Back up the graph: Run :checkpoint first so the snapshot includes writes still held in the WAL, then copy worldcypher.snapshot.json. It's a self-contained JSON file.
Version control: Add .arcflow/config.yaml to git. Add .arcflow/data/ to .gitignore.
Inspect the data: cat .arcflow/data/worldcypher.snapshot.json | python3 -m json.tool
Export for other tools: Use :export graphml graph.xml for Gephi, yEd, or Cytoscape.
