Tutorial: Entity Linking
One entity often appears across multiple datasets under different IDs. Entity linking is how the world model unifies them — a single node, confidence-scored SAME_AS edges, provenance back to every source.
The pattern#
Entity linking connects records that refer to the same real-world entity. In a graph, this means:
- Find entity A (from source 1)
- Find entity B (from source 2)
- Create a relationship between them
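As a database-free illustration of what those links are for, the sketch below groups record IDs into entities by following SAME_AS links above a confidence threshold. It is plain TypeScript with hypothetical names (`SameAsLink`, `resolveEntities`), and it makes no claim about how the engine resolves entities internally.

```typescript
// Illustration only: no database involved, all names are hypothetical.
interface SameAsLink { a: string; b: string; confidence: number }

// Group record IDs into entities by following links at or above minConfidence.
function resolveEntities(ids: string[], links: SameAsLink[], minConfidence: number): string[][] {
  const parent = new Map<string, string>()
  for (const id of ids) parent.set(id, id)

  // Follow parent pointers until we reach the representative of a group.
  const find = (id: string): string => {
    let root = id
    while (parent.get(root) !== root) root = parent.get(root)!
    return root
  }

  for (const link of links) {
    if (link.confidence < minConfidence) continue
    if (!parent.has(link.a) || !parent.has(link.b)) continue // skip links to unknown records
    parent.set(find(link.a), find(link.b))
  }

  // Collect members under their representative, preserving input order.
  const groups = new Map<string, string[]>()
  for (const id of ids) {
    const root = find(id)
    const members = groups.get(root) || []
    members.push(id)
    groups.set(root, members)
  }
  return Array.from(groups.values())
}

const entities = resolveEntities(
  ['p1', 'p2', 'p3'],
  [
    { a: 'p1', b: 'p2', confidence: 0.92 },
    { a: 'p2', b: 'p3', confidence: 0.4 },
  ],
  0.9
)
```

With a threshold of 0.9, `p1` and `p2` collapse into one entity and `p3` stays on its own, because the 0.4 link is ignored.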
Basic entity linking#
import { openInMemory } from 'arcflow'
const db = openInMemory()
// Data from source 1
db.mutate("CREATE (p:Person {id: 'p1', name: 'Alice Chen', source: 'crm'})")
// Data from source 2
db.mutate("CREATE (p:Person {id: 'p2', name: 'A. Chen', source: 'linkedin'})")
// Link them (multi-MATCH pattern)
db.mutate(
  "MATCH (a:Person {id: $id1}) MATCH (b:Person {id: $id2}) MERGE (a)-[:SAME_AS {confidence: $conf}]->(b)",
  { id1: 'p1', id2: 'p2', conf: 0.92 }
)
Parameterized linking function#
// IDs and confidence are passed as parameters; labels and the relationship
// type are interpolated into the query text, so pass only trusted values.
function linkEntities(
  db: ArcflowDB,
  sourceLabel: string, sourceId: string,
  targetLabel: string, targetId: string,
  relType: string,
  confidence: number
) {
  db.mutate(
    `MATCH (a:${sourceLabel} {id: $sid}) MATCH (b:${targetLabel} {id: $tid}) MERGE (a)-[:${relType} {confidence: $conf}]->(b)`,
    { sid: sourceId, tid: targetId, conf: confidence }
  )
}
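Because `linkEntities` splices labels and the relationship type straight into the query string, it may be worth validating them when they come from outside your own code. The following is a minimal sketch that assumes labels and relationship types are plain identifiers; `safeIdentifier` and `buildLinkQuery` are hypothetical helpers, not part of the library.

```typescript
// Hypothetical helpers (not library functions). Assumption: labels and
// relationship types consist only of letters, digits, and underscores.
const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/

function safeIdentifier(value: string): string {
  if (!IDENTIFIER.test(value)) {
    throw new Error(`Unsafe label or relationship type: ${JSON.stringify(value)}`)
  }
  return value
}

// Builds the same query text linkEntities uses, validating each
// interpolated part first. IDs and confidence stay as $parameters.
function buildLinkQuery(sourceLabel: string, targetLabel: string, relType: string): string {
  const a = safeIdentifier(sourceLabel)
  const b = safeIdentifier(targetLabel)
  const r = safeIdentifier(relType)
  return `MATCH (a:${a} {id: $sid}) MATCH (b:${b} {id: $tid}) MERGE (a)-[:${r} {confidence: $conf}]->(b)`
}
```

A call such as `buildLinkQuery('Person', 'Org', 'WORKS_AT')` yields the query text unchanged, while a label containing spaces or punctuation throws before anything reaches the database.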
// Usage
linkEntities(db, 'Person', 'p1', 'Org', 'o1', 'WORKS_AT', 0.95)
linkEntities(db, 'Person', 'p1', 'Person', 'p2', 'KNOWS', 0.80)
Fact-based linking (triple pattern)#
For richer semantics, create fact nodes that describe the relationship:
function projectFact(
  db: ArcflowDB,
  subjectId: string, subjectLabel: string,
  objectId: string, objectLabel: string,
  predicate: string, confidence: number, source: string
) {
  // Deterministic ID: the same subject, predicate, and object always yield the same fact ID
  const factId = `fact-${subjectId}-${predicate}-${objectId}`
  // Values are interpolated into the query text here, not passed as parameters,
  // so they must not contain single quotes.
  db.batchMutate([
    // 1. The fact node itself
    `MERGE (f:Fact {uuid: '${factId}', predicate: '${predicate}', confidence: ${confidence}, source: '${source}'})`,
    // 2. Subject -> fact
    `MATCH (s:${subjectLabel} {id: '${subjectId}'}) MATCH (f:Fact {uuid: '${factId}'}) MERGE (s)-[:SUBJECT_OF]->(f)`,
    // 3. Fact -> object
    `MATCH (f:Fact {uuid: '${factId}'}) MATCH (o:${objectLabel} {id: '${objectId}'}) MERGE (f)-[:OBJECT_IS]->(o)`,
  ])
}
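One detail of the fact ID is easy to miss: joining the parts with `-` is ambiguous when an ID or predicate itself contains a hyphen, so two different triples can end up with the same `uuid`. The sketch below reproduces the scheme and shows one possible workaround; `escapePart` and `safeFactId` are hypothetical helpers, not something the tutorial or the library prescribes.

```typescript
// Same scheme as projectFact: parts joined with '-'.
function naiveFactId(subjectId: string, predicate: string, objectId: string): string {
  return `fact-${subjectId}-${predicate}-${objectId}`
}

// Hypothetical alternative: escape '%' and '-' inside each part so the
// '-' separators remain unambiguous.
function escapePart(part: string): string {
  return part.replace(/%/g, '%25').replace(/-/g, '%2D')
}

function safeFactId(subjectId: string, predicate: string, objectId: string): string {
  return ['fact', escapePart(subjectId), escapePart(predicate), escapePart(objectId)].join('-')
}

// Two different triples that collide under the naive scheme.
const collides = naiveFactId('a-b', 'c', 'd') === naiveFactId('a', 'b-c', 'd')
const stillCollides = safeFactId('a-b', 'c', 'd') === safeFactId('a', 'b-c', 'd')
```

For hyphen-free inputs such as `'p1'`, `'employment'`, `'o1'`, both functions return the same ID, so the workaround only changes behavior for the ambiguous cases.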
// Usage
projectFact(db, 'p1', 'Person', 'o1', 'Org', 'employment', 0.95, 'crm-export')
Querying linked entities#
Find all links for an entity#
const links = db.query(
  "MATCH (a:Person {id: $id})-[r]->(b) RETURN labels(b), b.name, b.id",
  { id: 'p1' }
)
Traverse through facts#
const facts = db.query(`
  MATCH (s:Person {id: $id})-[:SUBJECT_OF]->(f:Fact)-[:OBJECT_IS]->(o)
  RETURN f.predicate, f.confidence, o.name, labels(o)
  ORDER BY f.confidence DESC
`, { id: 'p1' })
Find entities connected by high-confidence facts#
const highConf = db.query(`
  MATCH (a)-[:SUBJECT_OF]->(f:Fact)-[:OBJECT_IS]->(b)
  WHERE f.confidence > 0.9
  RETURN a.name, f.predicate, b.name, f.confidence
`)
Batch entity linking (pipeline pattern)#
For high-throughput pipelines processing hundreds of entities per batch:
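The tutorial does not define `EntityRecord`, so here is a shape inferred from the fields the batch function reads, plus a small pre-flight check. Both are assumptions: the real type may carry more fields, `findUnsafeRecords` is a hypothetical helper, and the sample data is made up for illustration.

```typescript
// Assumed shape, inferred from the fields projectBatch reads.
interface EntityLink {
  targetLabel: string
  targetId: string
  relType: string
}

interface EntityRecord {
  label: string
  id: string
  name: string
  workspaceId: string
  links?: EntityLink[]
}

// Hypothetical pre-flight check: return records whose string fields contain
// a single quote or backslash, which would corrupt a '...' literal when the
// value is spliced into query text.
function findUnsafeRecords(entities: EntityRecord[]): EntityRecord[] {
  const unsafe = /['\\]/
  return entities.filter(e => unsafe.test(e.id) || unsafe.test(e.name) || unsafe.test(e.workspaceId))
}

// Made-up sample data.
const sample: EntityRecord[] = [
  {
    label: 'Person', id: 'p1', name: 'Alice Chen', workspaceId: 'w1',
    links: [{ targetLabel: 'Org', targetId: 'o1', relType: 'WORKS_AT' }],
  },
  { label: 'Person', id: 'p3', name: "Pat O'Brien", workspaceId: 'w1' },
]

const flagged = findUnsafeRecords(sample)
```

Here `flagged` contains only the `p3` record, whose name includes an apostrophe. With a record shape like this in mind, the batch projection reads as follows: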
function projectBatch(db: ArcflowDB, entities: EntityRecord[]) {
  // Phase 1: Create all entity nodes
  const entityMutations = entities.map(e =>
    `MERGE (n:${e.label} {id: '${e.id}', name: '${e.name}', workspaceId: '${e.workspaceId}'})`
  )
  db.batchMutate(entityMutations)
  // Phase 2: Create all relationships
  const relMutations = entities
    .filter(e => e.links)
    .flatMap(e => e.links!.map(link =>
      `MATCH (a:${e.label} {id: '${e.id}'}) MATCH (b:${link.targetLabel} {id: '${link.targetId}'}) MERGE (a)-[:${link.relType}]->(b)`
    ))
  if (relMutations.length > 0) {
    db.batchMutate(relMutations)
  }
}
See Also#
- Knowledge Graph Tutorial — build the world model these entities live in
- Use Case: Knowledge Management — entity extraction in production pipelines
- Skills — declarative rules that link entities automatically
- Confidence & Provenance — scoring links you infer