Overview
Graph Data Quality and Drift Control
Constraints, duplicate detection, and drift monitors that keep a graph trustworthy under continuous writes.
Why it matters
A fast graph with bad identity hygiene creates expensive downstream errors: duplicate entities, broken traversals, and misleading analytics.
Going deeper
Quality is enforced at write time, not audited after the fact:
// Identity hygiene: one canonical node per business key.
CREATE CONSTRAINT customer_id IF NOT EXISTS
FOR (c:Customer) REQUIRE c.externalId IS UNIQUE;
// Ingestion uses MERGE on the canonical key, never CREATE.
MERGE (c:Customer {externalId: $id})
SET c.name = $name, c.updatedAt = datetime();
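The same pattern extends to batch ingestion. The sketch below assumes a hypothetical `$rows` parameter (a list of maps with `id` and `name` keys) and a `createdAt` property; neither appears in the original snippet. It relies on the uniqueness constraint above being in place, so that concurrent writers cannot create two nodes for the same key.

```cypher
// Hypothetical batch upsert: $rows is assumed to be a list of {id, name} maps.
UNWIND $rows AS row
MERGE (c:Customer {externalId: row.id})
  // createdAt is an illustrative property, set only when the node is first created.
  ON CREATE SET c.createdAt = datetime()
// Runs for both newly created and already existing nodes.
SET c.name = row.name, c.updatedAt = datetime();
```

Keeping only the business key inside the MERGE pattern matters: adding mutable properties such as `name` to the pattern would make MERGE attempt a new node whenever they differ, and the constraint would then reject the write.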
A minimal quality contract has four parts: uniqueness constraints on external IDs, shape checks for critical properties (for example, no null email on an active customer), duplicate and orphan monitors, and a drift review with the domain owner. Each part needs an owner and an alert; a check nobody owns is decoration.
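The shape check and the two monitors can be sketched as scheduled read queries. This is an illustration under stated assumptions, not a prescribed implementation: the `status` property, its `'active'` value, and the use of a normalized email as a duplicate signal are all hypothetical and should be replaced with whatever the domain owner treats as authoritative.

```cypher
// Shape check: active customers with no email.
// `status` and 'active' are assumed names, not taken from the original model.
MATCH (c:Customer)
WHERE c.status = 'active' AND c.email IS NULL
RETURN count(c) AS activeCustomersMissingEmail;

// Duplicate monitor: distinct externalIds that share one normalized email.
// The uniqueness constraint blocks exact key duplicates, so this looks for
// likely duplicates that arrived under different keys.
MATCH (c:Customer)
WHERE c.email IS NOT NULL
WITH toLower(trim(c.email)) AS normalizedEmail, collect(c.externalId) AS ids
WHERE size(ids) > 1
RETURN normalizedEmail, ids
ORDER BY size(ids) DESC;

// Orphan monitor: customers with no relationships at all.
MATCH (c:Customer)
WHERE NOT (c)--()
RETURN count(c) AS orphanCustomers;
```

Each query returns a count or a short list, which makes it easy to attach a threshold and an alert to the owner named in the contract.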
This is non-negotiable because graph algorithms propagate errors. A duplicate identity doesn't stay local: it splits a community, distorts a centrality score, and corrupts every traversal that passes through it.
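To see the split concretely for a suspected duplicate pair, one option is to compare their neighborhoods. In this sketch `$idA` and `$idB` are hypothetical parameters holding the two external IDs, and distinct neighbor count stands in for degree centrality as the simplest possible measure.

```cypher
// $idA and $idB are assumed parameters naming a suspected duplicate pair.
MATCH (a:Customer {externalId: $idA}), (b:Customer {externalId: $idB})
OPTIONAL MATCH (a)--(na)
WITH a, b, collect(DISTINCT na) AS neighborsA
OPTIONAL MATCH (b)--(nb)
WITH neighborsA, collect(DISTINCT nb) AS neighborsB
RETURN size(neighborsA) AS neighborCountA,
       size(neighborsB) AS neighborCountB,
       size([n IN neighborsB WHERE n IN neighborsA]) AS sharedNeighbors;
```

If the two nodes really are one entity, its true neighborhood is the union of both, so each node on its own understates the entity's connectivity, and any score computed from it inherits that error.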