GraphRAG's golden rule: garbage indexed, garbage retrieved. Retrieval quality is capped by graph-build quality, and the highest-leverage build step is canonicalisation: making sure "AI", "Artificial Intelligence" and "A.I." resolve to one node, not three.
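A minimal canonicalisation sketch, assuming mentions are resolved by normalising the surface string and then looking it up in an alias table. `ALIASES`, `normalise` and `canonical_id` are hypothetical names, and string rules like these are only a first pass, not a complete entity resolver.

```python
import re
import unicodedata

# Hypothetical alias table: normalised surface form -> canonical node id.
ALIASES = {
    "ai": "artificial_intelligence",
    "artificial intelligence": "artificial_intelligence",
}


def normalise(surface: str) -> str:
    """Strip accents and punctuation, lower-case, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", surface)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", "", text.lower())  # "a.i." -> "ai"
    return re.sub(r"\s+", " ", text).strip()


def canonical_id(surface: str) -> str:
    """Map a mention to a node id: alias table first, else the normalised form."""
    key = normalise(surface)
    return ALIASES.get(key, key.replace(" ", "_"))


mentions = ["AI", "Artificial Intelligence", "A.I.", "artificial  intelligence"]
nodes = {canonical_id(m) for m in mentions}
print(nodes)  # {'artificial_intelligence'}
```

Every normalisation rule also widens what counts as "the same string": stripping punctuation is what lets "A.I." reach "ai", but it equally folds "U.S." into "us". That trade-off is why merge rate and false-merge rate have to be measured together.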
canonicalization:
  person_alias_merge_rate: 0.87      # good aliases merged
  org_alias_false_merge_rate: 0.04   # distinct orgs wrongly merged
  gate:
    fail_if_precision_below: 0.88
    fail_if_false_merge_above: 0.06
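The config does not say how these rates are computed. One option is to score the build against a small hand-labelled sample of mentions. This sketch assumes pairwise precision and merge rate (recall), and defines false-merge rate as the share of output nodes that mix mentions of more than one gold entity; those definitions, the function name `pairwise_merge_metrics` and the toy labels are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_merge_metrics(predicted: dict[str, str], gold: dict[str, str]) -> dict[str, float]:
    """Score canonicalisation output against gold labels.

    Both dicts map a mention id to an entity/node id and must share keys.
    """
    tp = fp = fn = 0
    for a, b in combinations(sorted(gold), 2):
        same_pred = predicted[a] == predicted[b]
        same_gold = gold[a] == gold[b]
        if same_pred and same_gold:
            tp += 1  # correct merge
        elif same_pred:
            fp += 1  # false merge: distinct entities share a node
        elif same_gold:
            fn += 1  # missed merge: aliases left on separate nodes

    # Group gold entities by predicted node; a node holding >1 entity is a false merge.
    members: dict[str, set[str]] = defaultdict(set)
    for mention, node in predicted.items():
        members[node].add(gold[mention])
    impure = sum(1 for entities in members.values() if len(entities) > 1)

    return {
        "precision": tp / (tp + fp) if tp + fp else 1.0,
        "merge_rate": tp / (tp + fn) if tp + fn else 1.0,
        "false_merge_rate": impure / len(members) if members else 0.0,
    }


# Toy labels: three "Apple" mentions cover two companies; two John Smiths.
gold = {"m1": "apple_inc", "m2": "apple_inc", "m3": "apple_records",
        "m4": "john_smith_a", "m5": "john_smith_b"}
predicted = {"m1": "apple", "m2": "apple", "m3": "apple",
             "m4": "john_smith_a", "m5": "john_smith_b"}

m = pairwise_merge_metrics(predicted, gold)
print({k: round(v, 2) for k, v in m.items()})
# {'precision': 0.33, 'merge_rate': 1.0, 'false_merge_rate': 0.33}
```

A single over-eager "Apple" node pulls precision down to 0.33 while merge rate stays perfect, so a high merge rate on its own says little about graph health.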
False merges are the dangerous defect. A missed merge leaves an entity fragmented across a few nodes: annoying, but recoverable. A false merge collapses two distinct identities (two different people called John Smith, two different companies called Apple) into one node, and every traversal through that node can then produce confidently wrong multi-hop answers. Those answers are almost impossible to debug because the graph still looks clean. So watch the false-merge rate even more closely than merge recall.
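To see why a false merge is worse than a missed one, here is a toy two-hop query over hypothetical triples. The naive canonicaliser that keys nodes on the name alone (dropping a `#n` suffix) stands in for any matcher that over-merges; all names and data are made up for illustration.

```python
from collections import defaultdict

# Hypothetical triples about two different people who share a name.
TRIPLES = [
    ("john_smith#1", "worksFor", "Acme"),
    ("john_smith#1", "livesIn", "Leeds"),
    ("john_smith#2", "worksFor", "Globex"),
    ("john_smith#2", "livesIn", "Boston"),
]


def build(triples, node_id):
    """Index objects by (subject node, predicate); node_id() is the canonicaliser."""
    graph = defaultdict(set)
    for s, p, o in triples:
        graph[(node_id(s), p)].add(o)
    return graph


def cities_of_employees(graph, org):
    """Two hops: person -worksFor-> org, then person -livesIn-> city."""
    people = {s for (s, p), objs in graph.items() if p == "worksFor" and org in objs}
    return {city for person in people for city in graph.get((person, "livesIn"), set())}


correct = build(TRIPLES, lambda s: s)                # identities kept apart
merged = build(TRIPLES, lambda s: s.split("#")[0])   # false merge on the name alone

print(cities_of_employees(correct, "Acme"))          # {'Leeds'}
print(sorted(cities_of_employees(merged, "Acme")))   # ['Boston', 'Leeds']
```

After the merge the query returns Boston alongside Leeds, and nothing in the graph flags it: there is one tidy `john_smith` node and no duplicates. A missed merge would instead have returned too little, which is far easier to notice.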
Validate domain and range. An edge (Atlas, worksFor, P12), a product "working for" a policy, violates the domain/range constraints the schema places on worksFor and should be dropped at build time, not discovered at answer time. Track invalid_domain_range_edges and dropped_edges as build metrics.
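A minimal sketch of build-time domain/range checking, assuming each node already carries a single type and the schema maps each predicate to one allowed subject type and one allowed object type. `SCHEMA`, `NODE_TYPES`, the `governedBy` predicate, `validate_edges` and the `unknown_predicate_edges` counter are hypothetical; only the `invalid_domain_range_edges` and `dropped_edges` metric names come from the text.

```python
from collections import Counter
from typing import NamedTuple


class Edge(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical schema: predicate -> (domain type, range type).
SCHEMA = {
    "worksFor": ("Person", "Organization"),
    "governedBy": ("Product", "Policy"),
}

# Hypothetical node types assigned earlier in the build.
NODE_TYPES = {"alice": "Person", "Acme": "Organization", "Atlas": "Product", "P12": "Policy"}


def validate_edges(edges, schema, node_types):
    """Keep schema-conformant edges; count everything dropped."""
    kept, metrics = [], Counter()
    for e in edges:
        rule = schema.get(e.predicate)
        if rule is None:
            # Assumed policy: predicates outside the schema are dropped too.
            metrics["unknown_predicate_edges"] += 1
            metrics["dropped_edges"] += 1
            continue
        domain, range_type = rule
        # Untyped nodes (.get() -> None) fail the check: fail closed.
        if node_types.get(e.subject) != domain or node_types.get(e.obj) != range_type:
            metrics["invalid_domain_range_edges"] += 1
            metrics["dropped_edges"] += 1
            continue
        kept.append(e)
    return kept, metrics


edges = [
    Edge("alice", "worksFor", "Acme"),
    Edge("Atlas", "worksFor", "P12"),    # a product "working for" a policy
    Edge("Atlas", "governedBy", "P12"),
]
kept, metrics = validate_edges(edges, SCHEMA, NODE_TYPES)
print(len(kept), dict(metrics))  # 2 {'invalid_domain_range_edges': 1, 'dropped_edges': 1}
```

Failing closed on untyped nodes is a deliberate choice in this sketch: it inflates `dropped_edges` when typing is incomplete, which surfaces the upstream gap in the build metrics instead of letting unchecked edges through.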
Make build quality a gate. Precision, false-merge rate and dropped-edge counts belong in the same CI gate as your retrieval metrics. A graph that silently degrades upstream will degrade every answer downstream.
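One way to express such a combined gate, as a sketch: every rule is a metric name, a comparison and a threshold, and a missing metric counts as a failure so a broken upstream step cannot pass silently. The two canonicalisation thresholds mirror the `gate:` block of the config at the top of this section; the dropped-edge limit, the `retrieval_recall_at_10` metric, `check_gate` and the report values are hypothetical.

```python
# Each rule: metric name -> (comparison, threshold).
GATE = {
    "merge_precision": (">=", 0.88),         # fail_if_precision_below: 0.88
    "false_merge_rate": ("<=", 0.06),        # fail_if_false_merge_above: 0.06
    "dropped_edges": ("<=", 50),             # hypothetical limit
    "retrieval_recall_at_10": (">=", 0.80),  # hypothetical retrieval metric
}


def check_gate(metrics: dict[str, float], gate: dict[str, tuple[str, float]]) -> list[str]:
    """Return one message per failed rule; an empty list means the gate passes."""
    failures = []
    for name, (op, threshold) in gate.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")  # absent metric fails, never passes
            continue
        ok = value >= threshold if op == ">=" else value <= threshold
        if not ok:
            failures.append(f"{name}={value} violates {op} {threshold}")
    return failures


# Made-up build + retrieval report for illustration.
report = {"merge_precision": 0.91, "false_merge_rate": 0.04,
          "dropped_edges": 73, "retrieval_recall_at_10": 0.84}
failures = check_gate(report, GATE)
print(failures)  # ['dropped_edges=73 violates <= 50']
# In CI: sys.exit(1 if failures else 0)
```

Here retrieval recall passes while the build fails on dropped edges, which is exactly the upstream degradation a retrieval-only gate would miss.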