Overview
Evaluating GDS Outputs
How to validate algorithm outputs with modularity, stability checks, and business-grounded metrics.
Why it matters
A graph algorithm is useful only when its output improves a real decision and remains stable across releases.
Going deeper
A model-free algorithm still needs model-grade evaluation. For clustering (Louvain/Leiden), track:
- Modularity: how much denser the edges inside communities are than a degree-preserving random baseline would predict. It is the headline quality score, but a high value on a meaningless partition is still meaningless.
- Stability: re-run with a different random seed or node ordering and check whether the top communities persist. Wildly different clusters on each run mean you are reading noise.
- Expert precision: sample communities and have a domain owner confirm that each corresponds to something real (a segment, a ring). The share of sampled communities confirmed is the precision.
- Business lift: does using the output raise fraud hit-rate, CTR, or resolution speed? This is the only metric that ultimately matters.
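The modularity and stability checks listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not a GDS API: the function names `modularity` and `top_community_stability` are hypothetical, the graph is assumed to be undirected and unweighted, and best-match Jaccard overlap is just one reasonable way to compare two runs.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity for an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping node -> community id.
    """
    m = 0
    intra = defaultdict(int)   # edges with both endpoints in the community
    degree = defaultdict(int)  # summed node degree per community
    for u, v in edges:
        m += 1
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    if m == 0:
        return 0.0
    # Observed intra-community edge share minus the share expected at random.
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def top_community_stability(run_a, run_b, top_k=5):
    """Mean best-match Jaccard overlap for the top_k largest communities of run_a.

    Assumption: community ids are arbitrary labels, so runs are compared by
    member sets rather than by id. 1.0 means the top communities reappear intact.
    """
    def groups(assignment):
        out = defaultdict(set)
        for node, comm in assignment.items():
            out[comm].add(node)
        return list(out.values())

    a_groups = sorted(groups(run_a), key=len, reverse=True)[:top_k]
    b_groups = groups(run_b)
    if not a_groups or not b_groups:
        return 0.0
    scores = [max(len(a & b) / len(a | b) for b in b_groups) for a in a_groups]
    return sum(scores) / len(scores)


# Toy data: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
run_1 = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
run_2 = {0: "x", 1: "x", 2: "y", 3: "y", 4: "y", 5: "y"}  # node 2 switched sides

print(round(modularity(edges, run_1), 3))               # 0.357
print(round(top_community_stability(run_1, run_2), 3))  # 0.708
```

In practice the two assignments would come from two runs of the same algorithm with different seeds or node orderings, and a stability threshold would be chosen per use case rather than taken from this sketch.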
Version write-back outputs so that a model or algorithm change does not silently shift the features that every downstream consumer relies on.
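One way to make versioning concrete is to derive the written property name from the algorithm and its configuration, so that a changed configuration produces a new property instead of overwriting the old one. The sketch below is an assumption, not a GDS convention: `versioned_property`, the naming scheme, and the `schema_version` argument are all hypothetical, and the parameter names in the example are illustrative.

```python
import hashlib
import json


def versioned_property(base, algorithm, params, schema_version=1):
    """Build a property name that changes whenever the algorithm or params change.

    Hypothetical naming scheme: <base>_<algorithm>_v<schema_version>_<8-char digest>.
    """
    # Canonical JSON makes the digest independent of dict key order.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(f"{algorithm}:{canonical}".encode()).hexdigest()[:8]
    return f"{base}_{algorithm}_v{schema_version}_{digest}"


# Illustrative configuration; the parameter names are examples only.
name = versioned_property("community", "louvain", {"maxLevels": 10, "tolerance": 0.0001})
print(name)  # e.g. passed as the write-back property name for this run
```

Downstream consumers then pin to one exact property name and migrate deliberately, rather than reading a single property whose meaning can change underneath them.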