Overview
Pin your golden set and fail the build when quality drops.
Why it matters
Treat retrieval and generation quality like a unit test suite: every prompt change, model bump, and chunker tweak runs against the same pinned golden set.
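A minimal sketch of such a gate, assuming retrieval quality is scored as mean recall@k over the golden set and compared with a stored baseline. The names GoldenCase, mean_recall, and gate, the tolerance of 0.02, and the choice of recall@k are illustrative assumptions, not part of any particular library.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class GoldenCase:
    """One pinned example: a query and the document IDs that should come back."""
    query: str
    relevant_ids: frozenset[str]


def recall_at_k(retrieved: Sequence[str], relevant: frozenset[str], k: int) -> float:
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 1.0  # nothing to find, so nothing was missed
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall(
    cases: Sequence[GoldenCase],
    retrieve: Callable[[str], Sequence[str]],
    k: int = 5,
) -> float:
    """Average recall@k of a retriever over the whole golden set."""
    if not cases:
        raise ValueError("golden set is empty")
    return sum(recall_at_k(retrieve(c.query), c.relevant_ids, k) for c in cases) / len(cases)


def gate(score: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Pass only if the score has not dropped more than `tolerance` below baseline."""
    return score >= baseline - tolerance


# Hypothetical golden set and a stand-in retriever, for illustration only.
GOLDEN = [
    GoldenCase("q1", frozenset({"a", "b"})),
    GoldenCase("q2", frozenset({"c"})),
]


def fake_retrieve(query: str) -> list[str]:
    return {"q1": ["a", "x", "b"], "q2": ["y", "z"]}[query]
```

In CI, a test would call mean_recall with the real retriever and assert that gate(score, baseline) holds, so a drop beyond the tolerance fails the build in the same way a failing unit test would.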
Where this sits in the stack
Regression CI for RAG and agents is one of the load-bearing decisions in a KG/RAG/agent system: choices made here propagate to retrieval quality, agent reliability, cost per query, and the on-call burden of whoever ships it. Teams that make these choices explicit tend to ship faster than teams that leave them implicit.