Retrieval metrics — recall@k, MRR, nDCG

The classical IR metrics, applied to vector retrieval.

Overview

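This lesson covers three classical information-retrieval (IR) metrics as applied to vector retrieval. Recall@k is the fraction of a query's relevant documents that appear in the top k results. MRR (mean reciprocal rank) averages, over queries, the reciprocal of the rank at which the first relevant document appears. nDCG (normalized discounted cumulative gain) rewards placing highly relevant documents near the top and normalizes by the score of an ideal ordering, so it can handle graded relevance.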

Why it matters

If you can't measure retrieval, you can't improve it. recall@k and MRR are the two metrics you'll cite in every retrieval debate.
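To make the three metrics concrete, here is a minimal sketch in plain Python. It assumes each query yields a ranked list of document IDs and that you hold a set of relevant IDs (or a mapping of ID to graded gain) per query. The function names and the sample IDs are illustrative, not taken from any particular library, and the nDCG variant shown uses the linear-gain form with a log2 discount, which is one common convention among several.

```python
import math
from typing import Mapping, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranked list, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def dcg_at_k(ranked: Sequence[str], gains: Mapping[str, float], k: int) -> float:
    """Discounted cumulative gain: gain / log2(rank + 1), summed over the top k."""
    return sum(
        gains.get(doc_id, 0.0) / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
    )


def ndcg_at_k(ranked: Sequence[str], gains: Mapping[str, float], k: int) -> float:
    """DCG divided by the DCG of the ideal ordering of the judged documents."""
    ideal_gains = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal_gains, start=1))
    return dcg_at_k(ranked, gains, k) / idcg if idcg > 0 else 0.0


# Hypothetical query: two relevant documents, one retrieved at rank 2, one at rank 4.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(recall_at_k(ranked, relevant, k=3))   # 0.5
print(reciprocal_rank(ranked, relevant))    # 0.5
print(ndcg_at_k(ranked, {"d1": 3.0, "d2": 1.0}, k=4))
```

Note the different inputs: recall@k and MRR need only binary relevance, while nDCG needs a gain per judged document, which is why it usually costs more to label for.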

Where this sits in the stack

Choosing retrieval metrics (recall@k, MRR, nDCG) is one of the load-bearing decisions in a KG/RAG/agent system: the choices made here propagate to retrieval quality, agent reliability, cost per query, and the on-call burden of whoever ships it. Teams that name this trade-off explicitly tend to ship faster than teams that leave it implicit.

Why this is load-bearing

Retrieval metrics (recall@k, MRR, nDCG) are the building code of this layer. You can ignore building codes on a shed, but the moment you put two storeys on the same foundation, they decide whether the structure stands or falls. In a KG/RAG/agent stack, the equivalent of "two storeys" is the second feature you ship on top of a primitive: GraphRAG on top of chunking, supervisor agents on top of state machines, regression CI on top of metrics. The cost of cutting the wrong corner now is paid by every later layer, with interest.
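As one illustration of "regression CI on top of metrics", the sketch below compares a current evaluation run against a stored baseline and reports any metric that dropped by more than a tolerance. Everything here is an assumption for illustration: the function name `find_regressions`, the metric keys, the scores, and the 0.01 tolerance are hypothetical, and a real pipeline would load both dictionaries from its own evaluation output.

```python
from typing import Dict, Optional, Tuple


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.01,
) -> Dict[str, Tuple[float, Optional[float]]]:
    """Return metrics that are missing or fell more than `tolerance` below baseline."""
    regressions: Dict[str, Tuple[float, Optional[float]]] = {}
    for name, base_value in baseline.items():
        current_value = current.get(name)
        if current_value is None or current_value < base_value - tolerance:
            regressions[name] = (base_value, current_value)
    return regressions


# Hypothetical scores: recall@10 improved slightly, MRR dropped.
baseline = {"recall@10": 0.82, "mrr": 0.61}
current = {"recall@10": 0.83, "mrr": 0.55}

failed = find_regressions(baseline, current)
for name, (before, after) in failed.items():
    print(f"{name} regressed: {before} -> {after}")
```

In a CI job you would typically exit non-zero when the returned dictionary is non-empty, so a retrieval change that lowers a tracked metric blocks the merge instead of surfacing later in production.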

Reflect — apply it

Anchor retrieval metrics (recall@k, MRR, nDCG) to something concrete in your own work.

  • Where have you seen retrieval metrics (recall@k, MRR, nDCG) used well? Name one team or product and what they got right.
  • Where have you seen it done badly? What was the first symptom that surfaced (latency, hallucination, cost, outage)?
  • What is the *cheapest* version of this you could ship in your next sprint, and what single metric would tell you it's working?
