Query planning is the air-traffic-control layer that runs before retrieval. Crawling the whole graph for every prompt is latency suicide; planning decides the intent, the seed entities, the hop budget and the retrieval blend up front.
query_plan:
  question: 'Which exceptions were approved by leaders connected to Acme?'
  classify: { intent: path+policy, requires_hops: 2 }
  retrievers:
    - { kind: vector, top_k: 8 }
    - { kind: graph_traversal, start_entities: [Acme, 'policy exception'], max_hops: 2 }
  re_rank: { features: [path_support, text_relevance, recency] }
Lock the hop budget before execution. The requires_hops and max_hops settings have the most direct control over latency and noise, because each extra hop can multiply the number of nodes a traversal touches. Setting them per query (from the classified intent) instead of using one global default helps keep p95 latency stable across easy and hard questions.
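A minimal sketch of per-query hop budgeting, assuming the classifier emits a short intent label. Apart from path+policy, which comes from the example plan, the intent names, the hop values, the ceiling and the lock_hop_budget helper are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass

# Hypothetical intent -> hop budget table. Only "path+policy" appears in the
# example plan; the other labels and all numbers are illustrative assumptions.
HOP_BUDGETS = {
    "lookup": 0,        # answerable from text chunks alone
    "relationship": 1,  # one edge away from a seed entity
    "path+policy": 2,   # multi-hop question, as in the example plan
}
DEFAULT_HOPS = 1        # conservative budget for unrecognized intents
MAX_ALLOWED_HOPS = 3    # hard ceiling, whatever the table says


@dataclass(frozen=True)  # frozen: the budget cannot be changed after planning
class HopBudget:
    requires_hops: int
    max_hops: int


def lock_hop_budget(intent: str) -> HopBudget:
    """Pick the hop budget for one query from its classified intent."""
    hops = min(HOP_BUDGETS.get(intent, DEFAULT_HOPS), MAX_ALLOWED_HOPS)
    return HopBudget(requires_hops=hops, max_hops=hops)


budget = lock_hop_budget("path+policy")
print(budget.requires_hops, budget.max_hops)
```

Freezing the dataclass is one simple way to make the budget read-only once execution starts; a real planner might also keep max_hops slightly above requires_hops to leave some slack.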
Plan the failure path too. Seed-entity extraction can fail, for example when the question names an entity that isn't in the graph. A robust plan has a fallback (drop to pure vector search, or ask a clarifying question) instead of returning an empty traversal. Planning is not just the happy path; it's deciding what to do when the graph can't help.
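The fallback can be sketched as follows, assuming seed candidates have already been extracted from the question and the set of known graph entities is available in memory. RetrievalPlan, plan_retrieval and the on_miss switch are hypothetical names for illustration, not part of any specific GraphRAG library.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RetrievalPlan:
    retrievers: list = field(default_factory=list)
    clarifying_question: Optional[str] = None


def plan_retrieval(candidates, graph_entities, max_hops=2, top_k=8,
                   on_miss="vector_only"):
    """Build a plan, with a fallback when no candidate resolves to a graph node."""
    # Keep only the candidates that actually exist in the graph.
    seeds = [name for name in candidates if name in graph_entities]
    if seeds:
        # Happy path: blend vector search with a bounded traversal.
        return RetrievalPlan(retrievers=[
            {"kind": "vector", "top_k": top_k},
            {"kind": "graph_traversal", "start_entities": seeds,
             "max_hops": max_hops},
        ])
    if on_miss == "clarify":
        # Fallback A: ask the user instead of running an empty traversal.
        names = ", ".join(candidates) or "any known entity"
        return RetrievalPlan(
            clarifying_question=f"I could not find {names} in the graph. "
                                "Which entity do you mean?")
    # Fallback B: drop to pure vector search.
    return RetrievalPlan(retrievers=[{"kind": "vector", "top_k": top_k}])


known = {"Acme", "policy exception"}
print(plan_retrieval(["Acme", "Globex"], known).retrievers)
print(plan_retrieval(["Globex"], known).retrievers)
print(plan_retrieval(["Globex"], known, on_miss="clarify").clarifying_question)
```

In this sketch a partial match still runs the traversal from whichever seeds did resolve; whether that is preferable to asking for clarification depends on how costly a wrong answer is for the application.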
Done well, planning turns GraphRAG from 'traverse everything and hope' into 'retrieve the minimal evidence set this specific question needs'.