Label Design

Picking labels that match query patterns — and the cost of over-labelling.


Why it matters

Labels are the planner's primary handle: indexes are scoped to a label, and MATCH (n:Label) is the entry point for almost every read.
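
As a minimal sketch of that scoping, assume a hypothetical graph where :Person nodes carry an `email` property (the index name `person_email` is also made up for illustration). The index is declared against one label, so only patterns that name that label can use it:

```cypher
// Hypothetical schema: :Person nodes with an `email` property.
// The index is declared for one label and one property.
CREATE INDEX person_email IF NOT EXISTS
FOR (p:Person) ON (p.email);

// The label in the pattern lets the planner consider person_email.
MATCH (p:Person)
WHERE p.email = $email
RETURN p;

// No label in the pattern: person_email does not apply here,
// so the planner typically falls back to scanning all nodes.
MATCH (n)
WHERE n.email = $email
RETURN n;
```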

Going deeper

Three working rules for label design:

  1. One primary label per node, naming what the node is (:Person, :Company, :Order). This is the label your MATCH (n:Label) queries start from and the label your indexes are scoped to.
  2. Use additional labels sparingly, only for cross-cutting roles you query on directly (:Active, :VIP, :Deleted). A label that's never used as the entry of a MATCH is paying its planner cost for nothing.
  3. Prefer a property over a label for high-cardinality state: (p:Person {status: 'archived'}) instead of :Person:Archived, unless you always query archived people separately and want a dedicated index.
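
The trade-off in rule 3 might look like this in practice. This is a sketch under assumptions: the `status`, `name`, and `id` properties and the index name `person_status` are illustrative and not taken from any particular model.

```cypher
// Option A: state lives in a property on the primary label.
// An index on :Person(status) keeps the lookup cheap.
CREATE INDEX person_status IF NOT EXISTS
FOR (p:Person) ON (p.status);

MATCH (p:Person {status: 'archived'})
RETURN p.name;

// Option B: state lives in an extra label. Arguably worth it only
// when archived people are routinely queried on their own.
MATCH (p:Person:Archived)
RETURN p.name;

// Changing state is a property write in A and a label write in B.
MATCH (p:Person {id: $id})
SET p.status = 'archived';

MATCH (p:Person {id: $id})
SET p:Archived;
```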

Anti-pattern: turning every domain attribute into a label (:Person:Engineer:Senior:Remote:EU). You've reinvented dimension tables, badly.
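
One possible way out of the anti-pattern is to keep a single primary label and move the attributes into properties. The property names (`role`, `level`, `workMode`, `region`), the sample value 'Ada', and the index name below are hypothetical:

```cypher
// Over-labelled: every attribute is a label.
CREATE (:Person:Engineer:Senior:Remote:EU {name: 'Ada'});

// Alternative: one primary label, attributes as properties.
CREATE (:Person {
  name: 'Ada',
  role: 'engineer',
  level: 'senior',
  workMode: 'remote',
  region: 'EU'
});

// A composite index covers the combination that is actually queried.
CREATE INDEX person_role_region IF NOT EXISTS
FOR (p:Person) ON (p.role, p.region);

MATCH (p:Person)
WHERE p.role = 'engineer' AND p.region = 'EU'
RETURN p.name;
```

Which attributes deserve an index depends on the queries you run; the composite index here assumes role and region are filtered together.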

Analogy

Labels are the coloured tabs on a filing cabinet.

One coloured tab per folder makes the cabinet easy to scan: 'all the red folders are HR'. Stick five tabs on every folder and the colour-code stops meaning anything — you're back to reading every label by hand. Worse, the new intern doesn't know which tab is the primary one and they file things wrong.

In Neo4j, labels behave the same way. Each label is an index scope and a planner hint. One or two labels per node give the planner a clean entry point. With five labels per node, any pattern that names several of them leaves the planner choosing between several candidate label scans and indexes, and its picks become less predictable.
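
To see which entry point the planner actually picks, you can prefix a query with EXPLAIN and read the plan. The sketch below assumes a hypothetical :Person:VIP node with an `email` property and an existing index on :Person(email); the literal address is a placeholder.

```cypher
// EXPLAIN returns the plan without running the query.
// Look at the leaf operator: an index seek on :Person(email),
// or a label scan on :Person or :VIP.
EXPLAIN
MATCH (p:Person:VIP)
WHERE p.email = 'ada@example.com'
RETURN p;

// If the choice is not the one you want, an index hint pins it.
// The hint fails if no index on :Person(email) exists.
MATCH (p:Person:VIP)
USING INDEX p:Person(email)
WHERE p.email = 'ada@example.com'
RETURN p;
```

Hints are a last resort; if a query needs one to behave, that can be a sign the node carries more labels than the query pattern justifies.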

Make it stick

Use the prompts below to anchor label design to a real graph you own.

  • List the labels on your busiest node type. Which of them is *never* the entry point of a MATCH? That one is dead weight.
  • Where in your model has a property quietly become a label — or vice-versa — and what query pattern justified the choice?
  • Pick a recurring query. Could a different *primary* label make it index-friendly without restructuring the graph?
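
A starting point for the first prompt, assuming :Person is your busiest node type (swap in your own label). These queries show which labels exist and which indexes are scoped to them; they do not show which labels your queries actually start from, so compare the output against your query log or application code.

```cypher
// Count how many :Person nodes carry each label.
MATCH (p:Person)
UNWIND labels(p) AS label
RETURN label, count(*) AS nodes
ORDER BY nodes DESC;

// List existing indexes and the labels they are scoped to.
SHOW INDEXES YIELD name, labelsOrTypes, properties
RETURN name, labelsOrTypes, properties;
```

A label that appears in the first result but in no index and no MATCH entry point is a candidate for becoming a property.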
