Overview
Document & Aggregate Modeling
Embed vs reference — the trade-off between read speed and update fan-out.
Why it matters
Document stores reward aggregate-oriented design: model the unit that's loaded together. Embed when reads are frequent and updates rare; reference when the embedded thing changes independently.
Going deeper
A practical decision matrix for embed vs reference:
| Question | Embed | Reference |
|---|---|---|
| Always loaded together? | ✅ | ❌ |
| Bounded size (fits in one doc, < 16 MB Mongo limit)? | ✅ | — |
| Mutated independently of the parent? | ❌ | ✅ |
| Referenced from many parents (many-to-many)? | ❌ | ✅ |
| Needed in isolation by other queries? | ❌ | ✅ |
Common mistake: embedding unbounded arrays (comments on a post, events on a user). Each new write rewrites the whole doc and eventually busts the size cap. Rule of thumb: if the child collection has no natural upper bound, reference it.