Theory
The original 3 V's + the one we should have started with
- Volume: how much (GB / TB / PB)?
- Velocity: how fast does new data arrive (batch / streaming / real-time)?
- Variety: how many shapes (structured / semi / unstructured / multi-modal)?
- Veracity (the one IBM added later, often the only one that matters): how much do you trust it?
Cynical truth: most enterprises drown in veracity problems long before they hit a real volume problem.
Why this taxonomy survives
The V's aren't a marketing trick — each one changes the architecture:
| V | Forces choice of… | Anti-pattern |
|---|---|---|
| Volume | Storage tier (object store vs warehouse), partitioning strategy | Single-node DB at TB scale |
| Velocity | Batch vs micro-batch vs streaming engine | Cron job firing every minute on a growing table |
| Variety | Schema strategy, integration layer, metadata model | One giant table to fit them all |
| Veracity | DQ programme, lineage, ownership, governance | Dashboards nobody trusts |