What's the single biggest design constraint in Cassandra modeling?

Model per query — secondary indexes are weak, so every access pattern needs its own table / partition

Your team wants to store IoT readings (1M devices, one reading every 10 s) and answer 'avg temperature per device per hour for the last 30 days'. Which store?

A time-series DB — native downsampling + retention + time-bucket aggregates do the heavy lifting for free

Key-Value, Wide-Column, Time-Series — Semantic Web Academy

Overview

Key-Value, Wide-Column, Time-Series

When 'just look up by key' beats SQL — and the cost of giving up joins.

Why it matters

KV stores (Redis, DynamoDB) win on raw read latency and write throughput. Wide-column (Cassandra) wins on linear scalability for known access patterns. Time-series (Influx, Timescale) wins on retention + downsampling out of the box.

Going deeper

Workload → store cheat-sheet:

Workload	Best fit	Why
Session cache, rate-limit counter	Key-Value (Redis)	Sub-ms by-key reads, TTL built in
'Show me Alice's last 50 messages'	Wide-column	Partition by `user_id`, cluster by `created_at DESC`, single seek
Server CPU metrics, IoT sensor stream	Time-series	Native downsampling, retention, gap-fill
Ad-hoc analytical join across 6 tables	Warehouse / OLAP	None of the three above will help — wrong tool

Cassandra-shaped antipattern: 'we need flexibility, let's add secondary indexes later.' Cassandra's secondary indexes scatter-gather across every node and degrade rapidly. Model per query means: list the queries first, then design one table per query, and accept the write-amplification.

Analogy

These three stores are purpose-built workshop tools, not Swiss Army knives.

Key-Value (Redis, DynamoDB) is the coat-check ticket: hand over a number, get your coat back in microseconds. Beautifully fast at one job, hopeless if you ask 'which coats are red?'.
Wide-column (Cassandra, ScyllaDB) is the seat number on a stadium ticket: sector + row + seat tells the usher exactly which partition + cluster column to walk to. Brilliant for known access patterns, painful when the question is 'find me all the people wearing hats'.
Time-series (InfluxDB, TimescaleDB, Prometheus) is the security-camera archive: append-only by timestamp, automatic downsampling of old footage, retention policies built in. Anything queried by 'what happened between T1 and T2?' belongs here.

The pricing is the same in all three: huge throughput / scale wins, paid for in giving up ad-hoc join freedom.

Make it stick

Use the prompts below to anchor key-value, wide-column, time-series to something you actually own.

›Where in your stack is a relational DB being used as a key-value cache? That hot path probably belongs in Redis.
›Pick a Cassandra (or DynamoDB) table you know. Which access pattern was bolted on *after* the schema, and what did it cost in writes?
›Which time-bucketed query in your warehouse would be 10× cheaper in a real time-series DB?

Reading in progress · 0 of 2 activities done