Streaming Architectures

Kappa vs Lambda — when the streaming engine *is* the system of record.

Why it matters

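Kappa is streaming-only: all data flows through one stream processor, and history is recomputed by replaying the log. Lambda runs a batch path and a streaming path side by side and merges their results. Modern stacks (Kafka + Flink, Materialize) make Kappa practical for more workloads than before.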

Going deeper

Deciding between Kappa and Lambda:

Lambda Architecture (Batch + Streaming parallel paths):

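  • Pros: Resilient to streaming errors. If the stream processor drops messages or loses state, the nightly batch job recomputes from the full dataset and corrects the result.
  • Cons: You have to write and maintain the same business logic twice: once in Scala/Java for the stream, and once in SQL/Spark for the batch. The two copies can drift apart.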
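
To make the duplication concrete, here is a minimal sketch of the Lambda shape in plain Python. It is not tied to any real framework; the event tuples, `batch_view`, `SpeedLayer`, and `serve` are hypothetical names chosen for illustration. Note that the "sum amount per user" rule appears twice, once per path.

```python
# Hypothetical events: (event_id, user, amount). Illustrative data only.
HISTORY = [(1, "ana", 10), (2, "bo", 5), (3, "ana", 7)]  # covered by the last batch run
RECENT = [(4, "bo", 2), (5, "ana", 1)]                   # arrived since the last batch run


def batch_view(events):
    """Batch path: recompute per-user totals from the full history in one pass."""
    totals = {}
    for _event_id, user, amount in events:
        totals[user] = totals.get(user, 0) + amount
    return totals


class SpeedLayer:
    """Streaming path: the same aggregation, written a second time as incremental updates."""

    def __init__(self):
        self.totals = {}

    def on_event(self, event):
        _event_id, user, amount = event
        self.totals[user] = self.totals.get(user, 0) + amount


def serve(batch_totals, speed_totals):
    """Serving step: merge the batch view with the real-time delta at query time."""
    merged = dict(batch_totals)
    for user, amount in speed_totals.items():
        merged[user] = merged.get(user, 0) + amount
    return merged


speed = SpeedLayer()
for event in RECENT:
    speed.on_event(event)

merged = serve(batch_view(HISTORY), speed.totals)
print(merged)  # {'ana': 18, 'bo': 7}
```

If someone changes the rule in `batch_view` (say, to exclude refunds) and forgets `SpeedLayer.on_event`, the two paths silently disagree until the next batch run overwrites the streaming numbers.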

Kappa Architecture (Streaming only, log retained long enough to replay):

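  • Pros: Single codebase. To backfill or fix a bug, you start a new instance of the stream job at the earliest offset of the Kafka topic, let it reprocess history until it catches up to the present, and then switch readers over to its output.
  • Cons: Harder to manage infrastructure. Storing petabytes of history in Kafka (or tiered storage) is administratively heavy. Windowing and out-of-order events also get harder: a replay pushes months of history through the job in hours, so the logic must window on event time rather than arrival time, and it must decide what to do with late events.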
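
The replay idea can be sketched without Kafka at all. In the example below, a Python list stands in for a topic and list positions stand in for offsets; `LOG`, `replay`, `totals_v1`, and `totals_v2` are hypothetical names, not part of any real client library. Windows are keyed by event time, so the out-of-order record still lands in the correct window.

```python
from collections import defaultdict

# Hypothetical append-only log standing in for a Kafka topic.
# Each record is (event_time_seconds, user, amount); list positions act as offsets.
LOG = [
    (5, "ana", 10),
    (65, "bo", 5),
    (50, "ana", 7),  # out of order: event time is earlier than the previous record
    (70, "ana", 1),
]

WINDOW_SECONDS = 60


def totals_v1(amount):
    """Original logic with a bug: amounts are accidentally doubled."""
    return amount * 2


def totals_v2(amount):
    """Fixed logic."""
    return amount


def replay(log, transform, from_offset=0):
    """Consume the log from `from_offset` and build per-window, per-user totals.

    Windows are assigned by event time, not arrival order, so replaying the
    same log with the same logic always produces the same view.
    """
    view = defaultdict(int)
    for event_time, user, amount in log[from_offset:]:
        window_start = (event_time // WINDOW_SECONDS) * WINDOW_SECONDS
        view[(window_start, user)] += transform(amount)
    return dict(view)


buggy_view = replay(LOG, totals_v1)
fixed_view = replay(LOG, totals_v2)  # backfill: same log, new code, offset 0
print(fixed_view)  # {(0, 'ana'): 17, (60, 'bo'): 5, (60, 'ana'): 1}
```

This sketch reads a finite log to the end, so it never has to decide when a window is complete. A real stream processor running continuously needs a rule for that (typically watermarks plus a policy for late events), which is where much of the complexity mentioned above comes from.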

Analogy

Lambda vs Kappa is like how you get your daily news.

  • Lambda Architecture is getting a live Twitter feed during the day, AND a thick printed newspaper the next morning. You see things instantly on Twitter (Streaming), but the newspaper (Batch) provides the fully fact-checked, high-quality, normalized view of the whole day. You maintain two completely different systems.
  • Kappa Architecture is having an incredibly powerful, endlessly scrolling digital live-feed (Streaming) that also stores all historical articles perfectly. If you want yesterday's news, you just scroll back up the feed. There is no separate printed paper; the stream handles both the "now" and the "then" using exactly the same logic.

Make it stick

Use the prompts below to anchor streaming architectures to something you actually own.

  • Do you have identical business logic duplicated across a real-time event consumer and a nightly batch job? How often do they drift out of sync?
  • If your Kafka cluster had to replay all events from the past year, would it survive the load, or do you rely on a data warehouse for historical data?
  • Which report in your company do executives believe needs to be 'real-time' when a nightly batch would serve it just as well?
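
For the replay question, a back-of-envelope estimate is often enough to start the conversation. The sketch below is plain arithmetic; the function names and every number passed in are assumptions for illustration, so substitute your own measured rates.

```python
def replay_hours(events_per_second, retention_days, replay_throughput_eps):
    """Hours to read the retained history once, ignoring events that arrive meanwhile."""
    total_events = events_per_second * 86_400 * retention_days
    return total_events / replay_throughput_eps / 3600


def catch_up_hours(events_per_second, retention_days, replay_throughput_eps):
    """Hours until a replaying consumer reaches the head of the log while live traffic continues."""
    if replay_throughput_eps <= events_per_second:
        raise ValueError("consumer never catches up: replay throughput must exceed the live rate")
    backlog = events_per_second * 86_400 * retention_days
    return backlog / (replay_throughput_eps - events_per_second) / 3600


# Assumed numbers: 2,000 events/s live, one year retained, 100,000 events/s replay speed.
hours = replay_hours(2_000, 365, 100_000)
catch_up = catch_up_hours(2_000, 365, 100_000)
print(round(hours, 1))     # 175.2
print(round(catch_up, 1))  # 178.8
```

Under these assumed rates a full-year replay takes roughly a week, which is a useful number to compare against how long the same backfill takes from your data warehouse.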
