From Batch to Streaming — When and Why — Semantic Web Academy

Theory

Streaming is a latency choice, not a fashion choice

If the business can wait an hour, batch is cheaper, simpler and easier to debug. Streaming earns its complexity only when at least one of these holds:

A downstream system needs sub-minute latency (fraud, personalisation, alerting).
A source is inherently streaming (clickstream, IoT, CDC) and you don't want to wait for the next batch window.
The data must be enriched or cleaned in flight, before it lands in storage (stateful joins, deduplication).
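The third case is where streaming gets expensive: the job has to remember what it has already seen. The block below is a minimal, framework-free sketch of in-flight deduplication, assuming every event carries a unique ID and events arrive in non-decreasing event-time order. The class name `WindowedDeduplicator`, the 60-second window and the sample events are hypothetical. Engines such as Flink manage and checkpoint this kind of state for you, which is part of the operational cost discussed here.

```python
from collections import OrderedDict


class WindowedDeduplicator:
    """Drop events whose ID was already seen within the last `window_seconds`.

    Hypothetical helper. Assumes events arrive in non-decreasing event-time
    order, so the oldest entry is always at the front of the OrderedDict.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._seen = OrderedDict()  # event_id -> time it was first seen

    def _evict_expired(self, now: float) -> None:
        # Pop from the front until the oldest remaining entry is inside the window;
        # this keeps state bounded instead of growing forever.
        while self._seen:
            first_seen = next(iter(self._seen.values()))
            if now - first_seen < self.window_seconds:
                break
            self._seen.popitem(last=False)

    def accept(self, event_id: str, event_time: float) -> bool:
        """Return True if the event should be passed downstream."""
        self._evict_expired(event_time)
        if event_id in self._seen:
            return False  # duplicate inside the window
        self._seen[event_id] = event_time
        return True


# Usage: (event_id, event_time_in_seconds) pairs with made-up values.
dedup = WindowedDeduplicator(window_seconds=60)
events = [("a", 0), ("b", 10), ("a", 30), ("a", 75), ("b", 80)]
kept = [(eid, t) for eid, t in events if dedup.accept(eid, t)]
print(kept)  # [('a', 0), ('b', 10), ('a', 75), ('b', 80)]
```

Note that the second `"a"` at t=75 is kept: its first copy has already been evicted. The window size is the trade-off between memory held in state and how late a duplicate can still be caught.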

This level builds on the Apache Kafka & Streaming track. If you haven't taken it, treat it as the prerequisite for broker internals (topics, partitions, consumer groups, offsets). Here we focus on the data-engineering concerns: CDC, exactly-once processing, stream→table duality and Flink basics.

Analogy

Batch is the postal service: cheap, scheduled and reliable, and nobody expects a parcel to arrive in 30 seconds. Streaming is the phone line: continuous and low-latency, but you pay for it being always on, and its failure modes are subtler (a call that drops mid-sentence is messier to recover from than a letter that never arrives). Most platforms need both: fraud detection over the phone, monthly billing through the post.

The latency spectrum

The latency/cost spectrum. You almost never jump straight from nightly batch to true streaming: micro-batch sits in the middle and captures most of the value at a fraction of the operational cost.

Reflect

The honest test: write down the latency SLA of each downstream consumer of a candidate stream. If none is under 15 minutes, micro-batch (for example, Spark Structured Streaming triggered every 5 minutes) typically gives you around 90% of the value at roughly 30% of the operational cost of true streaming.
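The test is mechanical enough to write down. Below is a minimal sketch, assuming each consumer's SLA is expressed in minutes; the function name `pick_processing_tier`, the tier labels and the example consumers are hypothetical, while the 60- and 15-minute cut-offs are the rules of thumb from this page. Only the tightest SLA matters, because a single low-latency consumer sets the latency budget for the whole feed.

```python
from typing import Mapping

# Rules of thumb from the text above, not hard limits.
BATCH_MIN_SLA_MINUTES = 60        # "If the business can wait an hour, batch is cheaper"
MICRO_BATCH_MIN_SLA_MINUTES = 15  # "If none is under 15 minutes, micro-batch..."


def pick_processing_tier(consumer_slas_minutes: Mapping[str, float]) -> str:
    """Suggest a tier from the tightest latency SLA among downstream consumers."""
    if not consumer_slas_minutes:
        raise ValueError("list at least one downstream consumer and its SLA")
    tightest = min(consumer_slas_minutes.values())
    if tightest >= BATCH_MIN_SLA_MINUTES:
        return "batch"
    if tightest >= MICRO_BATCH_MIN_SLA_MINUTES:
        return "micro-batch"
    # Under 15 minutes: worth evaluating true streaming, not an automatic yes.
    return "streaming-candidate"


# Usage with hypothetical consumers and SLAs (in minutes).
orders_consumers = {"finance_dashboard": 240, "ops_alerting": 20, "nightly_report": 1440}
fraud_consumers = {"fraud_scoring": 0.5, "nightly_report": 1440}
print(pick_processing_tier(orders_consumers))  # micro-batch
print(pick_processing_tier(fraud_consumers))   # streaming-candidate
```

The result is deliberately `streaming-candidate` rather than `streaming`: an SLA between 1 and 15 minutes is a prompt to run the cost comparison, while a genuinely sub-minute consumer is the clear case from the list at the top of this page.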

Which of your streams could be micro-batches without anyone noticing?
For which batch jobs would consumers genuinely pay for lower latency, and what budget would you set?
