Theory
Streaming is a latency choice, not a fashion choice
If the business can wait an hour, batch is cheaper, simpler and easier to debug. Streaming earns its complexity only when:
- A downstream system needs sub-minute latency (fraud, personalisation, alerting).
- A source is inherently streaming (clickstream, IoT, CDC) and you don't want to wait for the next batch window.
- The data must be enriched on the fly before storage (stateful joins, deduplication).
This level layers on the Apache Kafka & Streaming track. If you haven't taken it, that's your prereq for broker internals (topics, partitions, consumer groups, offsets). Here we focus on the DE concerns: CDC, exactly-once, stream→table duality, Flink basics.