Batch, Online, Streaming: Pick Your Mode

The serving mode is the first architectural choice, and it usually decides the SLO.


One choice, many consequences

Three serving modes

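| Mode      | Latency             | Throughput      | Typical SLO       | Stack examples                     |
|-----------|---------------------|-----------------|-------------------|------------------------------------|
| Batch     | minutes to hours    | enormous        | overnight         | Airflow + Spark + Parquet output   |
| Online    | 10-200 ms           | thousands req/s | p95 < 200 ms      | KServe, BentoML, SageMaker, Triton |
| Streaming | sub-second on event | per-event       | per-event latency | Kafka + Flink + model UDF          |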
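
To make the online SLO concrete, here is a minimal sketch of checking "p95 < 200 ms" against a list of measured request latencies. The function names, the nearest-rank percentile definition, and the sample numbers are assumptions chosen for illustration; production monitoring tools often use interpolated or histogram-based percentiles instead.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

def meets_online_slo(latencies_ms, threshold_ms=200.0):
    """True when the p95 latency is strictly below the threshold (assumed 200 ms)."""
    return percentile(latencies_ms, 95) < threshold_ms

# Hypothetical measurements: 95 fast requests and 5 slow outliers.
mostly_fast = [50.0] * 95 + [500.0] * 5
# One more slow request pushes the 95th-percentile sample into the slow group.
too_many_slow = [50.0] * 94 + [500.0] * 6
```

With these made-up samples, the first list passes the check and the second fails it, even though the two differ by a single request. That sensitivity is why the percentile, not the mean, is the usual target for the online mode.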

Decision heuristic

  • Is the consumer a dashboard refreshed daily? Batch.
  • Is it a user clicking a button? Online.
  • Is it a transaction stream where decisions must precede the next event? Streaming.
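
The three questions above can be sketched as a small function. This is only an illustration: the `Mode` enum, the parameter names, and the precedence (streaming checked first, batch as the default) are assumptions, not part of the lesson.

```python
from enum import Enum

class Mode(Enum):
    BATCH = "batch"
    ONLINE = "online"
    STREAMING = "streaming"

def pick_mode(user_is_waiting: bool, must_decide_before_next_event: bool) -> Mode:
    """Map the decision heuristic onto a serving mode.

    Assumed precedence: an event stream with ordering constraints wins,
    then an interactive user, and everything else falls back to batch.
    """
    if must_decide_before_next_event:
        return Mode.STREAMING   # e.g. a transaction stream
    if user_is_waiting:
        return Mode.ONLINE      # e.g. a user clicking a button
    return Mode.BATCH           # e.g. a dashboard refreshed daily

# Hypothetical consumers, one per bullet in the heuristic.
dashboard = pick_mode(user_is_waiting=False, must_decide_before_next_event=False)
button = pick_mode(user_is_waiting=True, must_decide_before_next_event=False)
transactions = pick_mode(user_is_waiting=False, must_decide_before_next_event=True)
```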

Many real systems run two modes in parallel: batch for historical scoring and online for fresh requests.
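
One common way to combine the two modes, sketched here under assumptions, is to serve precomputed batch scores when they exist and fall back to an online model call otherwise. The dictionary stands in for the batch job's output, and `toy_model`, `make_scorer`, and the entity IDs are hypothetical names for illustration only.

```python
from typing import Callable, Dict, Tuple

def make_scorer(
    batch_scores: Dict[str, float],
    online_model: Callable[[dict], float],
) -> Callable[[str, dict], Tuple[float, str]]:
    """Build a scorer that prefers batch results and falls back to online inference."""
    def score(entity_id: str, features: dict) -> Tuple[float, str]:
        if entity_id in batch_scores:
            # Already scored by the overnight batch run: no model call needed.
            return batch_scores[entity_id], "batch"
        # Fresh request with no precomputed score: run the model now.
        return online_model(features), "online"
    return score

def toy_model(features: dict) -> float:
    """Stand-in for a real model endpoint (assumption for illustration)."""
    return 0.5 * features.get("amount", 0.0)

scorer = make_scorer({"user-1": 0.9}, toy_model)
known = scorer("user-1", {"amount": 4.0})   # served from the batch output
fresh = scorer("user-2", {"amount": 4.0})   # served by the online path
```

Returning the source label next to the score makes it easy to track how much traffic each mode actually serves.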

Analogy

Three kitchens: bakery (batch — one big run overnight), restaurant (online — diner orders, ten minutes), food truck at a marathon (streaming — runners pass one by one, each gets a cup in <1 s). Same skill set, completely different operational realities.
