One choice, many consequences
Three serving modes
| Mode | Latency | Throughput | Typical SLO | Stack examples |
|---|---|---|---|---|
| Batch | minutes to hours | very high (whole dataset per run) | job completes overnight | Airflow + Spark + Parquet output |
| Online | 10-200 ms | thousands req/s | p95 < 200 ms | KServe, BentoML, SageMaker, Triton |
| Streaming | sub-second per event | per-event | per-event latency | Kafka + Flink + model UDF |
Decision heuristic
- Is the consumer a dashboard refreshed daily? Batch.
- Is it a user clicking a button? Online.
- Is it a transaction stream where decisions must precede the next event? Streaming.
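The three questions above can be sketched as a small decision function. This is only an illustration: the names `ServingMode` and `choose_serving_mode`, the two boolean parameters, and the order in which the questions are checked are assumptions made for the example, not part of the heuristic itself.

```python
from enum import Enum


class ServingMode(Enum):
    BATCH = "batch"
    ONLINE = "online"
    STREAMING = "streaming"


def choose_serving_mode(
    *, must_decide_before_next_event: bool, user_is_waiting: bool
) -> ServingMode:
    """Map the heuristic's questions to a serving mode (hypothetical helper).

    Assumption: the strictest requirement wins, so the streaming question
    is checked first, then the interactive one; batch is the default.
    """
    if must_decide_before_next_event:
        # Transaction stream: the decision must precede the next event.
        return ServingMode.STREAMING
    if user_is_waiting:
        # A user clicked a button and is waiting for the response.
        return ServingMode.ONLINE
    # Nobody is waiting, e.g. a dashboard refreshed daily.
    return ServingMode.BATCH
```

For a daily dashboard, both flags are false and the function falls through to `ServingMode.BATCH`.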
Many real systems run two modes in parallel: batch for historical scoring and online for fresh requests.
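One way such a two-mode setup might look in code is a lookup that prefers a precomputed batch score and falls back to an online model call. This is a minimal sketch under stated assumptions: `score_with_fallback`, `batch_scores`, and `online_model` are hypothetical names, the batch output is assumed to be loaded into an in-memory mapping, and real systems would typically also handle staleness and timeouts.

```python
from typing import Callable, Mapping, Tuple


def score_with_fallback(
    entity_id: str,
    batch_scores: Mapping[str, float],
    online_model: Callable[[str], float],
) -> Tuple[float, str]:
    """Return (score, source), preferring the precomputed batch score.

    Assumption: batch_scores holds the output of the most recent batch run,
    keyed by entity id; online_model scores a single entity on demand.
    """
    if entity_id in batch_scores:
        # Entity was covered by the batch job: no model call needed.
        return batch_scores[entity_id], "batch"
    # Fresh entity not seen by the batch job: score it online.
    return online_model(entity_id), "online"


# Toy stand-ins for illustration only.
example_batch_scores = {"user-1": 0.12}


def toy_online_model(entity_id: str) -> float:
    # Placeholder for a call to a real online endpoint.
    return 0.5
```

With these toy inputs, `score_with_fallback("user-1", example_batch_scores, toy_online_model)` is served from the batch table, while an unseen id such as `"user-2"` goes to the online model.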