Pick the shape, reuse the artifact
One model, three serving shapes
A registry model isn't only a REST endpoint. The same artifact drives three deployment shapes, picked by latency need:
- Online (REST): mlflow models serve, scored per request with millisecond latency.
- Batch (Spark): score a billion rows nightly with a pandas/Spark UDF; throughput over latency.
- Streaming: load the pyfunc once per consumer and score each Kafka record as it arrives.
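The streaming shape can be sketched as below. This is a minimal illustration, not a production consumer: StreamScorer, StubModel, and the income/debt payloads are hypothetical names made up for the example, and a plain list of byte strings stands in for a Kafka topic. The only MLflow call used is mlflow.pyfunc.load_model, imported lazily so the sketch runs without MLflow installed.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator


def load_registry_model(model_uri: str) -> Any:
    """Load a registry model as a generic pyfunc. Needs mlflow installed."""
    import mlflow.pyfunc  # lazy import so the rest of the sketch runs without mlflow

    return mlflow.pyfunc.load_model(model_uri)


class StreamScorer:
    """Loads the model once per consumer, then scores one record at a time."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._model: Any = None
        self.load_count = 0  # exposed so the "load once" property can be checked

    def _get_model(self) -> Any:
        # Load on first use only; every later record reuses the same object.
        if self._model is None:
            self._model = self._loader()
            self.load_count += 1
        return self._model

    def score(self, messages: Iterable[bytes]) -> Iterator[Dict[str, Any]]:
        for raw in messages:
            record = json.loads(raw)
            # A real pyfunc model would typically get pandas.DataFrame([record]).
            pred = self._get_model().predict([record])[0]
            yield {**record, "pred": pred}


class StubModel:
    """Hypothetical stand-in for the registry model: debt-to-income ratio."""

    def predict(self, rows):
        return [row["debt"] / row["income"] for row in rows]


# Stand-in for records read from a Kafka topic (hypothetical payloads).
messages = [
    b'{"income": 50000, "debt": 10000}',
    b'{"income": 40000, "debt": 30000}',
]

scorer = StreamScorer(loader=StubModel)
results = list(scorer.score(messages))
```

To score against the real artifact, the loader could be swapped for lambda: load_registry_model('models:/credit-scoring/Production') and the list for whatever iterator your Kafka client exposes; the load-once logic stays the same.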
MLflow makes batch trivial with spark_udf:
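import mlflow.pyfunc

# Assumes an active SparkSession `spark` and a DataFrame `df` holding only feature columns.
predict = mlflow.pyfunc.spark_udf(spark, 'models:/credit-scoring/Production')
# Pass every column of df to the model; predictions land in a new 'pred' column.
scored = df.withColumn('pred', predict(*df.columns))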
The model artifact is distributed to the Spark executors, and each worker loads it once and reuses it across the rows of its partitions, so a billion-row scoring job runs the exact artifact your REST endpoint serves. Same model, same predictions, a very different scale envelope.
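One way to check that the two paths agree is to send a few of the batch rows to the online endpoint and compare the outputs. The sketch below only builds the request and the comparison helper; it does not send anything. It assumes an MLflow 2.x scoring server, which accepts JSON with a dataframe_split key on /invocations, and a local endpoint on port 5000. The helper names and the income/debt columns are hypothetical.

```python
import json
import urllib.request
from typing import Any, List, Sequence


def build_invocation_request(
    columns: Sequence[str],
    rows: List[List[Any]],
    url: str = "http://127.0.0.1:5000/invocations",  # assumed local endpoint
) -> urllib.request.Request:
    """Build (but do not send) a scoring request in dataframe_split format."""
    payload = {"dataframe_split": {"columns": list(columns), "data": rows}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def max_abs_diff(online: Sequence[float], batch: Sequence[float]) -> float:
    """Largest absolute gap between two prediction lists of equal length."""
    if len(online) != len(batch):
        raise ValueError("prediction lists differ in length")
    return max((abs(a - b) for a, b in zip(online, batch)), default=0.0)


# Hypothetical feature row; real column names come from the model's signature.
request = build_invocation_request(["income", "debt"], [[50000, 10000]])
body = json.loads(request.data)
```

Passing the request to urllib.request.urlopen would send it to a running server. Comparing the returned predictions against the matching rows of the batch output with max_abs_diff gives a quick parity check, ideally with a small tolerance rather than strict equality.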