Why is `mlflow.pyfunc.spark_udf` efficient for billion-row scoring?

It loads the model once per executor and scores partitions in parallel

It calls the REST endpoint per row

Batch & Streaming Inference at Scale — Semantic Web Academy

One model, three serving shapes

A registered model isn't only a REST endpoint. The same artifact drives three deployment shapes, chosen by latency requirement:

Online (REST): `mlflow models serve`, one request at a time, millisecond latency.
Batch (Spark): score a billion rows nightly with a pandas/Spark UDF; throughput matters more than latency.
Streaming: load the pyfunc once per consumer and score each Kafka record as it arrives.

MLflow makes batch trivial with spark_udf:

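import mlflow.pyfunc

# Wrap the registered model as a Spark UDF (spark is an active SparkSession)
predict = mlflow.pyfunc.spark_udf(spark, 'models:/credit-scoring/Production')
# Pass the feature columns; this assumes every column of df is a model input
scored = df.withColumn('pred', predict(*df.columns))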

Each executor loads the model once and reuses it for every partition it processes, so partitions are scored in parallel with no per-row loading or network call. A billion-row scoring job therefore reuses the exact artifact your REST endpoint serves: the same predictions for the same inputs, at a very different scale.


Pick the shape, reuse the artifact
