Explainability in Production

SHAP, attention, counterfactuals — and the cost of computing them per request.

What, how much, for whom

Three families

  • Local feature attributions — SHAP, LIME, integrated gradients. Answer: which inputs drove this prediction?
  • Global — partial dependence, feature importance. Answer: how does the model use feature X on average?
  • Counterfactual — what minimal change to the inputs would flip the prediction? Best for user-facing explanations and recourse.
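To make the local and counterfactual families concrete, here is a minimal sketch in plain Python. It is not how SHAP or any counterfactual library is implemented. The toy scorer, its weights, the baseline input, and the helper names `shapley_values` and `single_feature_counterfactual` are all assumptions made up for illustration. The attribution function enumerates every feature subset, which is exact but exponential in the number of features; the counterfactual search changes one feature at a time on a fixed grid and compares changes in raw units, which a real system would rescale.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy scorer over (income, debt, years_of_history); weights are made up.
WEIGHTS = [0.5, -0.75, 0.25]

def predict(features):
    return sum(w * v for w, v in zip(WEIGHTS, features))

def shapley_values(predict_fn, x, baseline):
    """Exact Shapley values for one input, by enumerating all feature subsets.
    Features outside a coalition are replaced by their baseline value."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict_fn(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def single_feature_counterfactual(predict_fn, x, threshold, step=0.25, max_steps=100):
    """Smallest single-feature change (on a grid of `step`) that flips the
    decision `score >= threshold`. Returns (index, new_value, distance) or None."""
    original = predict_fn(x) >= threshold
    best = None
    for i in range(len(x)):
        found = False
        for k in range(1, max_steps + 1):
            for sign in (1, -1):
                z = list(x)
                z[i] = x[i] + sign * k * step
                if (predict_fn(z) >= threshold) != original:
                    if best is None or k * step < best[2]:
                        best = (i, z[i], k * step)
                    found = True
                    break
            if found:
                break  # first flip for this feature is its smallest change
    return best

x = [4.0, 5.0, 2.0]         # the request being explained
baseline = [3.0, 2.0, 1.0]  # assumed reference input, e.g. a typical applicant

attributions = shapley_values(predict, x, baseline)
recourse = single_feature_counterfactual(predict, x, threshold=0.0)
```

For this linear scorer each attribution works out to weight times (value minus baseline), and the attributions sum to the difference between the prediction and the baseline prediction. The counterfactual result reads as recourse: lowering the second feature (debt) is the smallest single change that moves the score over the threshold.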

The production catch

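Computing SHAP values per request can be slower than inference itself, because many explainers evaluate the model many times to explain a single prediction. Two pragmatic patterns: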

  1. Pre-compute explanations for high-value cohorts (top X% by impact), so they are ready at request time.
  2. Async backfill — log every prediction, run SHAP in a batch job, attach when ready.
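The two patterns above can be sketched as follows. This is a simplified, in-memory illustration under stated assumptions: `prediction_log` stands in for a durable log or message queue, `explanations` for a key-value store, and `toy_explain` for an expensive explainer such as SHAP. All function names here are hypothetical, not part of any library.

```python
from collections import deque

prediction_log = deque()  # stand-in for a durable log or message queue
explanations = {}         # stand-in for a key-value store keyed by request id

WEIGHTS = [0.5, -0.75, 0.25]  # hypothetical toy model weights

def predict(features):
    return sum(w * v for w, v in zip(WEIGHTS, features))

def toy_explain(features):
    """Placeholder for an expensive explainer; returns per-feature contributions."""
    return [w * v for w, v in zip(WEIGHTS, features)]

def top_cohort(impact_by_id, fraction):
    """Pattern 1: ids in the top `fraction` by impact, to explain ahead of time."""
    k = max(1, int(len(impact_by_id) * fraction))
    ranked = sorted(impact_by_id, key=impact_by_id.get, reverse=True)
    return set(ranked[:k])

def serve(request_id, features):
    """Hot path: score and log the request; no explanation work happens here."""
    score = predict(features)
    prediction_log.append((request_id, features, score))
    return score

def backfill(batch_size=100):
    """Pattern 2: batch job that drains logged predictions and attaches explanations."""
    done = 0
    while prediction_log and done < batch_size:
        request_id, features, score = prediction_log.popleft()
        explanations[request_id] = {
            "score": score,
            "attributions": toy_explain(features),
        }
        done += 1
    return done

def get_explanation(request_id):
    """Returns None until the batch job has caught up with this request."""
    return explanations.get(request_id)
```

A caller would invoke `serve` on the request path, run `backfill` on a schedule, and treat a `None` from `get_explanation` as "not ready yet". The design choice is that request latency depends only on `predict`, while the explainer's cost moves to a job that can be sized and throttled separately.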

Explanations can be required under GDPR Art. 22 and the EU AI Act when an automated decision has a legal or similarly significant effect on the user, for example in credit, hiring, or insurance.

Analogy

Explainability per request is simultaneous translation at the UN: necessary for high-stakes conversations, absurdly expensive for casual chatter. You hire the translator for the negotiations that matter.
