Bias Audits and Fairness Metrics

Disparate impact, equal opportunity, calibration — and why they conflict.

No free lunch

A short menu of metrics

  • Demographic parity — same positive rate across groups.
  • Equal opportunity — same true-positive rate across groups.
  • Equalised odds — same true-positive rate (TPR) and false-positive rate (FPR) across groups.
  • Calibration — same predicted score implies same actual probability across groups.
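
As a rough illustration, the first three metrics can be computed from binary labels, binary predictions and a group label per row. The sketch below is a minimal version under those assumptions; the function name `group_metrics` and the toy data are hypothetical, and calibration is left out because it needs predicted scores rather than hard decisions.

```python
from collections import defaultdict


def group_metrics(y_true, y_pred, groups):
    """Per-group selection rate, TPR and FPR for binary labels and predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            key = "tp" if p == 1 else "fn"
        else:
            key = "fp" if p == 1 else "tn"
        counts[g][key] += 1

    metrics = {}
    for g, c in counts.items():
        n = sum(c.values())
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        metrics[g] = {
            # Demographic parity compares this value across groups.
            "selection_rate": (c["tp"] + c["fp"]) / n,
            # Equal opportunity compares TPR; equalised odds compares TPR and FPR.
            "tpr": c["tp"] / positives if positives else float("nan"),
            "fpr": c["fp"] / negatives if negatives else float("nan"),
        }
    return metrics


# Toy data, invented for illustration only.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

metrics = group_metrics(y_true, y_pred, groups)
for g in sorted(metrics):
    print(g, metrics[g])
```

On this toy data the two groups share the same FPR but differ in selection rate and TPR, so the model would fail demographic parity, equal opportunity and equalised odds while passing an FPR-only check.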

The impossibility result (Chouldechova 2017; Kleinberg et al. 2017)

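If base rates differ between groups and the classifier is not perfect, you cannot satisfy calibration and equalised odds at the same time. Equalising error rates forces the same score to mean different things in different groups, and vice versa. You must choose which fairness criterion fits the use case and document the trade-off.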
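
A small numeric sketch can make the conflict concrete. It assumes a binary decision and uses equal positive predictive value (PPV) across groups as a stand-in for calibration, which is the form Chouldechova's argument takes. The confusion-matrix identity FPR = p/(1-p) × (1-PPV)/PPV × TPR, where p is the group's base rate, then fixes the FPR once PPV and TPR are chosen. The function name and the numbers are hypothetical.

```python
def implied_fpr(base_rate, ppv, tpr):
    """FPR forced by the identity FPR = p/(1-p) * (1-PPV)/PPV * TPR."""
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * tpr


# Same PPV and same TPR in both groups; only the base rate differs.
fpr_a = implied_fpr(base_rate=0.3, ppv=0.8, tpr=0.6)
fpr_b = implied_fpr(base_rate=0.5, ppv=0.8, tpr=0.6)

print(f"group a FPR: {fpr_a:.3f}")
print(f"group b FPR: {fpr_b:.3f}")
```

With these illustrative inputs the implied false-positive rates come out at roughly 0.064 and 0.15. Holding PPV and TPR equal leaves no freedom to equalise FPR as well, unless the base rates match or the classifier makes no errors.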

Operational practice

  1. Define protected attributes up front — gender, ethnicity, age band, geography.
  2. Slice evaluation by them in every CI run.
  3. Set thresholds (e.g. the four-fifths rule: every group's selection rate ≥ 80% of the rate for the group with the highest selection rate).
  4. If the threshold breaks, the build fails — same as any test.
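
The steps above could be wired into a test suite along these lines. This is a minimal sketch, not a prescribed implementation: `four_fifths_violations` and the hard-coded rates are hypothetical, and in a real pipeline the selection rates would come from the sliced evaluation in step 2.

```python
def four_fifths_violations(selection_rates, threshold=0.8):
    """Return {group: ratio} for groups below threshold * the highest selection rate."""
    best = max(selection_rates.values())
    if best == 0:
        # Nobody is selected in any group, so no ratio can be computed.
        return {}
    return {
        group: rate / best
        for group, rate in selection_rates.items()
        if rate / best < threshold
    }


def test_four_fifths():
    # Hypothetical per-group selection rates from the evaluation step.
    rates = {"group_a": 0.50, "group_b": 0.45}
    violations = four_fifths_violations(rates)
    # A failing assertion fails the build, like any other test.
    assert not violations, f"four-fifths rule violated: {violations}"


# Example of a failing slice: 0.30 / 0.50 = 0.6, which is below 0.8.
failing = four_fifths_violations({"group_a": 0.50, "group_b": 0.30})
print(failing)
```

Returning the offending groups with their ratios, rather than a bare boolean, keeps the CI failure message actionable.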

Analogy

Fairness metrics are like medical screening criteria. Sensitivity, specificity and positive predictive value (PPV) cannot all be maximised at once; the right trade-off depends on whether you're screening for a treatable cancer or a rare disease. ML fairness has the same shape: choose with intent.

Reflect

Stress-test your current model.

  • What protected attributes apply to your domain?
  • Which fairness criterion best matches the *harm* you want to avoid?
  • What would it take to add a fairness slice to your CI today?
