No free lunch
A short menu of metrics
- Demographic parity: same positive-prediction (selection) rate across groups.
- Equal opportunity: same true-positive rate (TPR) across groups.
- Equalised odds: same TPR and same false-positive rate (FPR) across groups.
- Calibration: within every group, a given predicted score corresponds to the same observed probability of the positive outcome.
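All four metrics come from per-group counts. The following is a minimal plain-Python sketch, assuming binary 0/1 labels and predictions, scores in [0, 1], and a parallel list of group labels. The function names `group_rates` and `calibration_by_group` are illustrative and do not come from any fairness library.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Selection rate, TPR and FPR per group, from binary (0/1) labels and predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for yt, yp, g in zip(y_true, y_pred, groups):
        c = counts[g]
        if yp == 1 and yt == 1:
            c["tp"] += 1
        elif yp == 1:
            c["fp"] += 1
        elif yt == 1:
            c["fn"] += 1
        else:
            c["tn"] += 1

    rates = {}
    for g, c in counts.items():
        n = sum(c.values())
        pos = c["tp"] + c["fn"]  # actual positives in this group
        neg = c["fp"] + c["tn"]  # actual negatives in this group
        rates[g] = {
            "selection_rate": (c["tp"] + c["fp"]) / n,      # demographic parity
            "tpr": c["tp"] / pos if pos else float("nan"),  # equal opportunity
            "fpr": c["fp"] / neg if neg else float("nan"),  # with TPR: equalised odds
        }
    return rates


def calibration_by_group(y_true, scores, groups, bins=5):
    """Observed positive rate per (group, score bin), assuming scores in [0, 1].

    Calibration across groups means a given bin shows roughly the same
    observed rate in every group.
    """
    tallies = defaultdict(lambda: [0, 0])  # (group, bin) -> [positives, total]
    for yt, s, g in zip(y_true, scores, groups):
        b = min(int(s * bins), bins - 1)  # a score of exactly 1.0 goes in the top bin
        tallies[(g, b)][0] += yt
        tallies[(g, b)][1] += 1
    return {key: pos / total for key, (pos, total) in tallies.items()}


# Toy data: two groups of four rows each.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(group_rates(y_true, y_pred, groups))
```

In this toy data both groups have a selection rate of 0.5, so demographic parity holds. TPR is 0.5 for group a and 1.0 for group b, so equal opportunity fails. Satisfying one criterion says little about the others.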
The impossibility result (Chouldechova / Kleinberg 2017)
If base rates differ between groups, then no classifier short of a perfect one can satisfy calibration and equalised odds at the same time. You must choose the fairness criterion that fits the use case and document the trade-off.
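A short worked check shows why the result bites. It uses the identity from Chouldechova's paper, FPR = p/(1 − p) · (1 − PPV)/PPV · (1 − FNR), where p is the group's base rate and FNR = 1 − TPR. That identity is stated for predictive parity (equal positive predictive value, PPV), which is a close relative of calibration rather than the same condition. The base rates, PPV and FNR below are made-up illustrative numbers, not data.

```python
def implied_fpr(base_rate, ppv, fnr):
    """FPR forced by a group's base rate, PPV and FNR (Chouldechova 2017 identity)."""
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)


# Same PPV and FNR (hence same TPR = 0.8) in both groups; only base rates differ.
for name, p in [("group_a", 0.3), ("group_b", 0.5)]:
    print(name, round(implied_fpr(p, ppv=0.7, fnr=0.2), 3))
# group_a 0.147
# group_b 0.343
```

Holding PPV and TPR equal leaves the FPRs unequal, so equalised odds fails. Running the identity the other way, forcing equal FPRs with unequal base rates requires PPV or FNR to differ between groups.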
Operational practice
- Define protected attributes up front — gender, ethnicity, age band, geography.
- Slice evaluation by them in every CI run.
- Set thresholds (e.g. the four-fifths rule: every protected group's selection rate must be at least 80% of the rate of the most-selected group).
- If a threshold is breached, the build fails, exactly like any other failing test.
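The practice above can be sketched as a small gate that slices predictions by each protected attribute, applies the four-fifths rule, and raises the way a failing test does. This sketch rests on several assumptions. Rows are dicts with a 0/1 `pred` key. The attribute names in `PROTECTED_ATTRIBUTES` are hypothetical column names. The helpers `selection_rates`, `four_fifths_violations`, `fairness_gate` and `enforce_four_fifths` are illustrative and belong to no existing tool.

```python
# Hypothetical column names for the protected attributes, fixed up front.
PROTECTED_ATTRIBUTES = ("gender", "ethnicity", "age_band", "geography")


def selection_rates(rows, attribute):
    """Share of rows with pred == 1, per value of `attribute`.

    Each row is assumed to be a dict holding the attribute values and a 0/1 'pred'.
    """
    totals, positives = {}, {}
    for row in rows:
        g = row[attribute]
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + row["pred"]
    return {g: positives[g] / totals[g] for g in totals}


def four_fifths_violations(rates, threshold=0.8):
    """Groups whose selection rate is below `threshold` times the highest group's rate."""
    best = max(rates.values())
    if best == 0:  # nobody selected, so the impact ratio is undefined
        return {}
    return {g: r / best for g, r in rates.items() if r / best < threshold}


def fairness_gate(rows, attributes=PROTECTED_ATTRIBUTES, threshold=0.8):
    """One failure message per (attribute, group) slice that breaks the rule."""
    failures = []
    for attr in attributes:
        ratios = four_fifths_violations(selection_rates(rows, attr), threshold)
        for g, ratio in ratios.items():
            failures.append(f"{attr}={g}: impact ratio {ratio:.2f} < {threshold}")
    return failures


def enforce_four_fifths(rows, attributes=PROTECTED_ATTRIBUTES, threshold=0.8):
    """Raise AssertionError, as a failing test would, if any slice breaks the rule."""
    failures = fairness_gate(rows, attributes, threshold)
    if failures:
        raise AssertionError("; ".join(failures))


# Toy slice: 3/10 "f" rows selected vs 5/10 "m" rows, an impact ratio of 0.6.
rows = ([{"gender": "f", "pred": 1}] * 3 + [{"gender": "f", "pred": 0}] * 7
        + [{"gender": "m", "pred": 1}] * 5 + [{"gender": "m", "pred": 0}] * 5)
print(fairness_gate(rows, attributes=["gender"]))
```

The toy slice yields one failure, `gender=f: impact ratio 0.60 < 0.8`. In CI, you could call `enforce_four_fifths` from an ordinary test function (for example under pytest) on the evaluation set's predictions, so that a violation fails the build like any other test. The gate compares selection rates because the four-fifths rule is defined on them. The same pattern could compare TPR or FPR if equal opportunity or equalised odds is the chosen criterion.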