Theory — KPIs that reflect the business, not the transcript
Measure the right thing
The classic mistake is optimising Word Error Rate (WER) and calling it done. Claims operations don't care whether the transcript is beautiful — they care whether the extracted claim is correct and whether the routing decision was safe. So measure four KPIs from day one:
| KPI | Target | Why it's the one that matters |
|---|---|---|
| WER (Swiss German subset, post-normalisation) | ≤ 0.20 | Sanity check on transcription quality after the ontology's normalisation has run |
| Entity F1 (policy_number, amount_chf) | ≥ 0.90 | The actual product: did we capture the claim correctly? |
| False auto-process rate | ≤ 2% | A wrongly auto-processed claim is a compliance event, not just an error |
| Review-queue SLA | 95% within 30 min | The pipeline's job is to shrink human load, measurably |
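The two operational KPIs in the table only become trackable once their denominators are pinned down. The sketch below is a minimal illustration under stated assumptions, not the pipeline's actual implementation. It assumes the false auto-process rate is the share of *auto-processed* claims later found to be wrong, and that the review-queue SLA is the share of queued claims resolved within 30 minutes. `ClaimOutcome`, its fields, and `kpi_gate` are hypothetical names, and the input values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Targets copied from the KPI table: name -> (direction, threshold).
TARGETS: Dict[str, Tuple[str, float]] = {
    "wer": ("<=", 0.20),
    "entity_f1": (">=", 0.90),
    "false_auto_rate": ("<=", 0.02),
    "review_sla_share": (">=", 0.95),
}


@dataclass
class ClaimOutcome:
    """Hypothetical per-claim record; field names are illustrative."""
    auto_processed: bool             # routed straight through, no human review
    later_found_wrong: bool          # an audit found the extracted claim incorrect
    review_minutes: Optional[float]  # minutes spent in the review queue; None if never queued


def false_auto_process_rate(outcomes: List[ClaimOutcome]) -> float:
    # Assumed denominator: only the claims that were auto-processed.
    auto = [o for o in outcomes if o.auto_processed]
    if not auto:
        return 0.0
    return sum(o.later_found_wrong for o in auto) / len(auto)


def review_sla_share(outcomes: List[ClaimOutcome], limit_minutes: float = 30.0) -> float:
    # Share of queued claims handled within the SLA window.
    queued = [o.review_minutes for o in outcomes if o.review_minutes is not None]
    if not queued:
        return 1.0
    return sum(m <= limit_minutes for m in queued) / len(queued)


def kpi_gate(measured: Dict[str, float]) -> Dict[str, bool]:
    # True means the KPI meets its target from the table.
    passed = {}
    for name, (direction, threshold) in TARGETS.items():
        value = measured[name]
        passed[name] = value <= threshold if direction == "<=" else value >= threshold
    return passed


# Illustrative inputs only, not real measurements.
outcomes = [
    ClaimOutcome(auto_processed=True, later_found_wrong=False, review_minutes=None),
    ClaimOutcome(auto_processed=True, later_found_wrong=False, review_minutes=None),
    ClaimOutcome(auto_processed=False, later_found_wrong=False, review_minutes=12.0),
    ClaimOutcome(auto_processed=False, later_found_wrong=False, review_minutes=45.0),
]
measured = {
    "wer": 0.25,
    "entity_f1": 0.93,
    "false_auto_rate": false_auto_process_rate(outcomes),
    "review_sla_share": review_sla_share(outcomes),
}
print(kpi_gate(measured))
# {'wer': False, 'entity_f1': True, 'false_auto_rate': True, 'review_sla_share': False}
```

Reporting all four results side by side, rather than a single pass/fail, keeps a failing WER from masking a healthy entity F1 (or the reverse).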
Note that entity F1 can be high even when WER is mediocre: that is the whole point of the ontology. The recogniser can mishear filler words, as long as the ontology still canonicalises the claim-critical tokens. Reporting WER alone would hide your real success.
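To make the WER/F1 gap concrete, here is a small worked sketch. It computes WER as word-level Levenshtein distance over the reference length, and entity F1 as exact-match F1 over `(field, value)` pairs. The utterance, the `AB-12345` policy-number format, and the helper names are hypothetical; a real pipeline would likely use a dedicated scoring tool and its own entity schema.

```python
from typing import Set, Tuple

Entity = Tuple[str, str]  # (field name, canonical value)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # At the start of row i, prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)


def entity_f1(gold: Set[Entity], predicted: Set[Entity]) -> float:
    """Exact-match F1 over (field, value) pairs."""
    if not gold and not predicted:
        return 1.0
    tp = len(gold & predicted)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical normalised utterance: two filler/function words misheard, one dropped.
reference = "also ja mini policenummer isch AB-12345 und de schade isch 1200 CHF"
hypothesis = "also da mini policenummer esch AB-12345 und schade isch 1200 CHF"

gold = {("policy_number", "AB-12345"), ("amount_chf", "1200")}
predicted = {("policy_number", "AB-12345"), ("amount_chf", "1200")}

print(f"WER = {word_error_rate(reference, hypothesis):.2f}")   # WER = 0.25
print(f"entity F1 = {entity_f1(gold, predicted):.2f}")        # entity F1 = 1.00
```

Three word errors out of twelve put WER at 0.25, above the 0.20 target, yet both claim-critical entities survive intact, so entity F1 is perfect.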