Swiss German error rates spike right after a model release. What is the safest immediate action?

Flip to assisted mode, roll back to the previous stable release_id, then replay affected calls through the last stable config.

Increase auto-process volume to collect more data.

Delete the low-confidence transcripts.

Disable audit logging to reduce latency.

5 · Deployment and runbook — from shadow mode to safe auto-processing

A staged rollout, confidence-gated auto-processing, immutable audit trail, and a concrete incident runbook for Swiss German STT claims intake.

Theory — staged, reversible rollout

Roll out by earning trust, one mode at a time

Never go straight to auto-processing — you have no production accuracy evidence yet. Stage it so each phase is reversible:

Phase 1 (weeks 1–3) · Shadow mode. Pipeline runs, writes nothing to the CRM. You compare its output against agents' manual claims to build the real accuracy baseline on live traffic.
Phase 2 (weeks 4–7) · Assisted mode. The pipeline pre-fills fields; the agent confirms before submit. This is where the ontology proves it saves the 6–9 minutes, with a human still in the loop.
Phase 3 (weeks 8–12) · Gated auto-process. Only the highest-confidence flows auto-create a claim, and only when all three gates pass: stt_confidence ≥ 0.88, entity_coverage ≥ 0.80, and no forbidden uncertainty terms in the transcript. Everything else falls back to assisted mode or review.
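The Phase 3 gate can be sketched as one pure function. This is a minimal illustration, not the course's implementation: the two thresholds come from the text above, while the `CallResult` type, the `route` function, and the sample uncertainty terms are assumptions made up for the example.

```python
from dataclasses import dataclass

# Thresholds taken from the Phase 3 description.
MIN_STT_CONFIDENCE = 0.88
MIN_ENTITY_COVERAGE = 0.80

# Hypothetical sample terms; the real forbidden list is not given in this lesson.
FORBIDDEN_UNCERTAINTY_TERMS = ("vilicht", "ungefähr", "glaub")


@dataclass(frozen=True)
class CallResult:
    stt_confidence: float
    entity_coverage: float
    normalised_transcript: str


def route(call: CallResult) -> str:
    """Return "auto" only when every gate passes, otherwise "assisted"."""
    text = call.normalised_transcript.lower()
    has_uncertainty = any(term in text for term in FORBIDDEN_UNCERTAINTY_TERMS)
    passes = (
        call.stt_confidence >= MIN_STT_CONFIDENCE
        and call.entity_coverage >= MIN_ENTITY_COVERAGE
        and not has_uncertainty
    )
    return "auto" if passes else "assisted"
```

Failing any single gate is enough to send the call to a human, so the default outcome is the safe one.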

Assisted mode is also your permanent safe fallback: any incident degrades the system back to it rather than offline.

Theory — defensible operations

Compliance, audit, and the incident runbook

Swiss insurance is regulated; an auto-created claim must be defensible.

Immutable audit trail — log per call, never overwrite: raw audio URI, raw transcript, normalised transcript, extracted entities, routing decision, and the release_id of the ontology + model that produced them. With release_id on every call you can reconstruct exactly which lexicon version made any decision.
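One way to picture such a trail is an append-only store of frozen records. The sketch below is an in-memory stand-in under stated assumptions: `AuditRecord`, `AuditLog`, and the `call_id` key are hypothetical names, and a production system would use write-once storage instead of a Python list.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class AuditRecord:
    call_id: str  # hypothetical key, not named in the lesson
    raw_audio_uri: str
    raw_transcript: str
    normalised_transcript: str
    extracted_entities: dict
    routing_decision: str
    release_id: str


class AuditLog:
    """Append-only, in-memory stand-in for a write-once audit store."""

    def __init__(self) -> None:
        self._lines: list[str] = []
        self._seen: set[tuple[str, str]] = set()

    def append(self, record: AuditRecord) -> None:
        # One record per (call, release): a replay under another release
        # adds a new row, it never overwrites the original one.
        key = (record.call_id, record.release_id)
        if key in self._seen:
            raise ValueError(f"audit record already exists for {key}")
        self._seen.add(key)
        self._lines.append(
            json.dumps(asdict(record), sort_keys=True, ensure_ascii=False)
        )

    def releases_for(self, call_id: str) -> list[str]:
        """Every release_id that processed this call, in write order."""
        rows = (json.loads(line) for line in self._lines)
        return [row["release_id"] for row in rows if row["call_id"] == call_id]
```

With this shape, `releases_for("call-1")` answers the reconstruction question directly: which release, and therefore which lexicon and model, produced the decision.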

Versioned change control. Every lexicon/threshold/prompt change is a versioned release with notes. A 'we tweaked a regex on Tuesday' with no trace is an audit finding waiting to happen.
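A fingerprint of the released config is one possible way to catch an untracked tweak. The sketch below assumes the lexicon, thresholds, and prompts can be serialised as one JSON-compatible dict; `Release`, `make_release`, and `is_untracked_change` are illustrative names, not part of any existing tool.

```python
import hashlib
import json
from dataclasses import dataclass


def fingerprint(config: dict) -> str:
    """Stable SHA-256 digest of a config dict (key order does not matter)."""
    canonical = json.dumps(config, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Release:
    release_id: str
    notes: str
    config_fingerprint: str


def make_release(release_id: str, notes: str, config: dict) -> Release:
    """Create a release record; refuse releases without notes."""
    if not notes.strip():
        raise ValueError("a release needs notes explaining the change")
    return Release(release_id, notes.strip(), fingerprint(config))


def is_untracked_change(release: Release, live_config: dict) -> bool:
    """True when the running config no longer matches what was released."""
    return fingerprint(live_config) != release.config_fingerprint
```

Running `is_untracked_change` against the live config on a schedule would surface a change that bypassed the release process before an auditor does.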

Rejection-reason taxonomy. Standardise why calls were routed to review (dialect_unclear, amount_ambiguous, background_noise) so QA and regulators get consistent reports — and so daily triage maps cleanly back to ontology rows.
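A closed enumeration is a simple way to keep the taxonomy consistent. In this sketch the three reason codes come from the text above; the `RejectionReason` enum and the `daily_report` helper are assumed names for illustration.

```python
from collections import Counter
from enum import Enum


class RejectionReason(str, Enum):
    DIALECT_UNCLEAR = "dialect_unclear"
    AMOUNT_AMBIGUOUS = "amount_ambiguous"
    BACKGROUND_NOISE = "background_noise"


def daily_report(raw_reasons: list[str]) -> dict[str, int]:
    """Count review reasons per code, rejecting labels outside the taxonomy."""
    # RejectionReason(label) raises ValueError for an unknown label,
    # so free-text reasons cannot leak into the report.
    counts = Counter(RejectionReason(label) for label in raw_reasons)
    return {reason.value: counts.get(reason, 0) for reason in RejectionReason}
```

Reporting a zero for unused codes keeps the daily output the same shape from one day to the next, which makes trends easier to compare.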

Core incident runbook — trigger: Swiss-German mis-transcription spike after a release:

Contain — flip auto-process off; force assisted mode globally.
Roll back — revert to the previous stable model + ontology release_id.
Replay — re-run the last 24h of calls through the prior stable config and diff the claims for damage.
Report — publish an incident summary with affected claim count and remediation ETA.

Because rollout mode and release_id are runtime config, containment and rollback are switches, not deploys, and the replay simply runs against the restored config.
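The runbook steps can be sketched as operations on a small runtime config object. Everything named here is an assumption for illustration: `RuntimeConfig`, the `stable_history` list, and the `extract` callable that stands in for the real pipeline. The remediation ETA is left to the humans writing the report.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RuntimeConfig:
    mode: str        # "shadow", "assisted" or "auto"
    release_id: str  # release currently serving traffic
    stable_history: list[str] = field(default_factory=list)  # oldest first


def contain(cfg: RuntimeConfig) -> None:
    """Step 1: switch auto-processing off by forcing assisted mode."""
    cfg.mode = "assisted"


def roll_back(cfg: RuntimeConfig) -> str:
    """Step 2: restore the most recent previous stable release."""
    if not cfg.stable_history:
        raise RuntimeError("no previous stable release to roll back to")
    cfg.release_id = cfg.stable_history.pop()
    return cfg.release_id


def replay(
    cfg: RuntimeConfig,
    calls: dict[str, str],             # call_id -> raw audio URI
    recorded_claims: dict[str, dict],  # call_id -> claim made during the incident
    extract: Callable[[str, str], dict],
) -> list[str]:
    """Step 3: re-run calls on the restored release and list differing claims."""
    affected = []
    for call_id, audio_uri in calls.items():
        replayed = extract(audio_uri, cfg.release_id)
        if replayed != recorded_claims.get(call_id):
            affected.append(call_id)
    return affected


def run_incident(cfg, calls, recorded_claims, extract) -> dict:
    """Contain, roll back, replay, then return the facts for the report."""
    contain(cfg)
    restored = roll_back(cfg)
    affected = replay(cfg, calls, recorded_claims, extract)
    return {
        "mode": cfg.mode,
        "release_id": restored,
        "affected_claims": len(affected),
        "affected_call_ids": affected,
    }
```

Containment runs first and cannot fail, so the system is already in its safe mode even if the rollback raises because no stable release is on record.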

Reflect

Operational safety is a product feature in regulated claims intake.

› Can you, right now, reconstruct which lexicon version produced a given claim six months ago? If not, add release_id to your log today.
› What spike threshold would trigger automatic rollback in your domain, and have you ever tested that the rollback actually works?
› Which single audit field, if missing, would make an auto-created claim indefensible to a regulator?