Which JSON block is the strongest lever for improving Swiss German handling *without* retraining the STT model?

concepts.*.surface_forms (the dialect lexicon) — plus feeding those labels in as STT hotwords

What is the most accurate reason to prefer a JSON ontology over OWL/RDF for this MVP?

The MVP's value is lexical normalisation + value constraints, which JSON does with far less operational cost; OWL's reasoning and global identity aren't needed yet.

JSON is more expressive than OWL.

OWL cannot represent insurance claims.

JSON supports automated inference and SHACL does not.

2 · The lightweight JSON ontology — build it, then justify JSON over OWL/RDF

Build the deployable JSON ontology + pipeline contract, see how each part corrects Swiss German, and understand exactly what JSON trades away vs OWL/RDF.

Theory — the ontology, block by block

Anatomy of the lightweight ontology

Open the artefact below. It is small on purpose, and every block maps to one of the four ontology jobs from Lesson 1:

Block	Ontology job	What it fixes in Swiss German
`concepts.*.canonical_values`	Name the concepts	Gives extraction a closed target set instead of free text
`concepts.*.surface_forms`	Synonyms / lexical layer	Collapses hagelschade / es het ghaglet → `hagelschaden`
`concepts.IncidentType.broader`	Taxonomy (SKOS-style)	Lets hagelschaden roll up to elementarschaden for reporting
`spoken_number_forms`	Datatype normalisation	Maps zwöitusig → `2000`
`PolicyNumber.pattern`	Value constraint	Rejects a misheard policy number outright
`constraints.Claim`	Shape / SHACL-lite	States what a valid claim must contain
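To make the table concrete, here is a minimal sketch of how those blocks could look once loaded, with a tiny lexical normaliser on top. The artefact itself is not reproduced in this lesson text, so the exact nesting, the placement of `PolicyNumber` under `concepts`, the eight-digit pattern, and the helper names `normalise_incident` and `normalise_number` are all assumptions for illustration. Only the block names and the example dialect forms come from the table.

```python
import json
from typing import Optional

# Hypothetical miniature of the ontology file; the real artefact may nest differently.
ONTOLOGY = json.loads("""
{
  "concepts": {
    "IncidentType": {
      "canonical_values": ["hagelschaden", "elementarschaden"],
      "surface_forms": {
        "hagelschade": "hagelschaden",
        "es het ghaglet": "hagelschaden"
      },
      "broader": {"hagelschaden": "elementarschaden"}
    },
    "PolicyNumber": {"pattern": "^[0-9]{8}$"}
  },
  "spoken_number_forms": {"zwöitusig": 2000}
}
""")


def normalise_incident(text: str) -> Optional[str]:
    """Map a transcript fragment to a canonical IncidentType, or None if unknown."""
    concept = ONTOLOGY["concepts"]["IncidentType"]
    key = text.strip().lower()
    if key in concept["canonical_values"]:
        return key  # already canonical
    return concept["surface_forms"].get(key)


def normalise_number(word: str) -> Optional[int]:
    """Map a spoken dialect number word to its integer value, or None if unknown."""
    return ONTOLOGY["spoken_number_forms"].get(word.strip().lower())
```

In this sketch an unknown phrase returns `None` rather than a guess, which keeps the closed target set closed: anything outside the lexicon is a candidate for a new `surface_forms` row, not a free-text value.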

This is genuinely an ontology (concepts, labels, synonyms, a `broader` hierarchy, and constraints), just serialised as JSON instead of Turtle. The MVP pipeline (the second artefact) consumes it in three ways: it feeds the `surface_forms` keys to the STT engine as hotwords, uses `surface_forms` as the normaliser dictionary, and enforces `constraints` before anything reaches the CRM.
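The pipeline's use of the ontology can be sketched in a few lines. This is an illustration under stated assumptions, not the second artefact itself: the layout of `CLAIM_CONSTRAINTS` (`required`, `patterns`, `allowed`), the field names `policy_number` and `incident_type`, the eight-digit pattern, and the helper names `stt_hotwords` and `validate_claim` are all hypothetical.

```python
import re

# Hypothetical fragments of the ontology; the keys under the constraints are assumed.
SURFACE_FORMS = {"hagelschade": "hagelschaden", "es het ghaglet": "hagelschaden"}
CLAIM_CONSTRAINTS = {
    "required": ["policy_number", "incident_type"],
    "patterns": {"policy_number": r"^[0-9]{8}$"},
    "allowed": {"incident_type": ["hagelschaden", "elementarschaden"]},
}


def stt_hotwords(surface_forms):
    """The surface-form keys double as the hotword list handed to the STT engine."""
    return sorted(surface_forms)


def validate_claim(claim, constraints):
    """Return a list of violations; an empty list means the claim may reach the CRM."""
    errors = []
    for field in constraints["required"]:
        if not claim.get(field):
            errors.append(f"missing:{field}")
    for field, pattern in constraints["patterns"].items():
        value = claim.get(field)
        if value and not re.fullmatch(pattern, value):
            errors.append(f"pattern:{field}")
    for field, allowed in constraints["allowed"].items():
        value = claim.get(field)
        if value and value not in allowed:
            errors.append(f"value:{field}")
    return errors
```

Returning the list of violations, rather than a bare boolean, is a design choice made here so that a rejected claim carries its own explanation into the review queue.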

Theory — JSON now, OWL/RDF later (and why)

JSON vs OWL/RDF — the honest trade-off

Compare the two artefacts below: the same three claim rules as JSON validation and as SHACL on a graph. They encode identical intent. So why start with JSON?

What JSON-first buys you (why it's right for the MVP):

Ship speed. No triplestore, no reasoner, no IRI scheme — a file in the repo.
Ownership. Claims-ops can edit surface_forms in a pull request; they would never hand-edit Turtle.
Determinism & auditability. A regex either matches or it doesn't; routing is reproducible from (scores, thresholds). Easy to defend to a regulator.
Cheap to change. The accuracy loop is 'add dialect rows weekly' — a data edit, not an engineering project.
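The determinism point can be shown with a pure routing function. This is a sketch only: the lesson does not specify the routing rule, so the per-field confidence scores, the route names `auto_accept` and `human_review`, and the function name `route` are assumptions.

```python
def route(scores: dict, thresholds: dict) -> str:
    """Pure function of (scores, thresholds): the same inputs always give the same route."""
    # Collect every field whose confidence falls below its threshold;
    # a field with no score at all is treated as confidence 0.0.
    below = [field for field, minimum in thresholds.items()
             if scores.get(field, 0.0) < minimum]
    return "human_review" if below else "auto_accept"
```

Because the function reads nothing but its two arguments, logging `scores` and `thresholds` alongside each decision is enough to replay that decision later.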

What you give up (and when it starts to hurt):

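No reasoning / inference. A JSON file carries no entailment rules, so nothing derives hagelschaden ⊑ naturEreignis from hagelschaden ⊑ elementarschaden ⊑ naturEreignis and propagates it for you; you either write that traversal yourself or hand-maintain every rollup.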
No cross-system identity. RDF gives every concept a global IRI so the same policy lines up across claims, fraud and workshop systems; JSON keys are local.
Constraints don't travel with the data. SHACL ships with the graph; JSON validation lives in one service and must be re-implemented elsewhere.
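To see what the missing inference costs in practice, here is a minimal sketch of the rollup written by hand. A reasoner would entail the full chain from the two subsumption links; with JSON you walk the `broader` map yourself. The single-parent map layout and the helper name `ancestors` are assumptions for illustration.

```python
# Hypothetical single-parent `broader` map, using the chain from the list above.
BROADER = {
    "hagelschaden": "elementarschaden",
    "elementarschaden": "naturEreignis",
}


def ancestors(concept, broader):
    """Walk the broader links upward and return every ancestor, nearest first."""
    chain = []
    seen = {concept}
    current = concept
    while current in broader:
        current = broader[current]
        if current in seen:
            # A hand-edited file can contain loops; fail loudly instead of spinning.
            raise ValueError(f"cycle in broader hierarchy at {current!r}")
        seen.add(current)
        chain.append(current)
    return chain
```

This is manageable for one hierarchy with one parent per concept. Once concepts have several parents, or several services need the same rollup, every consumer has to re-implement and maintain this traversal, which is the point at which the trade-off starts to hurt.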

The rule of thumb: start where the value is (lexical normalisation + constraints = JSON), graduate to OWL/RDF only when reasoning or cross-system identity becomes the bottleneck (Lesson 4). Choosing OWL on day 1 is paying an architecture tax for capabilities the MVP doesn't use yet.

Reflect

Make the trade-off concrete for your own system.

›Which of the four ontology jobs (name / canonicalise / taxonomy / constrain) is delivering the most accuracy for you today?
›Write the one inference you'd want that JSON can't do — that sentence is your future case for moving to OWL.
›Who owns surface_forms in your org chart? If the answer is 'engineering', your dialect coverage will grow too slowly.
